William Berrios Towards Language Models That Can See 2023

July 3, 2023 · Sukai Huang


Title: Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
Author: William Berrios et al.
Publish Date: 28 Jun 2023
Review Date: Mon, Jul 3, 2023
url: https://arxiv.org/pdf/2306.16410.pdf

Summary of paper

Contribution

- Proposes LENS, a modular approach that addresses computer vision tasks by harnessing the few-shot, in-context learning abilities of language models through natural language descriptions of visual inputs.
- LENS gives any off-the-shelf LLM visual capabilities without auxiliary training or data.
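The modular idea can be pictured as plain function composition: each vision module turns the image into a line of text, the lines are joined into a prompt, and any text-in/text-out LLM answers it without being trained. This is a minimal sketch under stated assumptions: `run_lens`, the stub modules and the stub LLMs are hypothetical names, the "image" is just a placeholder string, and no real vision model or LLM is called.

```python
from typing import Callable, List

# Hypothetical interfaces: a vision module maps an image to a line of text,
# and an LLM is any function from prompt text to answer text.
VisionModule = Callable[[str], str]
LLM = Callable[[str], str]


def run_lens(image: str, question: str,
             modules: List[VisionModule], llm: LLM) -> str:
    """Describe the image in text with every module, then query a frozen LLM."""
    descriptions = [module(image) for module in modules]
    prompt = "\n".join(descriptions + [f"Question: {question}", "Answer:"])
    # The LLM is only called, never updated, so swapping it needs no new data.
    return llm(prompt)


# Stub modules standing in for real vision models (illustrative only).
def fake_tagger(image: str) -> str:
    return "Tags: dog, frisbee, grass"


def fake_captioner(image: str) -> str:
    return "Caption: a dog catching a frisbee on a lawn"


# Two interchangeable stub "LLMs": one echoes the prompt, one applies a keyword rule.
def echo_llm(prompt: str) -> str:
    return prompt


def keyword_llm(prompt: str) -> str:
    return "frisbee" if "frisbee" in prompt else "unknown"


if __name__ == "__main__":
    modules = [fake_tagger, fake_captioner]
    print(run_lens("dog.jpg", "What is the dog catching?", modules, keyword_llm))
```

Because the LLM is only a callable, replacing `keyword_llm` with `echo_llm` (or a real model wrapper) changes nothing else in the pipeline, which is the "off-the-shelf" property in miniature.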

LENS framework

A redundant text prompt (one that describes the same image several times in overlapping ways) might be helpful.
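One source of such redundancy is sampling several captions for the same image instead of one. As I understand the paper, its captioner samples multiple captions with stochastic top-k sampling; the sketch below only illustrates token-level top-k sampling on a hypothetical toy bigram table (`TOY_MODEL`), which stands in for a real image-conditioned captioning model. All names here are assumptions for illustration.

```python
import random
from typing import Dict, List

# Hypothetical toy "captioner": next-token probabilities given the previous token.
# A real captioner would condition these probabilities on the image.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.7, "the": 0.3},
    "a": {"dog": 0.6, "puppy": 0.3, "cat": 0.1},
    "the": {"dog": 0.8, "cat": 0.2},
    "dog": {"catching": 0.5, "playing": 0.4, "</s>": 0.1},
    "puppy": {"playing": 0.7, "</s>": 0.3},
    "cat": {"</s>": 1.0},
    "catching": {"frisbee": 1.0},
    "playing": {"outside": 0.6, "</s>": 0.4},
    "frisbee": {"</s>": 1.0},
    "outside": {"</s>": 1.0},
}


def top_k_sample(probs: Dict[str, float], k: int, rng: random.Random) -> str:
    """Keep the k most likely tokens and sample one in proportion to its probability."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens, weights = zip(*top)
    # random.choices accepts relative weights, so no explicit renormalisation is needed.
    return rng.choices(tokens, weights=weights, k=1)[0]


def sample_caption(k: int, rng: random.Random, max_len: int = 10) -> str:
    """Generate one caption token by token until the end marker or max_len."""
    token, words = "<s>", []
    for _ in range(max_len):
        token = top_k_sample(TOY_MODEL[token], k, rng)
        if token == "</s>":
            break
        words.append(token)
    return " ".join(words)


def intensive_captions(n: int, k: int, seed: int = 0) -> List[str]:
    """Sample n captions and keep the distinct ones in first-seen order."""
    rng = random.Random(seed)
    captions = [sample_caption(k, rng) for _ in range(n)]
    return list(dict.fromkeys(captions))


if __name__ == "__main__":
    print("Captions: " + "; ".join(intensive_captions(8, k=2)))
```

With `k=1` the sampler degenerates to greedy decoding and every caption is identical; a larger `k` yields several overlapping but different descriptions, which is the kind of redundancy the note above refers to.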

LENS components

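LENS consists of three distinct vision modules and one reasoning module, each serving a specific purpose:
- Tag module: a CLIP model assigns tags to the image from a large vocabulary.
- Attributes module: a CLIP model assigns descriptive attributes to the objects in the image.
- Intensive captioner: a BLIP captioning model generates multiple captions per image.
- Reasoning module: a frozen LLM takes the resulting text, together with the task prompt, and produces the answer.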
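A CLIP-style tag module can be read as zero-shot ranking: embed the image and every vocabulary entry in a shared space, then keep the entries with the highest cosine similarity. A minimal sketch, assuming hypothetical toy 3-d vectors (`TAG_EMBEDDINGS`, `image_embedding`) in place of real CLIP image and text features; `top_tags` is an illustrative name, not the paper's code.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top_tags(image_emb: Sequence[float],
             tag_embs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Rank vocabulary tags by cosine similarity to the image embedding."""
    ranked = sorted(tag_embs, key=lambda t: cosine(image_emb, tag_embs[t]),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy embeddings standing in for CLIP text features of each tag.
TAG_EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0],
    "frisbee": [0.7, 0.0, 0.3],
    "car": [0.0, 1.0, 0.0],
    "ocean": [0.0, 0.2, 0.9],
}
# Hypothetical toy embedding standing in for the CLIP image feature.
image_embedding = [0.8, 0.05, 0.2]

if __name__ == "__main__":
    print("Tags: " + ", ".join(top_tags(image_embedding, TAG_EMBEDDINGS, k=2)))
```

Under the same assumption, an attributes module could reuse `top_tags` unchanged with attribute phrases as the vocabulary; only the list of candidate strings differs.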
Prompt design
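The prompt puts each module's output under its own label before the question, so the LLM sees a purely textual description of the image. The exact wording LENS uses is not reproduced here; the sketch below assumes hypothetical field labels ("Tags", "Attributes", "Captions", "Short Answer") and a hypothetical `build_prompt` helper.

```python
from typing import List


def build_prompt(tags: List[str], attributes: List[str],
                 captions: List[str], question: str) -> str:
    """Lay out each vision module's output under its own label (hypothetical format)."""
    lines = [
        "Tags: " + ", ".join(tags),
        "Attributes: " + ", ".join(attributes),
        # Normalise each caption to end with exactly one period before joining.
        "Captions: " + " ".join(c.rstrip(".") + "." for c in captions),
        "Question: " + question,
        "Short Answer:",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(["dog", "frisbee"], ["brown fur"],
                       ["a dog on grass", "a brown dog catching a frisbee."],
                       "What is the dog catching?"))
```

Ending the prompt with an open answer field nudges the LLM to complete it with a short answer rather than free text.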

Potential future work

How to encode an input image as a text prompt: this paper provides a good approach.

We may combine this model with the approach from "Boosting Language Models Reasoning with Chain of Knowledge Prompting".