Haotian Liu: Improved Baselines With Visual Instruction Tuning (2023)
Title: Improved Baselines With Visual Instruction Tuning
Author: Haotian Liu et al.
Publish Year: Oct 5, 2023
Review Date: Sun, Oct 8, 2023
url: https://arxiv.org/pdf/2310.03744.pdf

Summary of paper

Motivation

The authors show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient.

Contribution

With simple modifications to LLaVA, namely using CLIP-ViT with an MLP projection and adding academic-task-oriented VQA data with simple response-formatting prompts, they establish stronger baselines...
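To make the "MLP projection" concrete, here is a minimal sketch of a vision-language connector that maps each CLIP-ViT patch feature into the LLM's embedding space. It assumes the two-layer MLP with a GELU in between that the paper uses in place of the original LLaVA's single linear projection. The class names (`Linear`, `MLPProjector`), the random initialization and the toy dimensions are illustrative assumptions, not the released code. Pure Python is used so the sketch runs without any framework.

```python
import math
import random


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class Linear:
    """Fully-connected layer y = W x + b (hypothetical toy init, not trained weights)."""

    def __init__(self, in_dim: int, out_dim: int, rng: random.Random):
        self.w = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi
                for row, bi in zip(self.w, self.b)]


class MLPProjector:
    """Assumed two-layer connector: Linear -> GELU -> Linear, applied per visual token."""

    def __init__(self, vision_dim: int, llm_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.fc1 = Linear(vision_dim, llm_dim, rng)
        self.fc2 = Linear(llm_dim, llm_dim, rng)

    def __call__(self, patch_features):
        # patch_features: one vector of length vision_dim per image patch;
        # each patch becomes one "visual token" of length llm_dim for the LLM
        return [self.fc2([gelu(h) for h in self.fc1(p)]) for p in patch_features]


# Toy usage: 4 patch features of width 8 projected to an LLM width of 16
data_rng = random.Random(1)
patches = [[data_rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]
visual_tokens = MLPProjector(vision_dim=8, llm_dim=16)(patches)
print(len(visual_tokens), len(visual_tokens[0]))  # 4 16
```

At real scale the same shape logic applies: a CLIP ViT-L/14 encoder at 336px resolution yields 24 x 24 = 576 patch features of width 1024, and the connector would map each one to the LLM's hidden size (4096 for a 7B LLaMA-family model). Because the projection is applied token by token, the number of visual tokens fed to the LLM is unchanged; only their width changes.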