
Large Multimodal Models: Notes On CVPR 2023 Tutorial

Chunyuan Li. arXiv 2023 – 40 citations

[Paper]
Compositional Generalization · Fine Tuning · Image Text Integration · Interdisciplinary Approaches · Model Architecture · Multimodal Semantic Representation · Visual Contextualization

This tutorial note summarizes the presentation on "Large Multimodal Models: Towards Building and Surpassing Multimodal GPT-4", part of the CVPR 2023 tutorial on "Recent Advances in Vision Foundation Models". The tutorial consists of three parts. We first introduce the background on recent GPT-like large models for vision-and-language modeling to motivate the research in instruction-tuned large multimodal models (LMMs). As a prerequisite, we describe the basics of instruction tuning in large language models, which is then extended to the multimodal space. Lastly, we illustrate how to build a minimum prototype of multimodal GPT-4-like models with open-source resources, and review recently emerged topics.
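The "minimum prototype" the note refers to follows the common open-source recipe of connecting a frozen vision encoder to a decoder-only LLM through a small trainable projection. Below is a minimal sketch of that architecture, assuming a CLIP-style patch encoder and a language model that accepts input embeddings; all class and module names here are illustrative placeholders, not code from the tutorial.

```python
import torch
import torch.nn as nn

class MinimalLMM(nn.Module):
    """Sketch of an LMM prototype: frozen vision encoder -> linear projector -> LLM."""
    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int, llm_dim: int):
        super().__init__()
        self.vision_encoder = vision_encoder.eval()  # kept frozen
        for p in self.vision_encoder.parameters():
            p.requires_grad = False
        # The only newly trained piece: maps visual features into the
        # LLM's token-embedding space.
        self.projector = nn.Linear(vision_dim, llm_dim)
        self.llm = llm

    def forward(self, images: torch.Tensor, text_embeds: torch.Tensor):
        # 1. Encode the image into patch features (no gradients).
        with torch.no_grad():
            vis_feats = self.vision_encoder(images)   # (B, N, vision_dim)
        # 2. Project visual features to "visual tokens".
        vis_tokens = self.projector(vis_feats)        # (B, N, llm_dim)
        # 3. Prepend visual tokens to the text embeddings and decode.
        sequence = torch.cat([vis_tokens, text_embeds], dim=1)
        return self.llm(sequence)

# Toy stand-ins so the sketch runs end to end; a real prototype would
# use a pretrained CLIP ViT and a pretrained decoder-only LLM instead.
vision = nn.Linear(16, 512)       # fake patch encoder: (B, N, 16) -> (B, N, 512)
llm = nn.Linear(4096, 32000)      # fake next-token head over a 32k vocabulary
model = MinimalLMM(vision, llm, vision_dim=512, llm_dim=4096)
logits = model(torch.randn(2, 3, 16), torch.randn(2, 8, 4096))
print(logits.shape)               # torch.Size([2, 11, 32000])
```

In this recipe, visual instruction tuning only updates the projector (and optionally the LLM), which is why such prototypes can be assembled cheaply from off-the-shelf open-source components.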

Similar Work