VARCO-VISION: Expanding Frontiers In Korean Vision-language Models

Jeongho Ju, Daeyoung Kim, Sunyoung Park, Youngjune Kim. No Venue, 2024

[Code] [Paper]
Applications Compositional Generalization Datasets Evaluation Interdisciplinary Approaches Multimodal Semantic Representation Question Answering Training Techniques Visual Contextualization

In this paper, we introduce VARCO-VISION, an open-source Korean-English vision-language model (VLM). We incorporate a step-by-step training strategy that allows the model to learn both linguistic and visual information while preserving the backbone model’s knowledge. Compared to models of similar size, our model demonstrates outstanding performance across diverse settings that require bilingual image-text understanding and generation. VARCO-VISION is also capable of grounding, referring, and OCR, expanding its usage and potential for real-world applications. In addition to the model, we release five Korean evaluation datasets: four closed-set benchmarks and one open-set benchmark. We anticipate that our milestone will broaden opportunities for AI researchers aiming to train VLMs. VARCO-VISION is available at https://huggingface.co/NCSOFT/VARCO-VISION-14B.
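The released checkpoint can be loaded directly from the Hugging Face Hub. Below is a minimal inference sketch assuming the checkpoint exposes a LLaVA-OneVision-style interface in the `transformers` library; the model class, chat template, and prompt format here are assumptions rather than details confirmed on this page, so consult the model card for the exact usage.

```python
# Minimal sketch: loading NCSOFT/VARCO-VISION-14B from the Hugging Face Hub.
# ASSUMPTION: the checkpoint follows the LLaVA-OneVision interface in
# `transformers`; the exact model class and chat template may differ.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "NCSOFT/VARCO-VISION-14B"  # URL given in the abstract
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Bilingual usage: a Korean prompt about a local image ("example.jpg" is a
# hypothetical placeholder path).
image = Image.open("example.jpg")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "이 이미지를 설명해 주세요."},  # "Please describe this image."
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```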
