
Benchmarking Large Multimodal Models Against Common Corruptions

Jiawei Zhang, Tianyu Pang, Chao Du, Yi Ren, Bo Li, Min Lin. Transactions of the Association for Computational Linguistics 2024 – 153 citations

[Code] [Paper]
Tags: ACL, Datasets, Evaluation, Has Code, Image Text Integration, Interdisciplinary Approaches, Prompting, TACL, Visual Contextualization, Visual Question Answering

This technical report addresses a gap in the assessment of large multimodal models (LMMs) by specifically examining the self-consistency of their outputs when subjected to common corruptions. We investigate cross-modal interactions between text, image, and speech, covering four essential generation tasks: text-to-image, image-to-text, text-to-speech, and speech-to-text. We create a comprehensive benchmark, named MMCBench, that covers more than 100 popular LMMs (over 150 model checkpoints in total). A thorough evaluation under common corruptions is critical for practical deployment and facilitates a better understanding of the reliability of cutting-edge LMMs. The benchmarking code is available at https://github.com/sail-sg/MMCBench.
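To make the self-consistency idea concrete, here is a minimal Python sketch for one of the four tasks (image-to-text): apply a common corruption to an input, caption both the clean and corrupted versions with the same model, and score how much the outputs agree. This is an illustration only, not the MMCBench implementation; the choice of Gaussian noise as the corruption, the `SequenceMatcher` string-similarity proxy, and the `caption_fn` placeholder are all assumptions made for the example (MMCBench uses its own corruption suite and stronger cross-modal similarity measures).

```python
import numpy as np
from difflib import SequenceMatcher


def add_gaussian_noise(image: np.ndarray, sigma: float = 25.0) -> np.ndarray:
    """Apply one common corruption (Gaussian noise) to an HxWxC uint8 image."""
    noisy = image.astype(np.float32) + np.random.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def self_consistency_score(caption_fn, image: np.ndarray, sigma: float = 25.0) -> float:
    """Caption the clean and corrupted inputs and compare the two outputs.

    `caption_fn` is a stand-in for any image-to-text LMM: a callable mapping
    an image array to a caption string. Higher score = more self-consistent
    under the corruption. The string-ratio metric here is only a proxy.
    """
    clean_caption = caption_fn(image)
    corrupted_caption = caption_fn(add_gaussian_noise(image, sigma))
    return SequenceMatcher(None, clean_caption, corrupted_caption).ratio()


if __name__ == "__main__":
    # Dummy image and a trivial captioner, just to show the call pattern.
    dummy_image = np.zeros((224, 224, 3), dtype=np.uint8)
    dummy_captioner = lambda img: f"a plain image with mean brightness {img.mean():.0f}"
    print(self_consistency_score(dummy_captioner, dummy_image))
```

The same pattern extends to the other three tasks by swapping the corruption (e.g., audio noise for speech inputs, prompt typos for text inputs) and the similarity measure appropriate to the output modality.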

Similar Work