
Through The Looking Glass: Common Sense Consistency Evaluation Of Weird Images

Elisei Rykov, Kseniia Petrushina, Kseniia Titova, Anton Razzhigaev, Alexander Panchenko, Vasily Konovalov. No Venue 2025

Datasets Evaluation Fine Tuning Model Architecture

Measuring how realistic an image looks is a complex task in artificial intelligence research. For example, an image of a boy with a vacuum cleaner in a desert violates common sense. We introduce a novel method, which we call Through the Looking Glass (TLG), to assess image common sense consistency using Large Vision-Language Models (LVLMs) and a Transformer-based encoder. By leveraging LVLMs to extract atomic facts from these images, we obtain a mix of accurate facts. We then fine-tune a compact attention-pooling classifier over the encoded atomic facts. TLG achieves new state-of-the-art performance on the WHOOPS! and WEIRD datasets while relying on only a compact fine-tuned component.
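
To make the last step of the pipeline concrete, the following is a minimal sketch of attention pooling over a set of fact embeddings followed by a binary classifier. It is an illustration under stated assumptions, not the authors' implementation: the function names, the single learned query vector, the logistic output, and the toy two-dimensional "embeddings" are all hypothetical, and in practice the embeddings would come from a Transformer encoder applied to the LVLM-generated facts and the parameters would be learned during fine-tuning.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(fact_embeddings: Sequence[Vector],
                   query: Vector) -> Tuple[List[float], List[float]]:
    """Collapse a variable number of fact embeddings into one vector.

    Each fact is scored against a query vector (hypothetical: a single
    learned parameter), the scores are normalized with softmax, and the
    pooled vector is the weighted sum of the fact embeddings.
    """
    weights = softmax([dot(e, query) for e in fact_embeddings])
    dim = len(query)
    pooled = [sum(w * e[i] for w, e in zip(weights, fact_embeddings))
              for i in range(dim)]
    return pooled, weights


def weirdness_probability(fact_embeddings: Sequence[Vector], query: Vector,
                          clf_weights: Vector, clf_bias: float) -> float:
    """Logistic classifier on the pooled vector (assumed output head)."""
    pooled, _ = attention_pool(fact_embeddings, query)
    logit = dot(pooled, clf_weights) + clf_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-ins for encoded atomic facts; real embeddings would be
# produced by a text encoder and have hundreds of dimensions.
facts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.5, -0.5]          # untrained, illustrative values
pooled, weights = attention_pool(facts, query)
prob = weirdness_probability(facts, query,
                             clf_weights=[1.0, -1.0], clf_bias=0.0)
```

The appeal of this design is that the softmax weights let the classifier focus on the few facts that signal an inconsistency while tolerating noisy ones, and only the small pooling and classification parameters need to be trained.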

https://huggingface.co/discussions/paper/682cc6b68b0dfc6de9ed8e89
