Open-domain, Content-based, Multi-modal Fact-checking Of Out-of-context Images Via Online Resources

Sahar Abdelnabi, Rakibul Hasan, Mario Fritz. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 57 citations.

Tags: 3D Representation, CVPR, Compositional Generalization, Evaluation, Image Text Integration, Model Architecture, Multimodal Semantic Representation, Visual Contextualization

Misinformation is now a major problem because of its potentially high risks to our core democratic and societal values and order. Out-of-context misinformation is one of the easiest and most effective ways adversaries use to spread viral false stories. In this threat, a real image is repurposed to support other narratives by misrepresenting its context and/or elements. The internet has become the go-to means of verifying information across different sources and modalities. Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-caption pairing using Web evidence. To integrate evidence and cues from both modalities, we introduce the concept of a ‘multi-modal cycle-consistency check’: starting from the image (or caption), we gather textual (or visual) evidence, which is then compared against the paired caption (or image). Moreover, we propose a novel architecture, the Consistency-Checking Network (CCN), that mimics layered human reasoning across the same and different modalities: the caption vs. the textual evidence, the image vs. the visual evidence, and the image vs. the caption. Our work offers the first step toward, and a benchmark for, open-domain, content-based, multi-modal fact-checking, and our method significantly outperforms previous baselines that did not leverage external evidence.
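To make the three comparisons in the multi-modal cycle-consistency check concrete, here is a minimal Python sketch. It assumes that evidence retrieval has already happened (web text gathered starting from the image, web images gathered starting from the caption), and that the image, the caption, and every evidence item have been embedded into one shared vector space by some hypothetical encoder. It is not the paper's CCN: the learned network is replaced by fixed cosine similarities and a hand-picked weighted sum, and all function names, weights, the threshold, and the toy vectors are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query: Vector, evidence: List[Vector]) -> float:
    """Highest similarity between the query and any evidence item; 0.0 if none was found."""
    return max((cosine(query, item) for item in evidence), default=0.0)


@dataclass
class ConsistencyScores:
    caption_vs_text_evidence: float  # caption vs. web text gathered starting from the image
    image_vs_visual_evidence: float  # image vs. web images gathered starting from the caption
    image_vs_caption: float          # direct cross-modal agreement of the pair itself


def cycle_consistency(image: Vector, caption: Vector,
                      text_evidence: List[Vector],
                      visual_evidence: List[Vector]) -> ConsistencyScores:
    """Compare each side of the pair against evidence gathered from the other side."""
    return ConsistencyScores(
        caption_vs_text_evidence=best_match(caption, text_evidence),
        image_vs_visual_evidence=best_match(image, visual_evidence),
        image_vs_caption=cosine(image, caption),
    )


def verdict(scores: ConsistencyScores,
            weights: tuple = (0.4, 0.4, 0.2),  # hypothetical, hand-picked fusion weights
            threshold: float = 0.5) -> str:     # hypothetical decision threshold
    """Fuse the three scores with a fixed weighted sum and threshold the result."""
    combined = (weights[0] * scores.caption_vs_text_evidence
                + weights[1] * scores.image_vs_visual_evidence
                + weights[2] * scores.image_vs_caption)
    return "consistent" if combined >= threshold else "out-of-context"


if __name__ == "__main__":
    # Toy 3-d "embeddings"; a real system would use high-dimensional encoder outputs.
    image = [1.0, 0.0, 0.0]
    text_from_image = [[0.8, 0.2, 0.0], [0.0, 0.0, 1.0]]

    matching_caption = [0.9, 0.1, 0.0]
    images_from_matching = [[1.0, 0.1, 0.0]]
    print(verdict(cycle_consistency(image, matching_caption,
                                    text_from_image, images_from_matching)))  # prints: consistent

    repurposed_caption = [0.0, 1.0, 0.0]
    images_from_repurposed = [[0.0, 1.0, 0.1]]
    print(verdict(cycle_consistency(image, repurposed_caption,
                                    text_from_image, images_from_repurposed)))  # prints: out-of-context
```

The fixed weights are the main simplification: the CCN described in the abstract is a neural architecture that reasons over the same-modality and cross-modality comparisons, whereas this sketch hard-codes how the three scores are combined. Taking the best match over evidence items reflects the assumption that one strongly agreeing search result is enough support; other aggregations (mean, top-k) are equally plausible choices.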
