Enhance Multimodal Transformer With External Label And In-domain Pretrain: Hateful Meme Challenge Winning Solution

Ron Zhu · arXiv 2020 · 46 citations

Tags: Image Text Integration, Model Architecture, Visual Contextualization

Hateful meme detection is a recently introduced research area that requires both visual and linguistic understanding of the meme, as well as background knowledge, to perform well on the task. This technical report summarises the first-place solution of the Hateful Meme Detection Challenge 2020, which extends state-of-the-art visual-linguistic transformers to tackle this problem. At the end of the report, we also point out shortcomings of the current methodology and possible directions for improvement.
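The core idea of combining visual and linguistic signals for a binary hateful/not-hateful decision can be illustrated with a minimal late-fusion sketch. This is not the report's actual model (which extends visual-linguistic transformers); the feature dimensions, weight initialisation, and the simple concatenate-then-project head are all illustrative assumptions:

```python
import numpy as np

# Hypothetical sketch: late fusion of precomputed image and text features
# for binary hateful-meme classification. The 2048-d image features and
# 768-d text features are assumed sizes, not taken from the paper.
rng = np.random.default_rng(0)

img_feat = rng.normal(size=(4, 2048))   # batch of 4 image feature vectors
txt_feat = rng.normal(size=(4, 768))    # matching text feature vectors

# Fuse by concatenation, then project to a single logit per meme.
fused = np.concatenate([img_feat, txt_feat], axis=-1)  # shape (4, 2816)
W = rng.normal(size=(2048 + 768, 1)) * 0.01            # toy linear head
logits = fused @ W
probs = 1.0 / (1.0 + np.exp(-logits))                  # P(hateful) per meme
```

A full solution would replace the toy linear head with a jointly pretrained visual-linguistic transformer, which is where the in-domain pretraining and external labels discussed in the report come in.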

Similar Work