Related papers: MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models

URL: http://arxiv.org/abs/2505.17433v2
Date: Wed, 04 Jun 2025 08:55:43 GMT
Title: MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models
Authors: Zhengyi Zhao, Shubo Zhang, Yuxi Zhang, Yanxi Zhao, Yifan Zhang, Zezhong Wang, Huimin Wang, Yutian Zhao, Bin Liang, Yefeng Zheng, Binyang Li, Kam-Fai Wong, Xian Wu
Abstract summary: We introduce MemeReaCon, a novel benchmark designed to evaluate how Large Vision Language Models (LVLMs) understand memes in their original context. We collected memes from five different Reddit communities, keeping each meme's image, the post text, and user comments together. Our tests with leading LVLMs show a clear weakness: models either fail to interpret critical information in the contexts, or overly focus on visual details while overlooking communicative purpose.
Score: 50.2355423914562
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Memes have emerged as a popular form of multimodal online communication, where their interpretation heavily depends on the specific context in which they appear. Current approaches predominantly focus on isolated meme analysis, either for harmful content detection or standalone interpretation, overlooking a fundamental challenge: the same meme can express different intents depending on its conversational context. This oversight creates an evaluation gap: although humans intuitively recognize how context shapes meme interpretation, Large Vision Language Models (LVLMs) can hardly understand context-dependent meme intent. To address this critical limitation, we introduce MemeReaCon, a novel benchmark specifically designed to evaluate how LVLMs understand memes in their original context. We collected memes from five different Reddit communities, keeping each meme's image, the post text, and user comments together. We carefully labeled how the text and meme work together, what the poster intended, how the meme is structured, and how the community responded. Our tests with leading LVLMs show a clear weakness: models either fail to interpret critical information in the contexts, or overly focus on visual details while overlooking communicative purpose. MemeReaCon thus serves both as a diagnostic tool exposing current limitations and as a challenging benchmark to drive development toward LVLMs with more sophisticated context-aware understanding.

Related papers

Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes [5.243460995467895]
This study introduces ClassicMemes-50-templates (CM50), a large-scale dataset consisting of over 33,000 memes, centered around 50 popular meme templates. We also present an automated knowledge-grounded annotation pipeline leveraging large vision-language models to produce high-quality image captions, meme captions, and literary device labels.
arXiv Detail & Related papers (2025-01-23T17:18:30Z)
XMeCap: Meme Caption Generation with Sub-Image Adaptability [53.2509590113364]
Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines. We introduce the XMeCap framework, which adopts supervised fine-tuning and reinforcement learning. XMeCap achieves an average evaluation score of 75.85 for single-image memes and 66.32 for multi-image memes, outperforming the best baseline by 3.71% and 4.82%, respectively.
arXiv Detail & Related papers (2024-07-24T10:51:46Z)
MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing [53.30190591805432]
We introduce MemeMQA, a multimodal question-answering framework to solicit accurate responses to structured questions. We also propose ARSENAL, a novel two-stage multimodal framework to address MemeMQA.
arXiv Detail & Related papers (2024-05-18T07:44:41Z)
Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations [48.82168723932981]
We introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes. A Contrastive Language-Image Pretraining (CLIP) approach has been proposed for visual and textual explanation of a meme.
arXiv Detail & Related papers (2024-01-18T11:24:30Z)
A Template Is All You Meme [76.03172165923058]
We create a knowledge base composed of more than 5,200 meme templates, information about them, and 54,000 examples of template instances. To investigate the semantic signal of meme templates, we show that we can match memes in datasets to base templates contained in our knowledge base with a distance-based lookup. Our examination of meme templates results in state-of-the-art performance for every dataset we consider, paving the way for analysis grounded in templateness.
arXiv Detail & Related papers (2023-11-11T19:38:14Z)
Mapping Memes to Words for Multimodal Hateful Meme Classification [26.101116761577796]
Some memes take a malicious turn, promoting hateful content and perpetuating discrimination. We propose a novel approach named ISSUES for multimodal hateful meme classification. Our method achieves state-of-the-art results on the Hateful Memes Challenge and HarMeme datasets.
arXiv Detail & Related papers (2023-10-12T14:38:52Z)
MemeCap: A Dataset for Captioning and Interpreting Memes [11.188548484391978]
We present the task of meme captioning and release a new dataset, MemeCap. Our dataset contains 6.3K memes along with the title of the post containing the meme, the meme captions, the literal image caption, and the visual metaphors.
arXiv Detail & Related papers (2023-05-23T05:41:18Z)
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset [47.65948529524281]
We collect hateful and non-hateful memes from Pinterest to evaluate out-of-sample performance on models pre-trained on the Facebook dataset. We find that memes in the wild differ in two key aspects: 1) Captions must be extracted via OCR, and 2) Memes are more diverse than 'traditional memes', including screenshots of conversations or text on a plain background.
arXiv Detail & Related papers (2021-07-09T09:04:05Z)
Entropy and complexity unveil the landscape of memes evolution [105.59074436693487]
We study the evolution of 2 million visual memes from Reddit over ten years, from 2011 to 2020. We find support for the hypothesis that memes are part of an emerging form of internet metalanguage.
arXiv Detail & Related papers (2021-05-26T07:41:09Z)
Multimodal Learning for Hateful Memes Detection [6.6881085567421605]
We propose a novel method that incorporates the image captioning process into the memes detection process. Our model achieves promising results on the Hateful Memes Detection Challenge.
arXiv Detail & Related papers (2020-11-25T16:49:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.