Related papers: Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes

Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes

URL: http://arxiv.org/abs/2501.13851v1
Date: Thu, 23 Jan 2025 17:18:30 GMT
Title: Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes
Authors: Shiling Deng, Serge Belongie, Peter Ebert Christensen,
Abstract summary: This study introduces ClassicMemes-50-templates (CM50), a large-scale dataset consisting of over 33,000 memes, centered around 50 popular meme templates.<n>We also present an automated knowledge-grounded annotation pipeline leveraging large vision-language models to produce high-quality image captions, meme captions, and literary device labels.
Score: 5.243460995467895
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Memes have emerged as a powerful form of communication, integrating visual and textual elements to convey humor, satire, and cultural messages. Existing research has focused primarily on aspects such as emotion classification, meme generation, propagation, interpretation, figurative language, and sociolinguistics, but has often overlooked deeper meme comprehension and meme-text retrieval. To address these gaps, this study introduces ClassicMemes-50-templates (CM50), a large-scale dataset consisting of over 33,000 memes, centered around 50 popular meme templates. We also present an automated knowledge-grounded annotation pipeline leveraging large vision-language models to produce high-quality image captions, meme captions, and literary device labels overcoming the labor intensive demands of manual annotation. Additionally, we propose a meme-text retrieval CLIP model (mtrCLIP) that utilizes cross-modal embedding to enhance meme analysis, significantly improving retrieval performance. Our contributions include:(1) a novel dataset for large-scale meme study, (2) a scalable meme annotation framework, and (3) a fine-tuned CLIP for meme-text retrieval, all aimed at advancing the understanding and analysis of memes at scale.

Related papers

MemeLens: Multilingual Multitask VLMs for Memes [45.8232386994625]
We propose MemeLens, a unified multilingual and explanation-enhanced Vision Language Model (VLM) for meme understanding.<n>We consolidate 38 public meme datasets, filter and map dataset-specific labels into a shared taxonomy of $20$ tasks spanning harm, targets, figurative/pragmatic intent, and affect.<n>Our findings suggest that robust meme understanding requires multimodal training, exhibits substantial variation across semantic categories, and remains sensitive to over-specialization when models are fine-tuned on individual datasets rather than trained in a unified setting.
arXiv Detail & Related papers (2026-01-18T19:01:03Z)
Beyond Meme Templates: Limitations of Visual Similarity Measures in Meme Matching [0.9217021281095907]
We introduce a broader formulation of meme matching that extends beyond template matching.<n>We show that conventional similarity measures excel at matching template-based memes but fall short when applied to non-template-based memes.<n>Our results highlight that accurately matching memes via shared visual elements, not just background templates, remains an open challenge.
arXiv Detail & Related papers (2025-08-05T15:31:00Z)
MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models [50.2355423914562]
We introduce MemeReaCon, a novel benchmark designed to evaluate how Large Vision Language Models (LVLMs) understand memes in their original context.<n>We collected memes from five different Reddit communities, keeping each meme's image, the post text, and user comments together.<n>Our tests with leading LVLMs show a clear weakness: models either fail to interpret critical information in the contexts, or overly focus on visual details while overlooking communicative purpose.
arXiv Detail & Related papers (2025-05-23T03:27:23Z)
Decoding Memes: A Comparative Study of Machine Learning Models for Template Identification [0.0]
"meme template" is a layout or format that is used to create memes. Despite extensive research on meme virality, the task of automatically identifying meme templates remains a challenge. This paper presents a comprehensive comparison and evaluation of existing meme template identification methods.
arXiv Detail & Related papers (2024-08-15T12:52:06Z)
XMeCap: Meme Caption Generation with Sub-Image Adaptability [53.2509590113364]
Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines. We introduce the textscXMeCap framework, which adopts supervised fine-tuning and reinforcement learning. textscXMeCap achieves an average evaluation score of 75.85 for single-image memes and 66.32 for multi-image memes, outperforming the best baseline by 3.71% and 4.82%, respectively.
arXiv Detail & Related papers (2024-07-24T10:51:46Z)
Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations [48.82168723932981]
We introduce em MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes. A Contrastive Language-Image Pretraining (CLIP) approach has been proposed for visual and textual explanation of a meme.
arXiv Detail & Related papers (2024-01-18T11:24:30Z)
PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using Large Language Models [7.388466146105024]
We propose textPromptMTopic, a novel multimodal prompt-based model to learn topics from both text and visual modalities. Our model effectively extracts and clusters topics learned from memes, considering the semantic interaction between the text and visual modalities. Our work contributes to the understanding of the topics and themes of memes, a crucial form of communication in today's society.
arXiv Detail & Related papers (2023-12-11T03:36:50Z)
A Template Is All You Meme [83.05919383106715]
We release a knowledge base of memes and information found on www.knowyourmeme.com, composed of more than 54,000 images. We hypothesize that meme templates can be used to inject models with the context missing from previous approaches.
arXiv Detail & Related papers (2023-11-11T19:38:14Z)
MEMEX: Detecting Explanatory Evidence for Memes via Knowledge-Enriched Contextualization [31.209594252045566]
We propose a novel task, MEMEX, given a meme and a related document, the aim is to mine the context that succinctly explains the background of the meme. To benchmark MCC, we propose MIME, a multimodal neural framework that uses common sense enriched meme representation and a layered approach to capture the cross-modal semantic dependencies between the meme and the context.
arXiv Detail & Related papers (2023-05-25T10:19:35Z)
What do you MEME? Generating Explanations for Visual Semantic Role Labelling in Memes [42.357272117919464]
We introduce a novel task - EXCLAIM, generating explanations for visual semantic role labeling in memes. To this end, we curate ExHVV, a novel dataset that offers natural language explanations of connotative roles for three types of entities. We also posit LUMEN, a novel multimodal, multi-task learning framework that endeavors to address EXCLAIM optimally.
arXiv Detail & Related papers (2022-12-01T18:21:36Z)
Caption Enriched Samples for Improving Hateful Memes Detection [78.5136090997431]
The hateful meme challenge demonstrates the difficulty of determining whether a meme is hateful or not. Both unimodal language models and multimodal vision-language models cannot reach the human level of performance.
arXiv Detail & Related papers (2021-09-22T10:57:51Z)
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset [47.65948529524281]
We collect hateful and non-hateful memes from Pinterest to evaluate out-of-sample performance on models pre-trained on the Facebook dataset. We find that memes in the wild differ in two key aspects: 1) Captions must be extracted via OCR, and 2) Memes are more diverse than traditional memes', including screenshots of conversations or text on a plain background.
arXiv Detail & Related papers (2021-07-09T09:04:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.