See, Explain, and Intervene: A Few-Shot Multimodal Agent Framework for Hateful Meme Moderation
- URL: http://arxiv.org/abs/2601.04692v1
- Date: Thu, 08 Jan 2026 08:02:48 GMT
- Title: See, Explain, and Intervene: A Few-Shot Multimodal Agent Framework for Hateful Meme Moderation
- Authors: Naquee Rizwan, Subhankar Swain, Paramananda Bhaskar, Gagan Aryan, Shehryaar Shah Khan, Animesh Mukherjee
- Abstract summary: We examine hateful memes from three complementary angles - how to detect them, how to explain their content, and how to intervene before they are posted. We propose a novel framework that leverages task-specific generative multimodal agents and the few-shot adaptability of large multimodal models to cater to different types of memes. We believe this is the first work focused on generalizable hateful meme moderation under limited data conditions, and it has strong potential for deployment in real-world production scenarios.
- Score: 5.030563948128189
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we examine hateful memes from three complementary angles - how to detect them, how to explain their content, and how to intervene before they are posted - by applying a range of strategies built on top of generative AI models. To the best of our knowledge, explanation and intervention have typically been studied separately from detection, which does not reflect real-world conditions. Further, since curating large annotated datasets for meme moderation is prohibitively expensive, we propose a novel framework that leverages task-specific generative multimodal agents and the few-shot adaptability of large multimodal models to cater to different types of memes. We believe this is the first work focused on generalizable hateful meme moderation under limited data conditions, and it has strong potential for deployment in real-world production scenarios. Warning: Contains potentially toxic content.
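The detect/explain/intervene flow with few-shot exemplar retrieval can be sketched as below. This is a hypothetical illustration, not the paper's actual code: all names are invented, the multimodal model calls are replaced by trivial stubs (word-overlap retrieval and a majority vote over retrieved labels), and only the three-stage control flow mirrors the abstract.

```python
# Hypothetical sketch of a few-shot meme-moderation pipeline.
# Real systems would call a large multimodal model at each stage;
# here the stages are stubbed so the control flow is self-contained.
from dataclasses import dataclass


@dataclass
class Meme:
    image_caption: str  # textual description of the meme's image
    overlay_text: str   # text embedded in the meme


@dataclass
class ModerationResult:
    hateful: bool
    explanation: str = ""
    intervention: str = ""


def retrieve_few_shot(meme: Meme, pool: list[tuple[Meme, bool]], k: int = 2):
    """Pick the k labelled exemplars sharing the most words with the query
    (a crude stand-in for embedding-based similarity retrieval)."""
    query = set((meme.image_caption + " " + meme.overlay_text).lower().split())

    def overlap(item):
        ex, _label = item
        return len(query & set((ex.image_caption + " " + ex.overlay_text).lower().split()))

    return sorted(pool, key=overlap, reverse=True)[:k]


def moderate(meme: Meme, pool: list[tuple[Meme, bool]]) -> ModerationResult:
    shots = retrieve_few_shot(meme, pool)
    # Stub "detector": majority label of the retrieved exemplars.
    votes = sum(label for _, label in shots)
    if votes <= len(shots) / 2:
        return ModerationResult(hateful=False)
    # Explanation and intervention stages run only for flagged memes.
    explanation = f"Flagged: resembles {votes} known hateful exemplar(s)."
    intervention = "Held before posting; user prompted to revise the content."
    return ModerationResult(True, explanation, intervention)
```

A benign query that retrieves mostly non-hateful exemplars passes through with no explanation or intervention attached, which matches the paper's point that the three tasks belong in one pipeline rather than being studied separately.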
Related papers
- Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline [56.790045049514326]
Two major forms of deception dominate: human-crafted misinformation and AI-generated content. We propose Unified Multimodal Fake Content Detection (UMFDet), a framework designed to handle both forms of deception. UMFDet achieves robust and consistent performance across both misinformation types, outperforming specialized baselines.
arXiv Detail & Related papers (2025-09-30T09:26:32Z)
- MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection [3.7336554275205898]
We propose MIND, a multi-agent framework for zero-shot harmful meme detection that does not rely on annotated data. MIND implements three key strategies: 1) We retrieve similar memes from an unannotated reference set to provide contextual information; 2) We propose a bi-directional insight mechanism to extract a comprehensive understanding of similar memes; and 3) We employ a multi-agent debate mechanism to ensure robust decision-making through reasoned arbitration.
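The retrieve-then-debate pattern in the MIND abstract can be illustrated with a toy sketch. Everything here is an assumption for illustration: the "agents" are simple lexicon checks rather than LMM calls, retrieval is word overlap rather than multimodal similarity, and the arbitration is a plain majority vote.

```python
# Toy illustration of zero-shot retrieve-then-debate detection
# (heuristic stand-ins for the LMM agents described in the abstract).
from collections import Counter


def retrieve_similar(meme: str, reference_set: list[str], k: int = 2) -> list[str]:
    """Word-overlap retrieval, standing in for multimodal similarity search
    over an unannotated reference set."""
    q = set(meme.lower().split())
    return sorted(reference_set,
                  key=lambda r: len(q & set(r.lower().split())),
                  reverse=True)[:k]


def agent_vote(meme: str, insights: list[str], lexicon: set[str]) -> str:
    """One debating 'agent': flags the meme if it or its retrieved context
    contains a word from this agent's cue lexicon."""
    words = set(" ".join([meme] + insights).lower().split())
    return "harmful" if words & lexicon else "benign"


def debate(meme: str, reference_set: list[str], lexicons: list[set[str]]) -> str:
    """Majority arbitration over agents holding different cue lexicons."""
    insights = retrieve_similar(meme, reference_set)
    votes = Counter(agent_vote(meme, insights, lex) for lex in lexicons)
    return votes.most_common(1)[0][0]
```

The point of the retrieved "insights" is that context from similar memes can tip an agent's vote even when the query meme alone looks innocuous, which is the role the bi-directional insight mechanism plays in the real framework.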
arXiv Detail & Related papers (2025-07-09T14:46:32Z)
- Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning [26.546646866501735]
We introduce U-CoT+, a novel framework for harmful meme detection. We first develop a high-fidelity meme-to-text pipeline that converts visual memes into detail-preserving textual descriptions. This design decouples meme interpretation from meme classification, thus avoiding immediate reasoning over complex raw visual content.
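The decoupling described here splits the problem into two stages: turn the meme into text, then reason over text only. The sketch below is a minimal, hypothetical rendering of that idea; the real U-CoT+ uses an LMM for captioning and guided chain-of-thought prompts, while here stage 1 is string assembly and stage 2 matches human-written guideline cues, recording each match as a reasoning step.

```python
# Minimal sketch of decoupled understanding + guided reasoning
# (hypothetical names; both stages are stubbed).

def meme_to_text(image_caption: str, overlay_text: str) -> str:
    """Stage 1: merge the visual description and embedded text into one
    detail-preserving textual account of the meme."""
    return f'Image shows: {image_caption}. Overlaid text reads: "{overlay_text}".'


def guided_cot_classify(description: str, guidelines: list[str]) -> tuple[str, list[str]]:
    """Stage 2: check the textual description against guideline cues,
    logging each triggered cue as one step of the reasoning trace."""
    trace = []
    lowered = description.lower()
    for cue in guidelines:
        if cue.lower() in lowered:
            trace.append(f"Matched guideline cue: '{cue}'")
    label = "harmful" if trace else "harmless"
    return label, trace
```

Because stage 2 never sees pixels, the classifier can be swapped, audited, or re-prompted without re-running the expensive visual interpretation, which is the practical payoff of the decoupling.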
arXiv Detail & Related papers (2025-06-10T06:10:45Z)
- Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions [4.649093665157263]
In this paper, we introduce IntMeme, a novel framework that leverages Large Multimodal Models (LMMs) for hateful meme classification with explainable decisions. IntMeme addresses the dual challenges of improving both accuracy and explainability in meme moderation. Our approach addresses the opacity and misclassification issues associated with PT-VLMs, optimizing the use of LMMs for hateful meme detection.
arXiv Detail & Related papers (2025-02-16T10:45:40Z)
- Towards Low-Resource Harmful Meme Detection with LMM Agents [13.688955830843973]
We propose an agency-driven framework for low-resource harmful meme detection.
We first retrieve relevant annotated memes to leverage their label information as auxiliary signals for the LMM agent.
We elicit knowledge-revising behavior within the LMM agent to derive well-generalized insights into meme harmfulness.
arXiv Detail & Related papers (2024-11-08T07:43:15Z)
- XMeCap: Meme Caption Generation with Sub-Image Adaptability [53.2509590113364]
Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines. We introduce the XMeCap framework, which adopts supervised fine-tuning and reinforcement learning. XMeCap achieves an average evaluation score of 75.85 for single-image memes and 66.32 for multi-image memes, outperforming the best baseline by 6.75% and 8.56%, respectively.
arXiv Detail & Related papers (2024-07-24T10:51:46Z)
- MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing [53.30190591805432]
We introduce MemeMQA, a multimodal question-answering framework to solicit accurate responses to structured questions.
We also propose ARSENAL, a novel two-stage multimodal framework to address MemeMQA.
arXiv Detail & Related papers (2024-05-18T07:44:41Z)
- Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations [48.82168723932981]
We introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes.
A Contrastive Language-Image Pretraining (CLIP) approach has been proposed for visual and textual explanation of a meme.
arXiv Detail & Related papers (2024-01-18T11:24:30Z)
- Multimodal and Explainable Internet Meme Classification [3.4690152926833315]
We design and implement a modular and explainable architecture for Internet meme understanding.
We study the relevance of our modular and explainable models in detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification.
We devise a user-friendly interface that facilitates the comparative analysis of examples retrieved by all of our models for any given meme.
arXiv Detail & Related papers (2022-12-11T21:52:21Z)
- Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes.
One interesting finding is that many types of harmful memes remain under-studied, e.g., those featuring self-harm and extremism.
Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z)
- Caption Enriched Samples for Improving Hateful Memes Detection [78.5136090997431]
The hateful meme challenge demonstrates the difficulty of determining whether a meme is hateful or not.
Neither unimodal language models nor multimodal vision-language models reach human-level performance.
arXiv Detail & Related papers (2021-09-22T10:57:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.