Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning
- URL: http://arxiv.org/abs/2506.08477v1
- Date: Tue, 10 Jun 2025 06:10:45 GMT
- Title: Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning
- Authors: Fengjun Pan, Anh Tuan Luu, Xiaobao Wu
- Abstract summary: We introduce U-CoT+, a novel framework for harmful meme detection. We first develop a high-fidelity meme-to-text pipeline that converts visual memes into detail-preserving textual descriptions. This design decouples meme interpretation from meme classification, thus avoiding immediate reasoning over complex raw visual content.
- Score: 26.546646866501735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting harmful memes is essential for maintaining the integrity of online environments. However, current approaches often struggle with resource efficiency, flexibility, or explainability, limiting their practical deployment in content moderation systems. To address these challenges, we introduce U-CoT+, a novel framework for harmful meme detection. Instead of relying solely on prompting or fine-tuning multimodal models, we first develop a high-fidelity meme-to-text pipeline that converts visual memes into detail-preserving textual descriptions. This design decouples meme interpretation from meme classification, thus avoiding immediate reasoning over complex raw visual content and enabling resource-efficient harmful meme detection with general large language models (LLMs). Building on these textual descriptions, we further incorporate targeted, interpretable human-crafted guidelines to guide models' reasoning under zero-shot CoT prompting. As such, this framework allows for easy adaptation to different harmfulness detection criteria across platforms, regions, and over time, offering high flexibility and explainability. Extensive experiments on seven benchmark datasets validate the effectiveness of our framework, highlighting its potential for explainable and low-resource harmful meme detection using small-scale LLMs. Codes and data are available at: https://anonymous.4open.science/r/HMC-AF2B/README.md.
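The abstract describes a two-stage design: a meme-to-text step that produces a detail-preserving description, followed by zero-shot CoT classification with a general LLM guided by human-crafted harmfulness guidelines. The sketch below illustrates that decoupling; the function names, prompt wording, and example guidelines are illustrative assumptions rather than the authors' actual pipeline (see the linked repository for their code).

```python
# Minimal sketch of a decoupled "meme-to-text, then guideline-guided CoT" pipeline.
# All names, prompts, and guidelines below are illustrative assumptions.

from typing import Callable, List

def describe_meme(image_path: str, vlm: Callable[[str, str], str]) -> str:
    """Stage 1: convert the meme into a detail-preserving textual description.

    `vlm` is any image-to-text callable, e.g. a captioning model wrapped as
    vlm(image_path, instruction) -> str.
    """
    instruction = (
        "Describe this meme in detail: transcribe all overlaid text verbatim, "
        "describe the people, objects, and scene, and note how the text and "
        "image relate to each other."
    )
    return vlm(image_path, instruction)

def classify_harmfulness(description: str,
                         guidelines: List[str],
                         llm: Callable[[str], str]) -> str:
    """Stage 2: zero-shot CoT classification over text only, guided by
    human-crafted, platform-specific guidelines."""
    guideline_block = "\n".join(f"- {g}" for g in guidelines)
    prompt = (
        "You are a content-moderation assistant.\n"
        f"Meme description:\n{description}\n\n"
        f"Harmfulness guidelines:\n{guideline_block}\n\n"
        "Let's think step by step, checking the description against each "
        "guideline, then answer 'Harmful' or 'Harmless' with a brief reason."
    )
    return llm(prompt)

if __name__ == "__main__":
    # Placeholder model calls; swap in a real VLM and LLM client here.
    vlm = lambda image, instr: "A cartoon figure with the caption '...'"
    llm = lambda prompt: "Harmless. Reason: ..."
    guidelines = [
        "Content attacking a protected group is harmful.",
        "Satire without a targeted victim is not harmful.",
    ]
    desc = describe_meme("meme.png", vlm)
    print(classify_harmfulness(desc, guidelines, llm))
```

Because the classifier only ever sees text, the guideline list can be swapped per platform or region without retraining, which is the flexibility the abstract emphasizes.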
Related papers
- MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection [3.7336554275205898]
We propose MIND, a multi-agent framework for zero-shot harmful meme detection that does not rely on annotated data. MIND implements three key strategies: 1) We retrieve similar memes from an unannotated reference set to provide contextual information; 2) We propose a bi-directional insight mechanism to extract a comprehensive understanding of similar memes; and 3) We employ a multi-agent debate mechanism to ensure robust decision-making through reasoned arbitration.
arXiv Detail & Related papers (2025-07-09T14:46:32Z) - MemeMind: A Large-Scale Multimodal Dataset with Chain-of-Thought Reasoning for Harmful Meme Detection [4.09109557328609]
Harmful memes pose significant challenges for automated detection due to implicit semantics and complex multimodal interactions. MemeMind is a novel dataset featuring scientifically rigorous standards, large scale, diversity, bilingual support (Chinese and English), and detailed Chain-of-Thought (CoT) annotations. We propose an innovative detection framework, MemeGuard, which effectively integrates multimodal information with reasoning process modeling.
arXiv Detail & Related papers (2025-06-15T13:45:30Z) - MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models [50.2355423914562]
We introduce MemeReaCon, a novel benchmark designed to evaluate how Large Vision Language Models (LVLMs) understand memes in their original context. We collected memes from five different Reddit communities, keeping each meme's image, the post text, and user comments together. Our tests with leading LVLMs show a clear weakness: models either fail to interpret critical information in the contexts, or overly focus on visual details while overlooking communicative purpose.
arXiv Detail & Related papers (2025-05-23T03:27:23Z) - CAMU: Context Augmentation for Meme Understanding [9.49890289676001]
Social media memes are a challenging domain for hate detection because they intertwine visual and textual cues into culturally nuanced messages. We introduce a novel framework, CAMU, which leverages large vision-language models to generate more descriptive captions. Our approach attains high accuracy (0.807) and F1-score (0.806) on the Hateful Memes dataset, on par with the existing SoTA framework.
arXiv Detail & Related papers (2025-04-24T19:27:55Z) - Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions [4.649093665157263]
In this paper, we introduce IntMeme, a novel framework that leverages Large Multimodal Models (LMMs) for hateful meme classification with explainable decisions. IntMeme addresses the dual challenges of improving both accuracy and explainability in meme moderation. Our approach addresses the opacity and misclassification issues associated with pre-trained vision-language models (PT-VLMs), optimizing the use of LMMs for hateful meme detection.
arXiv Detail & Related papers (2025-02-16T10:45:40Z) - Towards Low-Resource Harmful Meme Detection with LMM Agents [13.688955830843973]
We propose an agency-driven framework for low-resource harmful meme detection.
We first retrieve relevant memes with annotations to leverage label information as auxiliary signals for the LMM agent.
We elicit knowledge-revising behavior within the LMM agent to derive well-generalized insights into meme harmfulness.
arXiv Detail & Related papers (2024-11-08T07:43:15Z) - HateSieve: A Contrastive Learning Framework for Detecting and Segmenting Hateful Content in Multimodal Memes [8.97062933976566]
HateSieve is a framework designed to enhance the detection and segmentation of hateful elements in memes. HateSieve features a novel Contrastive Meme Generator that creates semantically paired memes. Empirical experiments on the Hateful Meme dataset show that HateSieve not only surpasses existing LMMs in performance with fewer trainable parameters but also offers a robust mechanism for precisely identifying and isolating hateful content.
arXiv Detail & Related papers (2024-08-11T14:56:06Z) - Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning [53.766434746801366]
Multimodal contrastive learning (MCL) has shown remarkable advances in zero-shot classification by learning from millions of image-caption pairs crawled from the Internet.
Hackers may exploit image-text data for model training without authorization, potentially including personal and privacy-sensitive information.
Recent works propose generating unlearnable examples by adding imperceptible perturbations to training images to build shortcuts for protection.
We propose Multi-step Error Minimization (MEM), a novel optimization process for generating multimodal unlearnable examples.
arXiv Detail & Related papers (2024-07-23T09:00:52Z) - MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention [43.849634264271565]
We present MemeGuard, a comprehensive framework leveraging Large Language Models (LLMs) and Visual Language Models (VLMs) for meme intervention.
MemeGuard harnesses a specially fine-tuned VLM, VLMeme, for meme interpretation, and a multimodal knowledge selection and ranking mechanism.
We leverage ICMM to test MemeGuard, demonstrating its proficiency in generating relevant and effective responses to toxic memes.
arXiv Detail & Related papers (2024-06-08T04:09:20Z) - CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models [51.70129969269271]
We introduce a novel contrastive-based decoding method, COuntering DEscription Contrastive Decoding (CODE).
Our method significantly reduces hallucinations and improves cross-modal consistency across various benchmarks and cutting-edge LMMs.
arXiv Detail & Related papers (2024-06-04T03:04:21Z) - Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations [48.82168723932981]
We introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes.
A Contrastive Language-Image Pretraining (CLIP) approach has been proposed for visual and textual explanation of a meme.
arXiv Detail & Related papers (2024-01-18T11:24:30Z) - Flexible and Robust Counterfactual Explanations with Minimal Satisfiable Perturbations [56.941276017696076]
We propose a conceptually simple yet effective solution named Counterfactual Explanations with Minimal Satisfiable Perturbations (CEMSP).
CEMSP constrains changing values of abnormal features with the help of their semantically meaningful normal ranges.
We conduct comprehensive experiments on both synthetic and real-world datasets, demonstrating that our method provides more robust explanations than existing methods while preserving flexibility.
arXiv Detail & Related papers (2023-09-09T04:05:56Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.