From Shallow Humor to Metaphor: Towards Label-Free Harmful Meme Detection via LMM Agent Self-Improvement
- URL: http://arxiv.org/abs/2512.21598v1
- Date: Thu, 25 Dec 2025 09:36:35 GMT
- Title: From Shallow Humor to Metaphor: Towards Label-Free Harmful Meme Detection via LMM Agent Self-Improvement
- Authors: Jian Lang, Rongpei Hong, Ting Zhong, Leiting Chen, Qiang Gao, Fan Zhou
- Abstract summary: The proliferation of harmful memes on online media poses significant risks to public health and stability. Existing detection methods heavily rely on large-scale labeled data for training. We present ALARM, the first lAbeL-free hARmful Meme detection framework powered by Large Multimodal Model (LMM) agent self-improvement.
- Score: 32.18826266751766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of harmful memes on online media poses significant risks to public health and stability. Existing detection methods heavily rely on large-scale labeled data for training, which necessitates substantial manual annotation efforts and limits their adaptability to the continually evolving nature of harmful content. To address these challenges, we present ALARM, the first lAbeL-free hARmful Meme detection framework powered by Large Multimodal Model (LMM) agent self-improvement. The core innovation of ALARM lies in exploiting the expressive information from "shallow" memes to iteratively enhance its ability to tackle more complex and subtle ones. ALARM consists of a novel Confidence-based Explicit Meme Identification mechanism that isolates the explicit memes from the original dataset and assigns them pseudo-labels. Besides, a new Pairwise Learning Guided Agent Self-Improvement paradigm is introduced, where the explicit memes are reorganized into contrastive pairs (positive vs. negative) to refine a learner LMM agent. This agent autonomously derives high-level detection cues from these pairs, which in turn empower the agent itself to handle complex and challenging memes effectively. Experiments on three diverse datasets demonstrate the superior performance and strong adaptability of ALARM to newly evolved memes. Notably, our method even outperforms label-driven methods. These results highlight the potential of label-free frameworks as a scalable and promising solution for adapting to novel forms and topics of harmful memes in dynamic online environments.
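The abstract describes a two-step loop: high-confidence ("explicit") memes are pseudo-labeled, then reorganized into contrastive pairs from which a learner LMM agent distills high-level detection cues that it reuses on subtler, metaphorical memes. The sketch below is a minimal, hypothetical rendering of that loop in Python; the `Meme` dataclass, the `query_lmm` and `derive_cue` callables, the prompt text, and the 0.9 confidence threshold are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the ALARM-style label-free loop described in the abstract.
# The LMM interface, prompts, and thresholds are assumptions for illustration.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Meme:
    image_path: str
    caption: str
    pseudo_label: Optional[bool] = None  # True = harmful, False = harmless
    confidence: float = 0.0


def identify_explicit_memes(
    memes: List[Meme],
    query_lmm: Callable[[str, str], Tuple[bool, float]],
    threshold: float = 0.9,
) -> Tuple[List[Meme], List[Meme]]:
    """Confidence-based identification step (as sketched here): pseudo-label
    memes the LMM judges with high confidence, defer the subtler rest."""
    explicit, deferred = [], []
    for meme in memes:
        label, conf = query_lmm(
            "Is this meme harmful? Answer yes/no with a confidence in [0, 1].",
            meme.image_path,
        )
        if conf >= threshold:
            meme.pseudo_label, meme.confidence = label, conf
            explicit.append(meme)
        else:
            deferred.append(meme)
    return explicit, deferred


def build_contrastive_pairs(explicit: List[Meme]) -> List[Tuple[Meme, Meme]]:
    """Reorganize pseudo-labeled 'shallow' memes into (harmful, harmless) pairs."""
    harmful = [m for m in explicit if m.pseudo_label]
    harmless = [m for m in explicit if not m.pseudo_label]
    return list(zip(harmful, harmless))


def self_improve(
    pairs: List[Tuple[Meme, Meme]],
    derive_cue: Callable[[Meme, Meme, List[str]], str],
    cues: Optional[List[str]] = None,
) -> List[str]:
    """Pairwise-learning step: contrast each pair and grow the cue memory
    the agent later applies to the deferred, more metaphorical memes."""
    cues = list(cues or [])
    for positive, negative in pairs:
        cues.append(derive_cue(positive, negative, cues))
    return cues
```

Keeping the LMM behind plain callables is only a convenience of this sketch: the loop can be re-run over the deferred memes once the cue list has grown, mirroring the iterative self-improvement the abstract describes, while the paper's actual prompting, confidence calibration, and cue-refinement strategy are not specified in this listing.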
Related papers
- Self-Consolidation for Self-Evolving Agents [51.94826934403236]
Large language model (LLM) agents operate as static systems, lacking the ability to evolve through lifelong interaction. We propose a novel self-evolving framework for LLM agents that introduces a complementary evolution mechanism.
arXiv Detail & Related papers (2026-02-02T11:16:07Z) - SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models [67.84174763413178]
We introduce SafeRedir, a lightweight inference-time framework for robust unlearning via prompt embedding redirection. We show that SafeRedir achieves effective unlearning capability, high semantic and perceptual preservation, robust image quality, and enhanced resistance to adversarial attacks.
arXiv Detail & Related papers (2026-01-13T15:01:38Z) - Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails [103.05296856071931]
We identify the Alignment Tipping Process (ATP), a critical post-deployment risk unique to self-evolving Large Language Model (LLM) agents. ATP arises when continual interaction drives agents to abandon alignment constraints established during training in favor of reinforced, self-interested strategies. Our experiments show that alignment benefits erode rapidly under self-evolution, with initially aligned models converging toward unaligned states.
arXiv Detail & Related papers (2025-10-06T14:48:39Z) - Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents [58.69865074060139]
We study the case where an agent's self-evolution deviates in unintended ways, leading to undesirable or even harmful outcomes. Our empirical findings reveal that misevolution is a widespread risk, affecting agents built even on top-tier LLMs. We discuss potential mitigation strategies to inspire further research on building safer and more trustworthy self-evolving agents.
arXiv Detail & Related papers (2025-09-30T14:55:55Z) - MemeMind: A Large-Scale Multimodal Dataset with Chain-of-Thought Reasoning for Harmful Meme Detection [4.09109557328609]
Harmful memes pose significant challenges for automated detection due to implicit semantics and complex multimodal interactions. MemeMind is a novel dataset featuring scientifically rigorous standards, large scale, diversity, bilingual support (Chinese and English), and detailed Chain-of-Thought (CoT) annotations. We propose an innovative detection framework, MemeGuard, which effectively integrates multimodal information with reasoning process modeling.
arXiv Detail & Related papers (2025-06-15T13:45:30Z) - Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning [26.546646866501735]
We introduce U-CoT+, a novel framework for harmful meme detection. We first develop a high-fidelity meme-to-text pipeline that converts visual memes into detail-preserving textual descriptions. This design decouples meme interpretation from meme classification, thus avoiding immediate reasoning over complex raw visual content.
arXiv Detail & Related papers (2025-06-10T06:10:45Z) - Information Retrieval Induced Safety Degradation in AI Agents [52.15553901577888]
This study investigates how expanding retrieval access affects model reliability, bias propagation, and harmful content generation. Retrieval-enabled agents built on aligned LLMs often behave more unsafely than uncensored models without retrieval. These findings underscore the need for robust mitigation strategies to ensure fairness and reliability in retrieval-enabled and increasingly autonomous AI systems.
arXiv Detail & Related papers (2025-05-20T11:21:40Z) - MemeIntel: Explainable Detection of Propagandistic and Hateful Memes [7.312435036698118]
We introduce MemeXplain, an explanation-enhanced dataset for propagandistic memes in Arabic and hateful memes in English. We propose a multi-stage optimization approach and train Vision-Language Models (VLMs). Our results show that this strategy significantly improves both label detection and explanation generation quality over the base model.
arXiv Detail & Related papers (2025-02-23T15:35:48Z) - Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions [4.649093665157263]
In this paper, we introduce IntMeme, a novel framework that leverages Large Multimodal Models (LMMs) for hateful meme classification with explainable decisions. IntMeme addresses the dual challenges of improving both accuracy and explainability in meme moderation. Our approach addresses the opacity and misclassification issues associated with PT-VLMs, optimizing the use of LMMs for hateful meme detection.
arXiv Detail & Related papers (2025-02-16T10:45:40Z) - Towards Low-Resource Harmful Meme Detection with LMM Agents [13.688955830843973]
We propose an agency-driven framework for low-resource harmful meme detection.
We first retrieve relevant memes with annotations to leverage label information as auxiliary signals for the LMM agent.
We elicit knowledge-revising behavior within the LMM agent to derive well-generalized insights into meme harmfulness.
arXiv Detail & Related papers (2024-11-08T07:43:15Z) - Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection [49.122777764853055]
We explore the potential of Large Multimodal Models (LMMs) for hateful meme detection. We propose Evolver, which incorporates LMMs via Chain-of-Evolution (CoE) Prompting. Evolver simulates the evolving and expressing process of memes and reasons through LMMs in a step-by-step manner.
arXiv Detail & Related papers (2024-07-30T17:51:44Z) - MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing [53.30190591805432]
We introduce MemeMQA, a multimodal question-answering framework to solicit accurate responses to structured questions.
We also propose ARSENAL, a novel two-stage multimodal framework to address MemeMQA.
arXiv Detail & Related papers (2024-05-18T07:44:41Z)