Towards Low-Resource Harmful Meme Detection with LMM Agents
- URL: http://arxiv.org/abs/2411.05383v1
- Date: Fri, 08 Nov 2024 07:43:15 GMT
- Title: Towards Low-Resource Harmful Meme Detection with LMM Agents
- Authors: Jianzhao Huang, Hongzhan Lin, Ziyan Liu, Ziyang Luo, Guang Chen, Jing Ma
- Abstract summary: We propose an agency-driven framework for low-resource harmful meme detection.
We first retrieve relevant memes with annotations to leverage label information as auxiliary signals for the LMM agent.
We elicit knowledge-revising behavior within the LMM agent to derive well-generalized insights into meme harmfulness.
- Score: 13.688955830843973
- Abstract: The proliferation of Internet memes in the age of social media necessitates effective identification of harmful ones. Due to the dynamic nature of memes, existing data-driven models may struggle in low-resource scenarios where only a few labeled examples are available. In this paper, we propose an agency-driven framework for low-resource harmful meme detection, employing both outward and inward analysis with few-shot annotated samples. Inspired by the powerful capacity of Large Multimodal Models (LMMs) for multimodal reasoning, we first retrieve relevant memes with annotations to leverage label information as auxiliary signals for the LMM agent. Then, we elicit knowledge-revising behavior within the LMM agent to derive well-generalized insights into meme harmfulness. By combining these strategies, our approach enables dialectical reasoning over intricate and implicit harm-indicative patterns. Extensive experiments conducted on three meme datasets demonstrate that our proposed approach outperforms state-of-the-art methods on the low-resource harmful meme detection task.
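The listing contains no code, but the two strategies named in the abstract, outward retrieval of annotated memes and inward, insight-guided reasoning, map naturally onto a retrieval-augmented prompting loop. The sketch below is only an illustration of that idea under stated assumptions: `embed_meme`, `call_lmm_agent`, the prompt wording, and the data layout are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

# --- Hypothetical stand-ins (not from the paper) -------------------------
def embed_meme(image_path: str, text: str) -> np.ndarray:
    """Placeholder multimodal encoder; swap in a real image-text encoder."""
    rng = np.random.default_rng(abs(hash((image_path, text))) % (2**32))
    return rng.standard_normal(512)

def call_lmm_agent(prompt: str, image_path: str) -> str:
    """Placeholder for a call to a Large Multimodal Model API."""
    raise NotImplementedError("Plug in your LMM client here.")

# --- Outward analysis: retrieve annotated memes as auxiliary signals -----
def retrieve_neighbours(query: dict, support_set: list[dict], k: int = 4) -> list[dict]:
    """Return the k annotated memes most similar to the query (cosine similarity)."""
    q = embed_meme(query["image"], query["text"])
    scored = []
    for ex in support_set:  # support_set: the few-shot labeled memes
        e = embed_meme(ex["image"], ex["text"])
        sim = float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e) + 1e-8))
        scored.append((sim, ex))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in scored[:k]]

# --- Inward analysis: prompt the agent with labels and revisable insights -
def build_prompt(query: dict, neighbours: list[dict], insights: list[str]) -> str:
    demos = "\n".join(
        f"- Text: {ex['text']} | Label: {'harmful' if ex['label'] else 'harmless'}"
        for ex in neighbours
    )
    notes = "\n".join(f"- {note}" for note in insights) or "- (none yet)"
    return (
        "Task: decide whether the given meme is harmful.\n"
        f"Similar annotated memes:\n{demos}\n"
        f"General insights about meme harmfulness:\n{notes}\n"
        f"Meme text to judge: {query['text']}\n"
        "Answer 'harmful' or 'harmless' with a brief justification."
    )

def classify_meme(query: dict, support_set: list[dict], insights: list[str]) -> str:
    neighbours = retrieve_neighbours(query, support_set)
    return call_lmm_agent(build_prompt(query, neighbours, insights), query["image"])
```

The abstract's "knowledge-revising behavior" would correspond to updating the insights list as the agent works through the few-shot samples; in this sketch it is simply passed in as plain strings.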
Related papers
- Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions [4.649093665157263]
In this paper, we introduce IntMeme, a novel framework that leverages Large Multimodal Models (LMMs) for hateful meme classification with explainable decisions.
IntMeme addresses the dual challenges of improving both accuracy and explainability in meme moderation.
Our approach addresses the opacity and misclassification issues associated with PT-VLMs, optimizing the use of LMMs for hateful meme detection.
arXiv Detail & Related papers (2025-02-16T10:45:40Z)
- Re-ranking Using Large Language Models for Mitigating Exposure to Harmful Content on Social Media Platforms [10.421660174482314]
We propose a novel re-ranking approach using Large Language Models (LLMs) in zero-shot and few-shot settings.
Our method dynamically assesses and re-ranks content sequences, effectively mitigating harmful content exposure without requiring extensive labeled data.
arXiv Detail & Related papers (2025-01-23T00:26:32Z)
- Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarial attacks on various downstream models fine-tuned from the Segment Anything Model (SAM).
To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z)
- Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning [68.83624133567213]
We show that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question.
We also propose a simple yet effective method, Active Deduction (AD), to encourage the model to actively perform composite deduction.
arXiv Detail & Related papers (2024-04-19T15:53:27Z)
- The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative [55.08395463562242]
Multimodal Large Language Models (MLLMs) are constantly redefining the boundary of Artificial General Intelligence (AGI).
Our paper explores a novel vulnerability in MLLM societies - the indirect propagation of malicious content.
arXiv Detail & Related papers (2024-02-20T23:08:21Z)
- Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks [0.6282171844772422]
An increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications.
The recent discovery of named entities as adversarial examples in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and finetuned LLMs.
We developed an embedding-space attack based on power-scaled distance-weighted sampling to assess the robustness of their biomedical knowledge.
arXiv Detail & Related papers (2024-02-16T09:29:38Z)
- Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models [18.181154544563416]
The age of social media is flooded with Internet memes, necessitating a clear grasp and effective identification of harmful ones.
Existing harmful meme detection methods do not present readable explanations that unveil the implicit meaning of memes to support their detection decisions.
We propose an explainable approach to detect harmful memes, achieved through reasoning over conflicting rationales from both harmless and harmful positions.
arXiv Detail & Related papers (2024-01-24T08:37:16Z)
- Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to model stealing attacks, in which an adversary duplicates the target model via query access.
We introduce three model stealing attacks adapted to different real-world scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z)
- Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models [17.617187709968242]
Existing harmful meme detection approaches only recognize superficial harm-indicative signals in an end-to-end classification manner.
We propose a novel generative framework to learn reasonable thoughts from Large Language Models for better multimodal fusion.
Our proposed approach outperforms state-of-the-art methods on the harmful meme detection task.
arXiv Detail & Related papers (2023-12-09T01:59:11Z)
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
- Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.
By implicitly transferring the changes in the data manipulation to those in the model outputs, Memorization Discrepancy can discover imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)