Related papers: MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention

MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention

URL: http://arxiv.org/abs/2406.05344v1
Date: Sat, 8 Jun 2024 04:09:20 GMT
Title: MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
Authors: Prince Jha, Raghav Jain, Konika Mandal, Aman Chadha, Sriparna Saha, Pushpak Bhattacharyya,
Abstract summary: We present textitMemeGuard, a comprehensive framework leveraging Large Language Models (LLMs) and Visual Language Models (VLMs) for meme intervention. textitMemeGuard harnesses a specially fine-tuned VLM, textitVLMeme, for meme interpretation, and a multimodal knowledge selection and ranking mechanism. We leverage textitICMM to test textitMemeGuard, demonstrating its proficiency in generating relevant and effective responses to toxic memes.
Score: 43.849634264271565
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the digital world, memes present a unique challenge for content moderation due to their potential to spread harmful content. Although detection methods have improved, proactive solutions such as intervention are still limited, with current research focusing mostly on text-based content, neglecting the widespread influence of multimodal content like memes. Addressing this gap, we present \textit{MemeGuard}, a comprehensive framework leveraging Large Language Models (LLMs) and Visual Language Models (VLMs) for meme intervention. \textit{MemeGuard} harnesses a specially fine-tuned VLM, \textit{VLMeme}, for meme interpretation, and a multimodal knowledge selection and ranking mechanism (\textit{MKS}) for distilling relevant knowledge. This knowledge is then employed by a general-purpose LLM to generate contextually appropriate interventions. Another key contribution of this work is the \textit{\textbf{I}ntervening} \textit{\textbf{C}yberbullying in \textbf{M}ultimodal \textbf{M}emes (ICMM)} dataset, a high-quality, labeled dataset featuring toxic memes and their corresponding human-annotated interventions. We leverage \textit{ICMM} to test \textit{MemeGuard}, demonstrating its proficiency in generating relevant and effective responses to toxic memes.

Related papers

Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning [26.546646866501735]
We introduce U-CoT+, a novel framework for harmful meme detection.<n>We first develop a high-fidelity meme-to-text pipeline that converts visual memes into detail-preserving textual descriptions.<n>This design decouples meme interpretation from meme classification, thus avoiding immediate reasoning over complex raw visual content.
arXiv Detail & Related papers (2025-06-10T06:10:45Z)
Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation [88.78166077081912]
We introduce a multimodal unlearning benchmark, UnLOK-VQA, and an attack-and-defense framework to evaluate methods for deleting specific multimodal knowledge from MLLMs.<n>Our results show multimodal attacks outperform text- or image-only ones, and that the most effective defense removes answer information from internal model states.
arXiv Detail & Related papers (2025-05-01T01:54:00Z)
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models [12.929357709840975]
Multimodal memes are sometimes misused to disseminate hate speech against individuals or groups. We propose a definition-guided prompting technique for detecting hateful memes, and a unified framework for mitigating hateful content in memes, named UnHateMeme. Our framework, integrated with Vision-Language Models, demonstrates a strong capability to convert hateful memes into non-hateful forms.
arXiv Detail & Related papers (2025-04-30T19:48:12Z)
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions [4.649093665157263]
In this paper, we introduce IntMeme, a novel framework that leverages Large Multimodal Models (LMMs) for hateful meme classification with explainable decisions. IntMeme addresses the dual challenges of improving both accuracy and explainability in meme moderation. Our approach addresses the opacity and misclassification issues associated with PT-VLMs, optimizing the use of LMMs for hateful meme detection.
arXiv Detail & Related papers (2025-02-16T10:45:40Z)
Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search [64.15205542003056]
We introduce Attention-Guided Alignment (AGA) framework featuring two innovative components: Attention-Guided Mask (AGM) Modeling and Text Enrichment Module (TEM) AGA achieves new state-of-the-art results with Rank-1 accuracy reaching 78.36%, 67.31%, and 67.4% on CUHK-PEDES, ICFG-PEDES, and RSTP, respectively.
arXiv Detail & Related papers (2024-12-19T17:51:49Z)
TrojVLM: Backdoor Attack Against Vision Language Models [50.87239635292717]
This study introduces TrojVLM, the first exploration of backdoor attacks aimed at Vision Language Models (VLMs) TrojVLM inserts predetermined target text into output text when encountering poisoned images. A novel semantic preserving loss is proposed to ensure the semantic integrity of the original image content.
arXiv Detail & Related papers (2024-09-28T04:37:09Z)
CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration [90.36429361299807]
multimodal large language models (MLLMs) have demonstrated remarkable success in engaging in conversations involving visual inputs. The integration of visual modality has introduced a unique vulnerability: the MLLM becomes susceptible to malicious visual inputs. We introduce a technique termed CoCA, which amplifies the safety-awareness of the MLLM by calibrating its output distribution.
arXiv Detail & Related papers (2024-09-17T17:14:41Z)
HateSieve: A Contrastive Learning Framework for Detecting and Segmenting Hateful Content in Multimodal Memes [8.97062933976566]
textscHateSieve is a framework designed to enhance the detection and segmentation of hateful elements in memes. textscHateSieve features a novel Contrastive Meme Generator that creates semantically paired memes. Empirical experiments on the Hateful Meme show that textscHateSieve not only surpasses existing LMMs in performance with fewer trainable parameters but also offers a robust mechanism for precisely identifying and isolating hateful content.
arXiv Detail & Related papers (2024-08-11T14:56:06Z)
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge [76.45868419402265]
multimodal large language models (MLLMs) have made significant strides by training on vast high-quality image-text datasets. However, the inherent difficulty in explicitly conveying fine-grained or spatially dense information in text, such as masks, poses a challenge for MLLMs. This paper proposes a new visual prompt approach to integrate fine-grained external knowledge, gleaned from specialized vision models, into MLLMs.
arXiv Detail & Related papers (2024-07-05T17:43:30Z)
MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing [53.30190591805432]
We introduce MemeMQA, a multimodal question-answering framework to solicit accurate responses to structured questions. We also propose ARSENAL, a novel two-stage multimodal framework to address MemeMQA.
arXiv Detail & Related papers (2024-05-18T07:44:41Z)
IITK at SemEval-2024 Task 4: Hierarchical Embeddings for Detection of Persuasion Techniques in Memes [4.679320772294786]
This paper proposes an ensemble of Class Definition Prediction (CDP) and hyperbolic embeddings-based approaches for this task. We enhance meme classification accuracy and comprehensiveness by integrating HypEmo's hierarchical label embeddings and a multi-task learning framework for emotion prediction.
arXiv Detail & Related papers (2024-04-06T06:28:02Z)
AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting [54.931241667414184]
We propose textbfAdaptive textbfShield Prompting, which prepends inputs with defense prompts to defend MLLMs against structure-based jailbreak attacks. Our methods can consistently improve MLLMs' robustness against structure-based jailbreak attacks.
arXiv Detail & Related papers (2024-03-14T15:57:13Z)
Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models [17.617187709968242]
Existing harmful meme detection approaches only recognize superficial harm-indicative signals in an end-to-end classification manner. We propose a novel generative framework to learn reasonable thoughts from Large Language Models for better multimodal fusion. Our proposed approach achieves superior performance than state-of-the-art methods on the harmful meme detection task.
arXiv Detail & Related papers (2023-12-09T01:59:11Z)
MEMEX: Detecting Explanatory Evidence for Memes via Knowledge-Enriched Contextualization [31.209594252045566]
We propose a novel task, MEMEX, given a meme and a related document, the aim is to mine the context that succinctly explains the background of the meme. To benchmark MCC, we propose MIME, a multimodal neural framework that uses common sense enriched meme representation and a layered approach to capture the cross-modal semantic dependencies between the meme and the context.
arXiv Detail & Related papers (2023-05-25T10:19:35Z)
What do you MEME? Generating Explanations for Visual Semantic Role Labelling in Memes [42.357272117919464]
We introduce a novel task - EXCLAIM, generating explanations for visual semantic role labeling in memes. To this end, we curate ExHVV, a novel dataset that offers natural language explanations of connotative roles for three types of entities. We also posit LUMEN, a novel multimodal, multi-task learning framework that endeavors to address EXCLAIM optimally.
arXiv Detail & Related papers (2022-12-01T18:21:36Z)
Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes. One interesting finding is that many types of harmful memes are not really studied, e.g., such featuring self-harm and extremism. Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.