Towards Explainable Harmful Meme Detection through Multimodal Debate
between Large Language Models
- URL: http://arxiv.org/abs/2401.13298v1
- Date: Wed, 24 Jan 2024 08:37:16 GMT
- Title: Towards Explainable Harmful Meme Detection through Multimodal Debate
between Large Language Models
- Authors: Hongzhan Lin, Ziyang Luo, Wei Gao, Jing Ma, Bo Wang, Ruichao Yang
- Abstract summary: The age of social media is flooded with Internet memes, necessitating a clear grasp and effective identification of harmful ones.
Existing harmful meme detection methods do not present readable explanations that unveil such implicit meaning to support their detection decisions.
We propose an explainable approach to detect harmful memes, achieved through reasoning over conflicting rationales from both harmless and harmful positions.
- Score: 18.181154544563416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The age of social media is flooded with Internet memes, necessitating a clear
grasp and effective identification of harmful ones. This task presents a
significant challenge due to the implicit meaning embedded in memes, which is
not explicitly conveyed through the surface text and image. However, existing
harmful meme detection methods do not present readable explanations that unveil
such implicit meaning to support their detection decisions. In this paper, we
propose an explainable approach to detect harmful memes, achieved through
reasoning over conflicting rationales from both harmless and harmful positions.
Specifically, inspired by the powerful capacity of Large Language Models (LLMs)
on text generation and reasoning, we first elicit multimodal debate between
LLMs to generate the explanations derived from the contradictory arguments.
Then we propose to fine-tune a small language model as the debate judge for
harmfulness inference, to facilitate multimodal fusion between the harmfulness
rationales and the intrinsic multimodal information within memes. In this way,
our model is empowered to perform dialectical reasoning over intricate and
implicit harm-indicative patterns, utilizing multimodal explanations
originating from both harmless and harmful arguments. Extensive experiments on
three public meme datasets demonstrate that our harmful meme detection approach
achieves much better performance than state-of-the-art methods and exhibits a
superior capacity for explaining the meme harmfulness of the model predictions.
Related papers
- Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights [50.89022445197919]
We propose a speech-specific risk taxonomy, covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity)
Based on the taxonomy, we create a small-scale dataset for evaluating current LMMs capability in detecting these categories of risk.
arXiv Detail & Related papers (2024-06-25T10:08:45Z) - MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing [53.30190591805432]
We introduce MemeMQA, a multimodal question-answering framework to solicit accurate responses to structured questions.
We also propose ARSENAL, a novel two-stage multimodal framework to address MemeMQA.
arXiv Detail & Related papers (2024-05-18T07:44:41Z) - Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models [58.065255696601604]
We use compositional property of diffusion models, which allows to leverage multiple prompts in a single image generation.
We argue that it is essential to consider all possible approaches to image generation with diffusion models that can be employed by an adversary.
arXiv Detail & Related papers (2024-04-21T16:35:16Z) - Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes
Through Multimodal Explanations [48.82168723932981]
We introduce em MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes.
A Contrastive Language-Image Pretraining (CLIP) approach has been proposed for visual and textual explanation of a meme.
arXiv Detail & Related papers (2024-01-18T11:24:30Z) - Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning
Distilled from Large Language Models [17.617187709968242]
Existing harmful meme detection approaches only recognize superficial harm-indicative signals in an end-to-end classification manner.
We propose a novel generative framework to learn reasonable thoughts from Large Language Models for better multimodal fusion.
Our proposed approach achieves superior performance than state-of-the-art methods on the harmful meme detection task.
arXiv Detail & Related papers (2023-12-09T01:59:11Z) - Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes.
One interesting finding is that many types of harmful memes are not really studied, e.g., such featuring self-harm and extremism.
Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z) - Detecting Harmful Memes and Their Targets [27.25262711136056]
We present HarMeme, the first benchmark dataset, containing 3,544 memes related to COVID-19.
In the first stage, we labeled a meme as very harmful, partially harmful, or harmless; in the second stage, we further annotated the type of target(s) that each harmful meme points to.
The evaluation results using ten unimodal and multimodal models highlight the importance of using multimodal signals for both tasks.
arXiv Detail & Related papers (2021-09-24T17:11:42Z) - Caption Enriched Samples for Improving Hateful Memes Detection [78.5136090997431]
The hateful meme challenge demonstrates the difficulty of determining whether a meme is hateful or not.
Both unimodal language models and multimodal vision-language models cannot reach the human level of performance.
arXiv Detail & Related papers (2021-09-22T10:57:51Z) - MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their
Targets [28.877314859737197]
We aim to solve two novel tasks: detecting harmful memes and identifying the social entities they target.
In particular, we aim to solve two novel tasks: detecting harmful memes and identifying the social entities they target.
We propose MOMENTA, a novel multimodal (text + image) deep neural model, which uses global and local perspectives to detect harmful memes.
arXiv Detail & Related papers (2021-09-11T04:29:32Z) - Detecting Hate Speech in Multi-modal Memes [14.036769355498546]
We focus on hate speech detection in multi-modal memes wherein memes pose an interesting multi-modal fusion problem.
We aim to solve the Facebook Meme Challenge citekiela 2020hateful which aims to solve a binary classification problem of predicting whether a meme is hateful or not.
arXiv Detail & Related papers (2020-12-29T18:30:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.