GOAT-Bench: Safety Insights to Large Multimodal Models through
Meme-Based Social Abuse
- URL: http://arxiv.org/abs/2401.01523v3
- Date: Fri, 1 Mar 2024 05:26:39 GMT
- Title: GOAT-Bench: Safety Insights to Large Multimodal Models through
Meme-Based Social Abuse
- Authors: Hongzhan Lin, Ziyang Luo, Bo Wang, Ruichao Yang and Jing Ma
- Abstract summary: We introduce GOAT-Bench, a comprehensive meme benchmark comprising over 6K varied memes that encapsulate themes such as implicit hate speech, cyberbullying, and sexism.
We delve into the ability of LMMs to accurately assess hatefulness, misogyny, offensiveness, sarcasm, and harmful content.
Our extensive experiments across a range of LMMs reveal that current models still exhibit a deficiency in safety awareness, showing insensitivity to various forms of implicit abuse.
- Score: 15.632755242069729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The exponential growth of social media has profoundly transformed how
information is created, disseminated, and absorbed, exceeding any precedent in
the digital age. Regrettably, this explosion has also spawned a significant
increase in the online abuse of memes. Evaluating the negative impact of memes
is notably challenging, owing to their often subtle and implicit meanings,
which are not directly conveyed through the overt text and imagery. In light of
this, large multimodal models (LMMs) have emerged as a focal point of interest
due to their remarkable capabilities in handling diverse multimodal tasks. In
response to this development, our paper aims to thoroughly examine the capacity
of various LMMs (e.g., GPT-4V) to discern and respond to the nuanced aspects of
social abuse manifested in memes. We introduce GOAT-Bench, a comprehensive
meme benchmark comprising over 6K varied memes encapsulating themes such as
implicit hate speech, sexism, and cyberbullying. Utilizing
GOAT-Bench, we delve into the ability of LMMs to accurately assess hatefulness,
misogyny, offensiveness, sarcasm, and harmful content. Our extensive
experiments across a range of LMMs reveal that current models still exhibit a
deficiency in safety awareness, showing insensitivity to various forms of
implicit abuse. We posit that this shortfall represents a critical impediment
to the realization of safe artificial intelligence. The GOAT-Bench and
accompanying resources are publicly accessible at https://goatlmm.github.io/,
contributing to ongoing research in this vital field.
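As a concrete illustration of what probing an LMM for such a judgment can look like, here is a minimal Python sketch that asks a vision-capable chat model for a yes/no verdict on a single meme. The model name, prompt wording, and file names are assumptions for illustration only; GOAT-Bench's actual prompts and evaluation protocol are defined in the paper and accompanying repository.

```python
# Minimal sketch only: probe a vision-capable chat model with one meme for a
# binary judgment, in the spirit of the GOAT-Bench tasks. Prompt wording,
# model name, and file names are assumptions, not the paper's protocol.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def probe_meme(image_path: str, task: str = "hatefulness") -> str:
    """Ask the model for a yes/no judgment on one meme image."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Does this meme exhibit {task}? Answer 'yes' or 'no'."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        max_tokens=3,
    )
    return response.choices[0].message.content.strip().lower()

# Toy accuracy over a hypothetical labeled sample; a real run would iterate
# all 6K+ memes and parse answers more robustly than prefix matching.
sample = [("meme_001.png", "yes"), ("meme_002.png", "no")]
correct = sum(probe_meme(path).startswith(label) for path, label in sample)
print(f"accuracy: {correct / len(sample):.2f}")
```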
Related papers
- Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection [49.122777764853055]
We explore the potential of Large Multimodal Models (LMMs) for hateful meme detection.
We propose Evolver, which incorporates LMMs via Chain-of-Evolution (CoE) Prompting.
Evolver simulates the evolving and expressing process of memes and reasons through LMMs in a step-by-step manner.
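As a rough sketch of what step-wise prompting of this kind can look like, the snippet below chains each answer into the next prompt. The stages and wording are invented placeholders rather than the paper's actual CoE stages, and the meme is passed as a text description instead of an image.

```python
# Hypothetical step-wise prompting chain, loosely in the spirit of
# Chain-of-Evolution prompting. The stages below are invented placeholders;
# the real CoE stages are defined in the Evolver paper.
from openai import OpenAI

client = OpenAI()

STEPS = [
    "Describe the image content and overlaid text of this meme.",
    "What earlier meme formats or events might it have evolved from?",
    "Given that evolution, is the meme hateful? Answer 'hateful' or 'benign'.",
]

def chain_prompt(meme_description: str) -> str:
    """Feed each step's answer into the context of the next prompt."""
    context = f"Meme: {meme_description}"
    answer = ""
    for step in STEPS:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"{context}\n\n{step}"}],
        )
        answer = resp.choices[0].message.content
        context += f"\n\n{step}\n{answer}"  # accumulate the chain
    return answer
```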
arXiv Detail & Related papers (2024-07-30T17:51:44Z)
- MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries? [70.77691645678804]
Humans are prone to cognitive distortions -- biased thinking patterns that lead to exaggerated responses to specific stimuli.
This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies.
We identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation.
arXiv Detail & Related papers (2024-06-22T23:26:07Z)
- Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models [18.181154544563416]
The age of social media is flooded with Internet memes, necessitating a clear grasp and effective identification of harmful ones.
Existing harmful meme detection methods do not present readable explanations that unveil the implicit meaning of a meme to support their detection decisions.
We propose an explainable approach to detect harmful memes, achieved through reasoning over conflicting rationales from both harmless and harmful positions.
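A generic skeleton of such a debate, with one call arguing each position and a third call adjudicating, might look like the following sketch; the paper's actual procedure and prompts differ.

```python
# Generic two-position debate skeleton: argue harmless, argue harmful, then
# adjudicate. Illustrates the debate idea only, not the paper's exact method.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def debate(meme_text: str) -> str:
    harmless = ask(f"Argue briefly that this meme is harmless: {meme_text}")
    harmful = ask(f"Argue briefly that this meme is harmful: {meme_text}")
    return ask(
        "Two conflicting rationales about a meme follow. Decide 'harmful' or "
        f"'harmless' and justify in one sentence.\n"
        f"Pro-harmless: {harmless}\nPro-harmful: {harmful}"
    )
```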
arXiv Detail & Related papers (2024-01-24T08:37:16Z)
- Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations [48.82168723932981]
We introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation of code-mixed cyberbullying memes.
A Contrastive Language-Image Pretraining (CLIP)-based approach is proposed to produce visual and textual explanations of a meme.
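For orientation, the minimal sketch below scores candidate textual rationales against a meme image with an off-the-shelf CLIP model from Hugging Face Transformers; it illustrates the image-text pairing CLIP provides, not the MultiBully-Ex architecture itself.

```python
# Minimal CLIP sketch: rank candidate rationales by similarity to a meme
# image. Model checkpoint, file name, and rationales are illustrative only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("meme.png").convert("RGB")  # hypothetical meme image
rationales = [
    "the caption mocks the person shown in the photo",
    "the caption is a harmless joke about the weather",
]
inputs = processor(text=rationales, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-to-text similarity
for text, p in zip(rationales, logits.softmax(dim=-1).squeeze(0).tolist()):
    print(f"{p:.2f}  {text}")
```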
arXiv Detail & Related papers (2024-01-18T11:24:30Z)
- Characterizing the Entities in Harmful Memes: Who is the Hero, the Villain, the Victim? [39.55435707149863]
We aim to understand whether the meme glorifies, vilifies, or victimizes each entity it refers to.
Our proposed model achieves an improvement of 4% over the best baseline and 1% over the best competing stand-alone submission.
arXiv Detail & Related papers (2023-01-26T16:55:15Z)
- DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation [72.18912216025029]
We present DisinfoMeme to help detect disinformation memes.
The dataset contains memes mined from Reddit covering three current topics: the COVID-19 pandemic, the Black Lives Matter movement, and veganism/vegetarianism.
arXiv Detail & Related papers (2022-05-25T09:54:59Z)
- DISARM: Detecting the Victims Targeted by Harmful Memes [49.12165815990115]
DISARM is a framework that uses named entity recognition and person identification to detect harmful memes.
We show that DISARM significantly outperforms ten unimodal and multimodal systems.
It can reduce the relative error rate for harmful target identification by up to 9 points absolute over several strong multimodal rivals.
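The named-entity step such a pipeline rests on can be sketched with an off-the-shelf NER model; the snippet below pulls candidate target entities from a meme's OCR text with spaCy. DISARM itself is multimodal and goes well beyond this single step.

```python
# Sketch of the entity-extraction step only: collect named entities from a
# meme's OCR text as candidate victims. Example text is hypothetical.
import spacy

nlp = spacy.load("en_core_web_sm")  # first: python -m spacy download en_core_web_sm

def candidate_targets(meme_text: str) -> list[str]:
    """Return named entities that could plausibly be the meme's target."""
    doc = nlp(meme_text)
    return [ent.text for ent in doc.ents
            if ent.label_ in {"PERSON", "ORG", "NORP"}]

print(candidate_targets("When Elon Musk promises full self-driving... again"))
```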
arXiv Detail & Related papers (2022-05-11T19:14:26Z)
- Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes.
One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism.
Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z)
- Detecting Harmful Memes and Their Targets [27.25262711136056]
We present HarMeme, the first benchmark dataset, containing 3,544 memes related to COVID-19.
In the first stage, we labeled a meme as very harmful, partially harmful, or harmless; in the second stage, we further annotated the type of target(s) that each harmful meme points to.
The evaluation results using ten unimodal and multimodal models highlight the importance of using multimodal signals for both tasks.
arXiv Detail & Related papers (2021-09-24T17:11:42Z)
- MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets [28.877314859737197]
We aim to solve two novel tasks: detecting harmful memes and identifying the social entities they target.
We propose MOMENTA, a novel multimodal (text + image) deep neural model, which uses global and local perspectives to detect harmful memes.
arXiv Detail & Related papers (2021-09-11T04:29:32Z)