Exploring the Limits of Zero Shot Vision Language Models for Hate Meme Detection: The Vulnerabilities and their Interpretations
- URL: http://arxiv.org/abs/2402.12198v3
- Date: Sun, 23 Mar 2025 04:26:00 GMT
- Title: Exploring the Limits of Zero Shot Vision Language Models for Hate Meme Detection: The Vulnerabilities and their Interpretations
- Authors: Naquee Rizwan, Paramananda Bhaskar, Mithun Das, Swadhin Satyaprakash Majhi, Punyajoy Saha, Animesh Mukherjee
- Abstract summary: We study the effectiveness of modern-day vision language models (VLMs) in handling intricate tasks such as hate meme detection. We perform thorough prompt engineering and query state-of-the-art VLMs using various prompt types to detect hateful/harmful memes.
- Score: 9.970031080934003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a rapid increase in the use of multimedia content on current social media platforms. One of the most popular forms of such content is the meme. While memes were primarily invented to promote funny and buoyant discussions, malevolent users exploit them to target individuals or vulnerable communities, making it imperative to identify and address such instances of hateful memes. Social media platforms are therefore in dire need of active moderation of such harmful content. While manual moderation is extremely difficult due to the scale of such content, automatic moderation is challenged by the need for good-quality annotated data to train hate meme detection algorithms. This makes a perfect pretext for exploring the power of modern-day vision language models (VLMs), which have exhibited outstanding performance across various tasks. In this paper we study the effectiveness of VLMs in handling intricate tasks such as hate meme detection in a completely zero-shot setting, so that there is no dependency on annotated data for the task. We perform thorough prompt engineering and query state-of-the-art VLMs using various prompt types to detect hateful/harmful memes. We further interpret the misclassification cases using a novel superpixel-based occlusion method. Finally, we show that these misclassifications can be neatly arranged into a typology of error classes, knowledge of which should enable the design of better safety guardrails in the future.
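As a rough illustration of the two ingredients named in the abstract (zero-shot prompting of a VLM and a superpixel-based occlusion pass for interpreting its errors), the sketch below shows how such an occlusion analysis could be wired up. It is not the authors' code: `hate_score_fn`, `EXAMPLE_PROMPT`, the segment count, and the mean-fill occlusion are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch, not the authors' implementation: rank superpixels of a meme
# by how much hiding each one lowers a zero-shot VLM's "hateful" score.
from typing import Callable, List, Tuple
import numpy as np
from skimage.segmentation import slic

# Example zero-shot prompt (assumed; the paper's prompt templates are not shown here).
EXAMPLE_PROMPT = (
    "Answer only 'hateful' or 'not hateful'. "
    "Does this meme attack or demean a person or a community?"
)

def superpixel_occlusion(
    image: np.ndarray,                              # (H, W, 3) meme image
    hate_score_fn: Callable[[np.ndarray], float],   # returns P('hateful') from a VLM query
    n_segments: int = 50,
) -> List[Tuple[int, float]]:
    """Rank superpixels by how much occluding them lowers the hate score."""
    segments = slic(image, n_segments=n_segments, start_label=0)
    base_score = hate_score_fn(image)
    mean_color = image.reshape(-1, image.shape[-1]).mean(axis=0).astype(image.dtype)
    ranking = []
    for seg_id in np.unique(segments):
        occluded = image.copy()
        occluded[segments == seg_id] = mean_color     # hide one superpixel
        drop = base_score - hate_score_fn(occluded)   # score change when hidden
        ranking.append((int(seg_id), float(drop)))
    # Largest drops ~ regions the model relied on for its decision.
    return sorted(ranking, key=lambda x: x[1], reverse=True)
```

In practice `hate_score_fn` would wrap a zero-shot query to the VLM under test, built from a prompt such as `EXAMPLE_PROMPT`; misclassified memes can then be inspected by overlaying their top-ranked superpixels.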
Related papers
- Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models [12.929357709840975]
Multimodal memes are sometimes misused to disseminate hate speech against individuals or groups.
We propose a definition-guided prompting technique for detecting hateful memes, and a unified framework for mitigating hateful content in memes, named UnHateMeme.
Our framework, integrated with Vision-Language Models, demonstrates a strong capability to convert hateful memes into non-hateful forms.
arXiv Detail & Related papers (2025-04-30T19:48:12Z)
- Improving Multimodal Hateful Meme Detection Exploiting LMM-Generated Knowledge [11.801596051153725]
Detecting hateful content in memes has emerged as a task of critical importance.
We propose to address the task by leveraging knowledge encoded in powerful Large Multimodal Models (LMMs).
Specifically, we exploit LMMs in a two-fold manner: first, by extracting knowledge oriented to the hateful meme detection task in order to build strong meme representations.
arXiv Detail & Related papers (2025-04-14T06:23:44Z)
- Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations [48.82168723932981]
We introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation of code-mixed cyberbullying memes.
A Contrastive Language-Image Pretraining (CLIP) approach has been proposed for visual and textual explanation of a meme.
arXiv Detail & Related papers (2024-01-18T11:24:30Z)
- GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse [14.571295331012331]
We introduce GOAT-Bench, a comprehensive meme benchmark comprising over 6K varied memes encapsulating themes such as implicit hate speech, cyberbullying, and sexism.
We delve into the ability of LMMs to accurately assess hatefulness, misogyny, offensiveness, sarcasm, and harmful content.
Our extensive experiments across a range of LMMs reveal that current models still exhibit a deficiency in safety awareness, showing insensitivity to various forms of implicit abuse.
arXiv Detail & Related papers (2024-01-03T03:28:55Z)
- Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning [13.690436954062015]
We propose constructing a hatefulness-aware embedding space through retrieval-guided contrastive training; a minimal sketch of this idea appears after this list.
Our approach achieves state-of-the-art performance on the HatefulMemes dataset with an AUROC of 87.0, outperforming much larger fine-tuned large multimodal models.
arXiv Detail & Related papers (2023-11-14T12:14:54Z)
- A Template Is All You Meme [83.05919383106715]
We release a knowledge base of memes and information found on www.knowyourmeme.com, composed of more than 54,000 images.
We hypothesize that meme templates can be used to inject models with the context missing from previous approaches.
arXiv Detail & Related papers (2023-11-11T19:38:14Z)
- DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation [72.18912216025029]
We present DisinfoMeme to help detect disinformation memes.
The dataset contains memes mined from Reddit covering three current topics: the COVID-19 pandemic, the Black Lives Matter movement, and veganism/vegetarianism.
arXiv Detail & Related papers (2022-05-25T09:54:59Z)
- DISARM: Detecting the Victims Targeted by Harmful Memes [49.12165815990115]
DISARM is a framework that uses named entity recognition and person identification to detect harmful memes.
We show that DISARM significantly outperforms ten unimodal and multimodal systems.
It can reduce the relative error rate for harmful target identification by up to 9 points absolute over several strong multimodal rivals.
arXiv Detail & Related papers (2022-05-11T19:14:26Z)
- Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes.
One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism.
Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z)
- Feels Bad Man: Dissecting Automated Hateful Meme Detection Through the Lens of Facebook's Challenge [10.775419935941008]
We assess the efficacy of current state-of-the-art multimodal machine learning models for hateful meme detection.
We use two benchmark datasets comprising 12,140 and 10,567 images from 4chan's "Politically Incorrect" board (/pol/) and Facebook's Hateful Memes Challenge dataset, respectively.
We conduct three experiments to determine the importance of multimodality for classification performance and the influential capacity of fringe Web communities on mainstream social platforms, and vice versa.
arXiv Detail & Related papers (2022-02-17T07:52:22Z)
- Detecting Harmful Memes and Their Targets [27.25262711136056]
We present HarMeme, the first benchmark dataset, containing 3,544 memes related to COVID-19.
In the first stage, we labeled a meme as very harmful, partially harmful, or harmless; in the second stage, we further annotated the type of target(s) that each harmful meme points to.
The evaluation results using ten unimodal and multimodal models highlight the importance of using multimodal signals for both tasks.
arXiv Detail & Related papers (2021-09-24T17:11:42Z)
- Caption Enriched Samples for Improving Hateful Memes Detection [78.5136090997431]
The hateful meme challenge demonstrates the difficulty of determining whether a meme is hateful or not.
Neither unimodal language models nor multimodal vision-language models reach human-level performance.
arXiv Detail & Related papers (2021-09-22T10:57:51Z)
- MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets [28.877314859737197]
We aim to solve two novel tasks: detecting harmful memes and identifying the social entities they target.
We propose MOMENTA, a novel multimodal (text + image) deep neural model, which uses global and local perspectives to detect harmful memes.
arXiv Detail & Related papers (2021-09-11T04:29:32Z)
- Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset [47.65948529524281]
We collect hateful and non-hateful memes from Pinterest to evaluate out-of-sample performance on models pre-trained on the Facebook dataset.
We find that memes in the wild differ in two key aspects: 1) captions must be extracted via OCR, and 2) memes in the wild are more diverse than traditional memes, including screenshots of conversations or text on a plain background.
arXiv Detail & Related papers (2021-07-09T09:04:05Z)
- Multimodal Learning for Hateful Memes Detection [6.6881085567421605]
We propose a novel method that incorporates the image captioning process into meme detection; a minimal sketch of this general idea also appears after this list.
Our model achieves promising results on the Hateful Memes Detection Challenge.
arXiv Detail & Related papers (2020-11-25T16:49:15Z)
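For the "Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning" entry above, this digest does not spell out the method; the following is a minimal sketch, under assumed details, of the general idea of retrieval-guided contrastive training: neighbours retrieved from an embedding bank that share the anchor's hatefulness label act as positives in a supervised-contrastive-style loss. The function name, k, and the temperature are illustrative, not taken from the cited paper.

```python
# Minimal sketch (not the cited paper's implementation) of retrieval-guided
# contrastive training for a hatefulness-aware embedding space.
import torch
import torch.nn.functional as F

def retrieval_guided_contrastive_loss(
    anchors: torch.Tensor,        # (B, D) embeddings of the current batch
    anchor_labels: torch.Tensor,  # (B,) 1 = hateful, 0 = not hateful
    bank: torch.Tensor,           # (N, D) embedding memory bank
    bank_labels: torch.Tensor,    # (N,)
    k: int = 16,
    temperature: float = 0.07,
) -> torch.Tensor:
    a = F.normalize(anchors, dim=-1)
    b = F.normalize(bank, dim=-1)
    sims = a @ b.T / temperature                  # (B, N) scaled cosine similarities
    topk_sim, topk_idx = sims.topk(k, dim=-1)     # retrieve k nearest memes per anchor
    topk_labels = bank_labels[topk_idx]           # (B, k) labels of retrieved neighbours
    pos_mask = (topk_labels == anchor_labels.unsqueeze(1)).float()
    log_prob = topk_sim - topk_sim.logsumexp(dim=-1, keepdim=True)
    # Average log-probability over retrieved same-label (positive) neighbours.
    pos_count = pos_mask.sum(dim=-1).clamp(min=1)
    loss = -(pos_mask * log_prob).sum(dim=-1) / pos_count
    return loss.mean()
```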
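Similarly, for the "Multimodal Learning for Hateful Memes Detection" entry, which incorporates image captioning into meme detection, here is a minimal sketch of the general caption-enrichment idea using off-the-shelf Hugging Face pipelines; the model choices and the zero-shot classifier are illustrative assumptions, not the cited paper's architecture.

```python
# Minimal sketch (not the cited paper's model): enrich a meme with an
# auto-generated image caption and classify caption + overlaid text together.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def classify_meme(image_path: str, overlaid_text: str) -> dict:
    # 1) Describe the visual content of the meme.
    caption = captioner(image_path)[0]["generated_text"]
    # 2) Combine the caption with the text written on the meme (e.g. from OCR).
    combined = f"Image: {caption}. Text on image: {overlaid_text}"
    # 3) Zero-shot classify the combined description.
    return classifier(combined, candidate_labels=["hateful", "not hateful"])

# Usage: classify_meme("meme.png", "text extracted from the meme via OCR")
```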