An Interpretable Approach to Hateful Meme Detection
- URL: http://arxiv.org/abs/2108.10069v1
- Date: Mon, 9 Aug 2021 18:28:56 GMT
- Title: An Interpretable Approach to Hateful Meme Detection
- Authors: Tanvi Deshpande and Nitya Mani
- Abstract summary: Hateful memes are an emerging method of spreading hate on the internet.
We take an interpretable approach to hateful meme detection using machine learning.
We build a gradient-boosted decision tree and an LSTM-based model that achieve comparable performance.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hateful memes are an emerging method of spreading hate on the internet,
relying on both images and text to convey a hateful message. We take an
interpretable approach to hateful meme detection, using machine learning and
simple heuristics to identify the features most important to classifying a meme
as hateful. In the process, we build a gradient-boosted decision tree and an
LSTM-based model that achieve comparable performance (73.8 validation and 72.7
test auROC) to the gold standard of humans and state-of-the-art transformer
models on this challenging task.
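As a rough illustration of the pipeline described in the abstract, here is a minimal sketch of training a gradient-boosted decision tree over hand-crafted meme features with scikit-learn. The feature set and data below are hypothetical placeholders, not the paper's actual features.

```python
# Minimal sketch: gradient-boosted trees over interpretable meme features.
# The three features are hypothetical stand-ins (e.g. slur-lexicon hits,
# text sentiment, image-text similarity), not the paper's feature set.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_train, y_train = rng.random((500, 3)), rng.integers(0, 2, 500)
X_val, y_val = rng.random((100, 3)), rng.integers(0, 2, 100)

model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
model.fit(X_train, y_train)

# auROC, the metric reported in the abstract.
print("val auROC:", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))

# Interpretability: which features the fitted trees rely on most.
print("feature importances:", model.feature_importances_)
```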
Related papers
- XMeCap: Meme Caption Generation with Sub-Image Adaptability [53.2509590113364]
Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines.
We introduce the XMeCap framework, which adopts supervised fine-tuning and reinforcement learning.
XMeCap achieves an average evaluation score of 75.85 for single-image memes and 66.32 for multi-image memes, outperforming the best baseline by 3.71% and 4.82%, respectively.
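The summary above does not spell out the training loop; the following is a generic sketch of the reinforcement-learning half only, written as a REINFORCE-style update for a causal caption generator. `model`, `optimizer`, and `humor_reward` are hypothetical stand-ins, not XMeCap's actual components.

```python
# Generic REINFORCE-style update for a caption generator (a sketch, not
# XMeCap's actual recipe). `model` is assumed to be a Hugging Face causal LM;
# `humor_reward` is a hypothetical scorer returning one scalar per sample.
import torch
import torch.nn.functional as F

def rl_step(model, optimizer, input_ids, humor_reward):
    # Sample candidate captions without tracking gradients.
    with torch.no_grad():
        seqs = model.generate(input_ids, do_sample=True, max_new_tokens=32)
    # Re-score the sampled tokens with gradients enabled.
    logits = model(seqs).logits[:, :-1, :]
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, seqs[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Policy gradient: raise the log-probability of high-reward captions.
    # (Simplified: prompt tokens are included in the sum.)
    loss = -(humor_reward(seqs) * token_logp.sum(dim=-1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```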
arXiv Detail & Related papers (2024-07-24T10:51:46Z)
- Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations [48.82168723932981]
We introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation of code-mixed cyberbullying memes.
A Contrastive Language-Image Pretraining (CLIP) approach has been proposed for visual and textual explanation of a meme.
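As a minimal illustration of a CLIP-based scoring step, one can compare a meme image against candidate textual explanations with the off-the-shelf OpenAI checkpoint; this is a sketch of the general mechanism, not the paper's trained model or prompts.

```python
# Score a meme image against candidate texts with off-the-shelf CLIP.
# The image path and candidate texts are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("meme.png")
texts = ["this meme bullies a person", "this meme is a harmless joke"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```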
arXiv Detail & Related papers (2024-01-18T11:24:30Z)
- Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models [17.617187709968242]
Existing harmful meme detection approaches only recognize superficial harm-indicative signals in an end-to-end classification manner.
We propose a novel generative framework to learn reasonable thoughts from Large Language Models for better multimodal fusion.
Our proposed approach outperforms state-of-the-art methods on the harmful meme detection task.
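A minimal sketch of the general rationale-then-fuse idea follows; the prompt wording, `llm`, and `fusion_classifier` are hypothetical placeholders, not the paper's actual pipeline.

```python
# Generic rationale-augmented classification (a sketch of the idea only).
def classify_with_rationale(llm, fusion_classifier, meme_text, image_caption):
    prompt = (f"Meme text: {meme_text}\nImage: {image_caption}\n"
              "Explain step by step whether this meme could be harmful.")
    rationale = llm(prompt)  # "thought" distilled from a large language model
    # Fuse the generated rationale with the original modalities.
    return fusion_classifier(meme_text, image_caption, rationale)
```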
arXiv Detail & Related papers (2023-12-09T01:59:11Z)
- MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning [59.988458964353754]
Text-to-image diffusion models allow seamless generation of personalized images from scant reference photos.
Existing approaches perturb user images in an imperceptible way to render them "unlearnable" to malicious uses.
We propose MetaCloak, which solves the bi-level poisoning problem with a meta-learning framework.
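For intuition, here is a generic protective-perturbation sketch in PyTorch: plain projected gradient ascent on a surrogate loss, not MetaCloak's meta-learned bi-level procedure. The image is assumed to be a float tensor in [0, 1].

```python
# Generic sketch: perturb an image within an imperceptible L-inf budget so
# that a surrogate model's loss is maximized. This is plain PGD, standing in
# for (not reproducing) MetaCloak's meta-learning formulation.
import torch

def perturb(image, surrogate_loss, eps=8 / 255, step=1 / 255, iters=20):
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(iters):
        loss = surrogate_loss(image + delta)
        loss.backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()  # ascend the surrogate loss
            delta.clamp_(-eps, eps)            # stay within the budget
            delta.grad = None
    return (image + delta).detach().clamp(0, 1)
```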
arXiv Detail & Related papers (2023-11-22T03:31:31Z)
- Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning [13.690436954062015]
We propose constructing a hatefulness-aware embedding space through retrieval-guided contrastive training.
Our approach achieves state-of-the-art performance on the HatefulMemes dataset with an AUROC of 87.0, outperforming much larger fine-tuned large multimodal models.
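A minimal sketch of the contrastive objective itself, written as a generic multi-positive InfoNCE loss in PyTorch; the retrieval-guided selection of positives and negatives described in the paper is omitted.

```python
# Generic multi-positive InfoNCE over meme embeddings (a sketch; the
# retrieval-guided sampling from the paper is not shown).
import torch
import torch.nn.functional as F

def info_nce(anchor, positives, negatives, tau=0.07):
    # anchor: (d,); positives: (p, d); negatives: (n, d)
    a = F.normalize(anchor, dim=-1)
    pos = F.normalize(positives, dim=-1) @ a / tau   # (p,) similarities
    neg = F.normalize(negatives, dim=-1) @ a / tau   # (n,) similarities
    # Pull same-label memes together, push different-label memes apart.
    return torch.logsumexp(torch.cat([pos, neg]), 0) - torch.logsumexp(pos, 0)
```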
arXiv Detail & Related papers (2023-11-14T12:14:54Z)
- Decoding the Underlying Meaning of Multimodal Hateful Memes [4.509263496823139]
This paper introduces the Hateful meme with Reasons dataset (HatReD).
HatReD is a new multimodal hateful meme dataset annotated with the underlying hateful contextual reasons.
We also define a new conditional generation task that aims to automatically generate underlying reasons to explain hateful memes.
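As a rough sketch of such a conditional generation task, one could condition an off-the-shelf seq2seq model on the meme's text plus an image description; the checkpoint, inputs, and prompt template below are illustrative, not those used for HatReD.

```python
# Sketch: generate an underlying reason conditioned on meme text and an image
# description. Checkpoint and prompt format are placeholders.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

meme_text = "example overlaid caption"   # hypothetical input
image_caption = "a crowd at a rally"     # hypothetical image description
prompt = f"explain why this meme is hateful: text: {meme_text} image: {image_caption}"

ids = tokenizer(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=48)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```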
arXiv Detail & Related papers (2023-05-28T10:02:59Z)
- StraIT: Non-autoregressive Generation with Stratified Image Transformer [63.158996766036736]
Stratified Image Transformer (StraIT) is a pure non-autoregressive (NAR) generative model.
Our experiments demonstrate that StraIT significantly improves NAR generation and outperforms existing diffusion models (DMs) and autoregressive (AR) methods.
arXiv Detail & Related papers (2023-03-01T18:59:33Z)
- Hate-CLIPper: Multimodal Hateful Meme Classification based on Cross-modal Interaction of CLIP Features [5.443781798915199]
Hateful memes are a growing menace on social media.
Detecting hateful memes requires careful consideration of both visual and textual information.
We propose the Hate-CLIPper architecture, which explicitly models the cross-modal interactions between the image and text representations.
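A minimal sketch of explicit cross-modal interaction, assuming an outer product of projected CLIP embeddings fed to a linear head; the projection sizes and head are illustrative choices, not necessarily the paper's exact architecture.

```python
# Sketch: model pairwise interactions between CLIP image and text features
# via an outer product, then classify the flattened interaction matrix.
import torch
import torch.nn as nn

class CrossModalInteraction(nn.Module):
    def __init__(self, clip_dim=512, proj_dim=64):
        super().__init__()
        self.img_proj = nn.Linear(clip_dim, proj_dim)
        self.txt_proj = nn.Linear(clip_dim, proj_dim)
        self.head = nn.Linear(proj_dim * proj_dim, 1)

    def forward(self, img_emb, txt_emb):
        i = self.img_proj(img_emb)              # (B, p)
        t = self.txt_proj(txt_emb)              # (B, p)
        fim = torch.einsum("bi,bj->bij", i, t)  # pairwise feature products
        return self.head(fim.flatten(1))        # hatefulness logit
```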
arXiv Detail & Related papers (2022-10-12T04:34:54Z)
- Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders [64.03000385267339]
Masked image modeling (MIM) has become a popular strategy for self-supervised learning (SSL) of visual representations with Vision Transformers.
We present a simple SSL method, the Reconstruction-Consistent Masked Auto-Encoder (RC-MAE), by adding an EMA teacher to MAE.
RC-MAE converges faster and requires less memory usage than state-of-the-art self-distillation methods during pre-training.
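For reference, the EMA teacher at the heart of such self-distillation setups can be sketched in a few lines: the teacher's weights track a momentum average of the student's. This is the standard update, shown generically rather than as RC-MAE's exact code.

```python
# Standard exponential-moving-average teacher update used in
# self-distillation methods (a generic sketch).
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)
```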
arXiv Detail & Related papers (2022-10-05T08:08:55Z)
- Caption Enriched Samples for Improving Hateful Memes Detection [78.5136090997431]
The hateful meme challenge demonstrates the difficulty of determining whether a meme is hateful or not.
Neither unimodal language models nor multimodal vision-language models reach human-level performance.
arXiv Detail & Related papers (2021-09-22T10:57:51Z)
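A minimal sketch of the caption-enrichment idea from the entry above, with `captioner` and `text_classifier` as hypothetical components: the meme's overlaid text is paired with an automatically generated image caption so a text-only model sees both modalities.

```python
# Sketch: enrich the meme's text with a generated image caption so a
# text-only classifier can use both modalities. Components are hypothetical.
def caption_enriched_predict(captioner, text_classifier, image, meme_text):
    image_caption = captioner(image)  # e.g. an off-the-shelf image captioner
    enriched = f"{meme_text} [SEP] {image_caption}"
    return text_classifier(enriched)  # hateful / not-hateful score
```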
This list is automatically generated from the titles and abstracts of the papers on this site.