Detecting Hate Speech in Multi-modal Memes
- URL: http://arxiv.org/abs/2012.14891v1
- Date: Tue, 29 Dec 2020 18:30:00 GMT
- Title: Detecting Hate Speech in Multi-modal Memes
- Authors: Abhishek Das, Japsimar Singh Wahi, Siyao Li
- Abstract summary: We focus on hate speech detection in multi-modal memes wherein memes pose an interesting multi-modal fusion problem.
We aim to solve the Facebook Meme Challenge (Kiela et al., 2020), a binary classification problem of predicting whether a meme is hateful or not.
- Score: 14.036769355498546
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In the past few years, there has been a surge of interest in multi-modal
problems, from image captioning to visual question answering and beyond. In
this paper, we focus on hate speech detection in multi-modal memes wherein
memes pose an interesting multi-modal fusion problem. We aim to solve the
Facebook Meme Challenge \cite{kiela2020hateful} which aims to solve a binary
classification problem of predicting whether a meme is hateful or not. A
crucial characteristic of the challenge is that it includes "benign
confounders" to counter the possibility of models exploiting unimodal priors.
The challenge reports that state-of-the-art models perform poorly compared
to humans. During our analysis of the dataset, we realized that the majority of
the data points which are originally hateful are turned into benign ones just by
describing the image of the meme. Also, the majority of the multi-modal
baselines give more weight to the language modality (the hate speech text). To
tackle these
problems, we explore the visual modality using object detection and image
captioning models to fetch the "actual caption" and then combine it with the
multi-modal representation to perform binary classification. This approach
tackles the benign text confounders present in the dataset to improve the
performance. Another approach we experiment with is to improve the prediction
with sentiment analysis. Instead of only using multi-modal representations
obtained from pre-trained neural networks, we also include the unimodal
sentiment to enrich the features. We perform a detailed analysis of the above
two approaches, providing compelling reasons in favor of the methodologies
used.
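
The two approaches above both come down to enriching the fused representation before the final binary classifier. The snippet below is a minimal PyTorch sketch of that late-fusion step, assuming the joint multi-modal embedding, the embedding of the generated "actual caption", and the unimodal sentiment scores have already been computed upstream; all class names, dimensions, and feature choices are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal late-fusion sketch (illustrative, not the authors' exact model).
# Assumes upstream components already produce: a joint multi-modal embedding,
# an embedding of the generated "actual caption", and unimodal sentiment scores.
import torch
import torch.nn as nn


class CaptionSentimentFusionClassifier(nn.Module):
    def __init__(self, mm_dim=768, cap_dim=768, sent_dim=2, hidden=256):
        super().__init__()
        # Concatenate all three feature groups, then predict hateful vs. benign.
        self.classifier = nn.Sequential(
            nn.Linear(mm_dim + cap_dim + sent_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, 1),  # single logit: hateful (1) vs. not hateful (0)
        )

    def forward(self, mm_emb, caption_emb, sentiment_feats):
        fused = torch.cat([mm_emb, caption_emb, sentiment_feats], dim=-1)
        return self.classifier(fused).squeeze(-1)


if __name__ == "__main__":
    # Dummy batch of 4 memes with placeholder precomputed features.
    mm_emb = torch.randn(4, 768)       # joint text+image representation
    caption_emb = torch.randn(4, 768)  # embedding of the generated caption
    sentiment = torch.randn(4, 2)      # e.g. text and image sentiment scores
    model = CaptionSentimentFusionClassifier()
    probs = torch.sigmoid(model(mm_emb, caption_emb, sentiment))
    print(probs.shape)                 # torch.Size([4])
```

In practice the upstream features would come from a pre-trained vision-language encoder, an off-the-shelf object-detection/image-captioning model, and text/image sentiment analyzers, as described in the abstract; the sketch only shows how the enriched features are combined for the binary decision.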
Related papers
- Towards Better Multi-modal Keyphrase Generation via Visual Entity
Enhancement and Multi-granularity Image Noise Filtering [79.44443231700201]
Multi-modal keyphrase generation aims to produce a set of keyphrases that represent the core points of the input text-image pair.
The input text and image are often not perfectly matched, and thus the image may introduce noise into the model.
We propose a novel multi-modal keyphrase generation model, which not only enriches the model input with external knowledge, but also effectively filters image noise.
arXiv Detail & Related papers (2023-09-09T09:41:36Z) - MemeFier: Dual-stage Modality Fusion for Image Meme Classification [8.794414326545697]
New forms of digital content such as image memes have given rise to the spread of hate through multimodal means.
We propose MemeFier, a deep learning-based architecture for fine-grained classification of Internet image memes.
arXiv Detail & Related papers (2023-04-06T07:36:52Z) - Towards Unifying Medical Vision-and-Language Pre-training via Soft
Prompts [63.84720380390935]
There exist two typical types, i.e., the fusion-encoder type and the dual-encoder type, depending on whether a heavy fusion module is used.
We propose an effective yet straightforward scheme named PTUnifier to unify the two types.
We first unify the input format by introducing visual and textual prompts, which serve as a feature bank that stores the most representative images/texts.
arXiv Detail & Related papers (2023-02-17T15:43:42Z) - Benchmarking Robustness of Multimodal Image-Text Models under
Distribution Shift [50.64474103506595]
We investigate the robustness of 12 popular open-sourced image-text models under common perturbations on five tasks.
Character-level perturbations constitute the most severe distribution shift for text, and zoom blur is the most severe shift for image data.
arXiv Detail & Related papers (2022-12-15T18:52:03Z) - Caption Enriched Samples for Improving Hateful Memes Detection [78.5136090997431]
The hateful meme challenge demonstrates the difficulty of determining whether a meme is hateful or not.
Both unimodal language models and multimodal vision-language models cannot reach the human level of performance.
arXiv Detail & Related papers (2021-09-22T10:57:51Z) - Exploiting BERT For Multimodal Target Sentiment Classification Through
Input Space Translation [75.82110684355979]
We introduce a two-stream model that translates images in input space using an object-aware transformer.
We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model.
We achieve state-of-the-art performance on two multimodal Twitter datasets.
arXiv Detail & Related papers (2021-08-03T18:02:38Z) - Deciphering Implicit Hate: Evaluating Automated Detection Algorithms for
Multimodal Hate [2.68137173219451]
This paper evaluates the role of semantic and multimodal context for detecting implicit and explicit hate.
We show that both textual and visual enrichment improve model performance.
We find that all models perform better on content with full annotator agreement and that multimodal models are best at classifying the content where annotators disagree.
arXiv Detail & Related papers (2021-06-10T16:29:42Z) - A Multimodal Framework for the Detection of Hateful Memes [16.7604156703965]
We aim to develop a framework for the detection of hateful memes.
We show the effectiveness of upsampling of contrastive examples to encourage multimodality and ensemble learning.
Our best approach comprises an ensemble of UNITER-based models and achieves an AUROC score of 80.53, placing us 4th on phase 2 of the 2020 Hateful Memes Challenge organized by Facebook.
arXiv Detail & Related papers (2020-12-23T18:37:11Z) - The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes [43.778346545763654]
This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes.
It is constructed such that unimodal models struggle and only multimodal models can succeed.
We find that state-of-the-art methods perform poorly compared to humans.
arXiv Detail & Related papers (2020-05-10T21:31:00Z)