A Multimodal Framework for the Detection of Hateful Memes
- URL: http://arxiv.org/abs/2012.12871v2
- Date: Thu, 24 Dec 2020 14:28:17 GMT
- Title: A Multimodal Framework for the Detection of Hateful Memes
- Authors: Phillip Lippe, Nithin Holla, Shantanu Chandra, Santhosh Rajamanickam,
Georgios Antoniou, Ekaterina Shutova, Helen Yannakoudakis
- Abstract summary: We aim to develop a framework for the detection of hateful memes.
We show the effectiveness of upsampling contrastive examples to encourage multimodality, and of cross-validation-based ensemble learning to improve robustness.
Our best approach comprises an ensemble of UNITER-based models and achieves an AUROC score of 80.53, placing us 4th on phase 2 of the 2020 Hateful Memes Challenge organized by Facebook.
- Score: 16.7604156703965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An increasingly common expression of online hate speech is multimodal in
nature and comes in the form of memes. Designing systems to automatically
detect hateful content is of paramount importance if we are to mitigate its
undesirable effects on the society at large. The detection of multimodal hate
speech is an intrinsically difficult and open problem: memes convey a message
using both images and text and, hence, require multimodal reasoning and joint
visual and language understanding. In this work, we seek to advance this line
of research and develop a multimodal framework for the detection of hateful
memes. We improve the performance of existing multimodal approaches beyond
simple fine-tuning and, among others, show the effectiveness of upsampling of
contrastive examples to encourage multimodality and ensemble learning based on
cross-validation to improve robustness. We furthermore analyze model
misclassifications and discuss a number of hypothesis-driven augmentations and
their effects on performance, presenting important implications for future
research in the field. Our best approach comprises an ensemble of UNITER-based
models and achieves an AUROC score of 80.53, placing us 4th on phase 2 of the
2020 Hateful Memes Challenge organized by Facebook.
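To make the two headline techniques concrete, the snippet below is a minimal, hypothetical sketch of upsampling contrastive examples and averaging the scores of a cross-validation ensemble. A logistic regression over placeholder features stands in for the fine-tuned UNITER-based models; the function names and the `is_contrastive` flag are illustrative, not taken from the paper.

```python
# Hypothetical sketch: upsample contrastive (confounder) examples, train one
# model per cross-validation fold, and average their test-time scores.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def upsample_contrastive(X, y, is_contrastive, factor=2):
    """Duplicate contrastive examples so each fit sees them `factor` times."""
    extra = np.where(is_contrastive)[0]
    idx = np.concatenate([np.arange(len(y))] + [extra] * (factor - 1))
    return X[idx], y[idx]

def cv_ensemble_scores(X, y, is_contrastive, X_test, n_splits=5):
    """Train one model per fold on upsampled data and average the scores."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, _ in skf.split(X, y):
        X_tr, y_tr = upsample_contrastive(X[train_idx], y[train_idx],
                                          is_contrastive[train_idx])
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # stand-in for UNITER
        scores.append(model.predict_proba(X_test)[:, 1])
    return np.mean(scores, axis=0)  # averaged probabilities, evaluated with AUROC
```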
Related papers
- MHS-STMA: Multimodal Hate Speech Detection via Scalable Transformer-Based Multilevel Attention Framework [15.647035299476894]
This article proposes a scalable transformer-based multilevel attention architecture (STMA) for multimodal hate content detection.
It consists of three main parts: a combined attention-based deep learning mechanism, a vision attention-mechanism encoder, and a caption attention-mechanism encoder.
Evaluations using multiple assessment criteria on three hate speech datasets (Hateful Memes, MultiOff, and MMHS150K) validate the suggested architecture's efficacy.
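One possible reading of this two-encoder design, sketched below under liberal assumptions: separate attention encoders for the image and the caption feed a combined attention block before classification. The layer choices and wiring are illustrative, not taken from the paper.

```python
# Hypothetical two-stream attention fusion: vision and caption encoders,
# followed by a combined cross-attention block and a binary classifier.
import torch
import torch.nn as nn

class TwoStreamAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.vision_enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.caption_enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.combined = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
        self.classifier = nn.Linear(dim, 1)  # hateful vs. not

    def forward(self, img_tokens, txt_tokens):
        v = self.vision_enc(img_tokens)       # vision attention encoder
        t = self.caption_enc(txt_tokens)      # caption attention encoder
        fused, _ = self.combined(query=t, key=v, value=v)  # combined attention
        return self.classifier(fused.mean(dim=1))          # pooled logit
```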
arXiv Detail & Related papers (2024-09-08T15:42:18Z)
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing multimodal conversational Aspect-Based Sentiment Analysis (ABSA).
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z)
- Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition [52.522244807811894]
We propose a novel multimodal Transformer framework using prompt learning to address the issue of missing modalities.
Our method introduces three types of prompts: generative prompts, missing-signal prompts, and missing-type prompts.
Through prompt learning, we achieve a substantial reduction in the number of trainable parameters.
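A minimal sketch of the general prompt-learning idea described above, assuming learnable prompt vectors selected by the missing-modality pattern and prepended to a frozen backbone's input sequence; the prompt names and sizes are illustrative, not the paper's.

```python
# Hypothetical missing-modality prompts: small trainable vectors, chosen by
# which modality is absent, prepended to the fused token sequence so that
# only the prompts (and a task head) need to be trained.
import torch
import torch.nn as nn

class MissingModalityPrompts(nn.Module):
    def __init__(self, dim=256, prompt_len=8):
        super().__init__()
        # One prompt bank per missing-modality pattern (illustrative naming).
        self.prompts = nn.ParameterDict({
            "none_missing": nn.Parameter(torch.randn(prompt_len, dim) * 0.02),
            "image_missing": nn.Parameter(torch.randn(prompt_len, dim) * 0.02),
            "text_missing": nn.Parameter(torch.randn(prompt_len, dim) * 0.02),
        })

    def forward(self, tokens, missing="none_missing"):
        # tokens: (batch, seq, dim) fused multimodal embeddings
        prompt = self.prompts[missing].unsqueeze(0).expand(tokens.size(0), -1, -1)
        return torch.cat([prompt, tokens], dim=1)  # prompts prepended to the sequence
```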
arXiv Detail & Related papers (2024-07-07T13:55:56Z)
- Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations [48.82168723932981]
We introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes.
A Contrastive Language-Image Pretraining (CLIP) approach has been proposed for visual and textual explanation of a meme.
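As a concrete illustration of the CLIP building block mentioned above (not the paper's full explanation pipeline), the snippet below scores candidate textual rationales against a meme image using the Hugging Face `transformers` CLIP implementation; the checkpoint choice and function name are assumptions.

```python
# Rank candidate textual explanations by their CLIP image-text similarity.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_explanations(image_path, candidate_texts):
    inputs = processor(text=candidate_texts, images=Image.open(image_path),
                       return_tensors="pt", padding=True)
    outputs = model(**inputs)
    probs = outputs.logits_per_image.softmax(dim=-1)  # similarity over candidates
    return sorted(zip(candidate_texts, probs[0].tolist()),
                  key=lambda pair: pair[1], reverse=True)
```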
arXiv Detail & Related papers (2024-01-18T11:24:30Z)
- DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention [55.2825684201129]
DeepSpeed-VisualChat is designed to optimize Large Language Models (LLMs) by incorporating multi-modal capabilities.
Our framework is notable for (1) its open-source support for multi-round and multi-image dialogues, (2) introducing an innovative multi-modal causal attention mechanism, and (3) utilizing data blending techniques on existing datasets to assure seamless interactions.
arXiv Detail & Related papers (2023-09-25T17:53:29Z)
- ARC-NLP at Multimodal Hate Speech Event Detection 2023: Multimodal Methods Boosted by Ensemble Learning, Syntactical and Entity Features [1.3190581566723918]
In the Russia-Ukraine war, both opposing factions heavily relied on text-embedded images as a vehicle for spreading propaganda and hate speech.
In this paper, we outline our methodologies for two subtasks of Multimodal Hate Speech Event Detection 2023.
For the first subtask, hate speech detection, we utilize multimodal deep learning models boosted by ensemble learning and syntactical text attributes.
For the second subtask, target detection, we employ multimodal deep learning models boosted by named entity features.
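A hypothetical sketch of what "boosting with named entity features" could look like: entity-type counts extracted with spaCy are concatenated with a placeholder multimodal embedding before classification. The entity label subset and the wiring are assumptions, not details from the paper.

```python
# Concatenate named-entity count features with a multimodal embedding.
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")
ENTITY_TYPES = ["PERSON", "NORP", "GPE", "ORG", "EVENT"]  # assumed label subset

def entity_features(text):
    """Count entities of selected types in the meme text."""
    doc = nlp(text)
    counts = {label: 0 for label in ENTITY_TYPES}
    for ent in doc.ents:
        if ent.label_ in counts:
            counts[ent.label_] += 1
    return np.array([counts[label] for label in ENTITY_TYPES], dtype=np.float32)

def boosted_features(multimodal_embedding, text):
    # multimodal_embedding: 1-D array from the vision-language model
    return np.concatenate([multimodal_embedding, entity_features(text)])
```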
arXiv Detail & Related papers (2023-07-25T21:56:14Z)
- Multimodal and Explainable Internet Meme Classification [3.4690152926833315]
We design and implement a modular and explainable architecture for Internet meme understanding.
We study the relevance of our modular and explainable models in detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification.
We devise a user-friendly interface that facilitates the comparative analysis of examples retrieved by all of our models for any given meme.
arXiv Detail & Related papers (2022-12-11T21:52:21Z)
- Caption Enriched Samples for Improving Hateful Memes Detection [78.5136090997431]
The Hateful Memes Challenge demonstrates the difficulty of determining whether a meme is hateful or not.
Neither unimodal language models nor multimodal vision-language models reach human-level performance.
arXiv Detail & Related papers (2021-09-22T10:57:51Z)
- Detecting Hate Speech in Multi-modal Memes [14.036769355498546]
We focus on hate speech detection in multi-modal memes, which pose an interesting multi-modal fusion problem.
We aim to solve the Facebook Hateful Memes Challenge (Kiela et al., 2020), which frames the task as binary classification: predicting whether a meme is hateful or not.
arXiv Detail & Related papers (2020-12-29T18:30:00Z)
- The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes [43.778346545763654]
This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes.
It is constructed such that unimodal models struggle and only multimodal models can succeed.
We find that state-of-the-art methods perform poorly compared to humans.
arXiv Detail & Related papers (2020-05-10T21:31:00Z)
- Multimodal Categorization of Crisis Events in Social Media [81.07061295887172]
We present a new multimodal fusion method that leverages both images and texts as input.
In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities.
We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
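A hypothetical sketch of such a cross-attention filter, assuming the stronger modality queries the weaker one and a learned gate suppresses uninformative components before fusion; all dimensions and wiring are illustrative, not taken from the paper.

```python
# Cross-attention filter: attend from the strong modality over the weak one,
# then gate the attended features before adding them back.
import torch
import torch.nn as nn

class CrossAttentionFilter(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, strong, weak):
        # strong, weak: (batch, seq, dim) features from the two modalities
        attended, _ = self.cross_attn(query=strong, key=weak, value=weak)
        g = self.gate(torch.cat([strong, attended], dim=-1))  # per-feature gate in [0, 1]
        return strong + g * attended  # filtered contribution from the weak modality
```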
arXiv Detail & Related papers (2020-04-10T06:31:30Z)