Hateful Memes Challenge: An Enhanced Multimodal Framework
- URL: http://arxiv.org/abs/2112.11244v1
- Date: Mon, 20 Dec 2021 07:47:17 GMT
- Title: Hateful Memes Challenge: An Enhanced Multimodal Framework
- Authors: Aijing Gao, Bingjun Wang, Jiaqi Yin, Yating Tian
- Abstract summary: The Hateful Memes Challenge proposed by Facebook AI has attracted contestants around the world.
Various state-of-the-art deep learning models have been applied to this problem.
In this paper, we enhance the hateful meme detection framework, including utilizing Detectron for feature extraction.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Hateful Memes Challenge proposed by Facebook AI has attracted
contestants around the world. The challenge focuses on detecting hateful
speech in multimodal memes. Various state-of-the-art deep learning models have
been applied to this problem, and performance on the challenge's leaderboard
has steadily improved. In this paper, we enhance the hateful meme detection
framework by utilizing Detectron for feature extraction, exploring different
setups of the VisualBERT and UNITER models with different loss functions,
investigating the association between hateful memes and sensitive text
features, and finally building an ensemble method to boost model performance.
The AUROC of our fine-tuned VisualBERT, UNITER, and ensemble method reaches
0.765, 0.790, and 0.803 on the challenge's test set, respectively, beating the
baseline models. Our code is available at
https://github.com/yatingtian/hateful-meme
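
The abstract reports that ensembling the fine-tuned VisualBERT and UNITER models lifts AUROC to 0.803, but does not spell out the ensembling scheme. Below is a minimal sketch, assuming a simple unweighted average of the two models' per-meme hateful probabilities; the function name `ensemble_auroc` and the stand-in predictions are illustrative assumptions, not the authors' code (see their repository for the actual implementation).

```python
# Minimal sketch of a probability-averaging ensemble scored with AUROC.
# Assumes each model already produces a per-meme probability of "hateful";
# unweighted averaging is an assumption, not the paper's exact method.
import numpy as np
from sklearn.metrics import roc_auc_score

def ensemble_auroc(prob_visualbert, prob_uniter, labels):
    """Average two models' hateful-meme probabilities and score with AUROC."""
    probs = np.stack([prob_visualbert, prob_uniter])  # shape: (2, n_memes)
    ensembled = probs.mean(axis=0)                    # unweighted average
    return roc_auc_score(labels, ensembled)

# Hypothetical usage with random stand-in predictions:
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)                 # 0 = benign, 1 = hateful
p_vb = np.clip(labels * 0.5 + rng.normal(0.25, 0.2, 500), 0.0, 1.0)
p_un = np.clip(labels * 0.5 + rng.normal(0.25, 0.2, 500), 0.0, 1.0)
print(f"ensemble AUROC: {ensemble_auroc(p_vb, p_un, labels):.3f}")
```

Averaging probabilities (rather than logits, or a weighted vote) is the simplest choice; weighted schemes typically tune the weights on the dev set.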
Related papers
- ReconBoost: Boosting Can Achieve Modality Reconcilement [89.4377895465204]
We study the modality-alternating learning paradigm to achieve reconcilement.
We propose a new method called ReconBoost to update a fixed modality each time.
We show that the proposed method resembles Friedman's Gradient-Boosting (GB) algorithm, where the updated learner can correct errors made by others.
arXiv Detail & Related papers (2024-05-15T13:22:39Z) - A Study of Dropout-Induced Modality Bias on Robustness to Missing Video
Frames for Audio-Visual Speech Recognition [53.800937914403654]
Advanced Audio-Visual Speech Recognition (AVSR) systems have been observed to be sensitive to missing video frames.
While applying dropout to the video modality enhances robustness to missing frames, it simultaneously degrades performance on complete inputs (a generic sketch of such modality dropout appears after this list).
We propose a novel Multimodal Distribution Approximation with Knowledge Distillation (MDA-KD) framework to reduce over-reliance on the audio modality.
arXiv Detail & Related papers (2024-03-07T06:06:55Z) - Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion [54.33764537135906]
VideoQA Transformer models demonstrate competitive performance on standard benchmarks.
Do these models capture the rich multimodal structures and dynamics from video and text jointly?
Are they achieving high scores by exploiting biases and spurious features?
arXiv Detail & Related papers (2023-06-15T06:45:46Z) - Clover: Towards A Unified Video-Language Alignment and Fusion Model [154.1070559563592]
We introduce Clover, a Correlated Video-Language pre-training method.
It improves cross-modal feature alignment and fusion via a novel tri-modal alignment pre-training task.
Clover establishes new state-of-the-arts on multiple downstream tasks.
arXiv Detail & Related papers (2022-07-16T09:38:52Z) - Caption Enriched Samples for Improving Hateful Memes Detection [78.5136090997431]
The hateful meme challenge demonstrates the difficulty of determining whether a meme is hateful or not.
Neither unimodal language models nor multimodal vision-language models reach human-level performance.
arXiv Detail & Related papers (2021-09-22T10:57:51Z) - Enhance Multimodal Model Performance with Data Augmentation: Facebook
Hateful Meme Challenge Solution [3.8325907381729496]
The Hateful Memes Challenge from Facebook challenges contestants to detect hateful speech in multimodal memes.
In this paper, we utilize the multimodal pre-trained models ViLBERT and Visual BERT.
Our approach achieved 0.7439 AUROC along with an accuracy of 0.7037 on the challenge's test set, which represents notable progress.
arXiv Detail & Related papers (2021-05-25T01:07:09Z) - Detecting Hate Speech in Multi-modal Memes [14.036769355498546]
We focus on hate speech detection in multi-modal memes wherein memes pose an interesting multi-modal fusion problem.
We aim to solve the Facebook Hateful Memes Challenge (Kiela et al., 2020), a binary classification problem of predicting whether a meme is hateful or not.
arXiv Detail & Related papers (2020-12-29T18:30:00Z) - Detecting Hate Speech in Memes Using Multimodal Deep Learning
Approaches: Prize-winning solution to Hateful Memes Challenge [0.0]
The Hateful Memes Challenge is a first-of-its-kind competition which focuses on detecting hate speech in multimodal memes.
We utilize VisualBERT -- meant to be the BERT of vision and language -- which was trained multimodally on images and captions.
Our approach achieves 0.811 AUROC with an accuracy of 0.765 on the challenge test set and placed third out of 3,173 participants in the Hateful Memes Challenge.
arXiv Detail & Related papers (2020-12-23T21:09:52Z) - A Multimodal Framework for the Detection of Hateful Memes [16.7604156703965]
We aim to develop a framework for the detection of hateful memes.
We show the effectiveness of upsampling contrastive examples to encourage multimodality, and of ensemble learning.
Our best approach comprises an ensemble of UNITER-based models and achieves an AUROC score of 80.53, placing us 4th on phase 2 of the 2020 Hateful Memes Challenge organized by Facebook.
arXiv Detail & Related papers (2020-12-23T18:37:11Z) - The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes [43.778346545763654]
This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes.
It is constructed such that unimodal models struggle and only multimodal models can succeed.
We find that state-of-the-art methods perform poorly compared to humans.
arXiv Detail & Related papers (2020-05-10T21:31:00Z)
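
The dropout-induced modality bias entry above describes applying dropout to the video modality so an AVSR model does not over-rely on one stream. As a generic illustration of that technique only -- not the paper's MDA-KD framework -- here is a minimal PyTorch sketch that zeroes the entire video stream for a random fraction of training samples; the function name and default rate are assumptions:

```python
import torch

def drop_video_modality(video_feats: torch.Tensor, p: float = 0.3,
                        training: bool = True) -> torch.Tensor:
    """Zero the whole video stream for a fraction p of the batch.

    A generic modality-dropout sketch: forcing the model to sometimes see
    audio alone discourages over-reliance on the video modality. This
    illustrates the technique named in the summary, not MDA-KD itself.
    """
    if not training or p <= 0.0:
        return video_feats
    batch = video_feats.shape[0]
    # Per-sample keep mask, broadcast over the frame/feature dimensions.
    keep = (torch.rand(batch, device=video_feats.device) >= p).float()
    return video_feats * keep.view(batch, *([1] * (video_feats.dim() - 1)))
```

Dropping the whole stream per sample is one common variant; masking individual frames is another, and the cited paper's point is that either trades clean-input accuracy for robustness unless compensated for (e.g., via distillation).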