Classification of Multimodal Hate Speech -- The Winning Solution of
Hateful Memes Challenge
- URL: http://arxiv.org/abs/2012.01002v1
- Date: Wed, 2 Dec 2020 07:38:26 GMT
- Title: Classification of Multimodal Hate Speech -- The Winning Solution of
Hateful Memes Challenge
- Authors: Xiayu Zhong
- Abstract summary: Hateful Memes is a new challenge set for multimodal classification.
Difficult examples are added to the dataset to make it hard to rely on unimodal signals.
I propose a new model that combines a multimodal classifier with rules, which achieves first place with an accuracy of 86.8% and an AUROC of 0.923.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hateful Memes is a new challenge set for multimodal classification, focusing
on detecting hate speech in multimodal memes. Difficult examples are added to
the dataset to make it hard to rely on unimodal signals, which means only
multimodal models can succeed. According to Kiela et al., state-of-the-art
methods perform poorly compared to humans (64.73% vs. 84.7% accuracy) on
Hateful Memes. I propose a new model that combines a multimodal classifier
with rules, achieving first place with an accuracy of 86.8% and an AUROC of
0.923. These rules are extracted from the training set and focus on improving
the classification accuracy of difficult samples.
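The implementation is not included in this digest, but the described approach (a multimodal classifier whose output score is post-processed by rules mined from the training set) can be sketched roughly as follows. The rule format, patterns, and threshold are illustrative assumptions, not the author's actual rules.

```python
from typing import Callable, List, Tuple

# A rule is a (text pattern, score adjustment) pair -- an illustrative
# placeholder for the rules mined from the training set, not the
# winning solution's actual rules.
Rule = Tuple[str, float]

def apply_rules(text: str, base_score: float, rules: List[Rule]) -> float:
    """Adjust the model's P(hateful) whenever a mined rule fires."""
    for pattern, adjustment in rules:
        if pattern in text.lower():
            base_score = min(1.0, max(0.0, base_score + adjustment))
    return base_score

def classify_meme(score_fn: Callable[[bytes, str], float],
                  image: bytes, text: str,
                  rules: List[Rule], threshold: float = 0.5) -> int:
    """score_fn is any multimodal model returning P(hateful)."""
    score = apply_rules(text, score_fn(image, text), rules)
    return int(score >= threshold)

# Usage with a dummy scorer standing in for a fine-tuned
# vision-language model.
dummy_scorer = lambda image, text: 0.5
label = classify_meme(dummy_scorer, b"", "meme text here",
                      [("offensive-pattern", 0.4)])
```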
Related papers
- Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning [53.766434746801366]
Multimodal contrastive learning (MCL) has shown remarkable advances in zero-shot classification by learning from millions of image-caption pairs crawled from the Internet.
Hackers may exploit image-text data for model training without authorization; such data can include personal and privacy-sensitive information.
Recent works propose generating unlearnable examples by adding imperceptible perturbations to training images to build shortcuts for protection.
We propose Multi-step Error Minimization (MEM), a novel optimization process for generating multimodal unlearnable examples.
arXiv Detail & Related papers (2024-07-23T09:00:52Z)
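MEM's multi-step multimodal procedure is described in the cited paper; below is only a generic, single-modality sketch of the error-minimizing ("unlearnable") perturbation idea it builds on, with all hyperparameters assumed.

```python
import torch

def error_minimizing_noise(model, loss_fn, x, y,
                           eps=8 / 255, alpha=2 / 255, steps=20):
    """Perturb x to MINIMIZE the training loss (the opposite of an
    adversarial attack), so a model trained on x + delta latches onto
    the shortcut noise and learns little from the real content.
    Generic sketch only; MEM extends this idea for image-caption
    pairs in multimodal contrastive learning. model is an nn.Module."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend the loss
            delta.clamp_(-eps, eps)             # keep noise imperceptible
        delta.grad.zero_()
        model.zero_grad(set_to_none=True)
    return (x + delta).detach()
```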
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
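MMLoRA's exact placement of adapters is described in the cited paper; here is only a generic sketch of the low-rank adaptation building block it is named after, with ranks and scaling assumed.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-rank adaptation of a frozen pre-trained linear layer:
    y = W x + (alpha / r) * B A x, with only A and B trained.
    How MMLoRA places such adapters across the uni-modal encoders
    is not reproduced here."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                    # keep W frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage: wrap an existing projection, train only the adapter weights.
adapted = LoRALinear(nn.Linear(768, 768))
out = adapted(torch.randn(4, 768))
```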
- Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
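HCT-DMG's hierarchical crossmodal transformer is more involved than can be shown here; the following is only a minimal sketch of the dynamic modality-gating idea (per-sample softmax weights over modality embeddings before fusion), with all dimensions assumed.

```python
import torch
import torch.nn as nn

class DynamicModalityGate(nn.Module):
    """Minimal sketch: score each modality embedding, softmax the
    scores per sample, and fuse by weighted sum, so incongruent
    modalities can be down-weighted dynamically."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, modality_embs):               # list of (B, D) tensors
        stacked = torch.stack(modality_embs, dim=1)          # (B, M, D)
        gates = torch.softmax(self.score(stacked), dim=1)    # (B, M, 1)
        return (gates * stacked).sum(dim=1)                  # (B, D)

# Usage with text/audio/vision embeddings of a shared size.
gate = DynamicModalityGate(dim=128)
fused = gate([torch.randn(4, 128) for _ in range(3)])
```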
- Hateful Memes Challenge: An Enhanced Multimodal Framework [0.0]
The Hateful Memes Challenge proposed by Facebook AI has attracted contestants around the world.
Various state-of-the-art deep learning models have been applied to this problem.
In this paper, we enhance the hateful meme detection framework, including utilizing Detectron for feature extraction.
arXiv Detail & Related papers (2021-12-20T07:47:17Z)
- Caption Enriched Samples for Improving Hateful Memes Detection [78.5136090997431]
The hateful meme challenge demonstrates the difficulty of determining whether a meme is hateful or not.
Neither unimodal language models nor multimodal vision-language models reach human-level performance.
arXiv Detail & Related papers (2021-09-22T10:57:51Z)
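The caption-enrichment recipe reduces to input construction: generate a caption for the meme image and feed it to the classifier together with the meme's overlaid text. A sketch, where the captioner, separator token, and downstream classifier are all placeholders rather than the paper's exact setup:

```python
from typing import Callable

def enrich_with_caption(meme_text: str, image: bytes,
                        captioner: Callable[[bytes], str]) -> str:
    """Concatenate a generated image caption with the meme's own text,
    so even a text-only classifier sees a proxy for the visual signal.
    The [SEP] separator is an assumption, not the paper's format."""
    return f"{meme_text} [SEP] {captioner(image)}"

# Usage with a dummy captioner standing in for a real captioning model.
dummy_captioner = lambda img: "a person holding a sign"
enriched = enrich_with_caption("meme text here", b"", dummy_captioner)
```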
- Enhance Multimodal Model Performance with Data Augmentation: Facebook Hateful Meme Challenge Solution [3.8325907381729496]
The Hateful Memes Challenge from Facebook challenges contestants to detect hateful speech in multimodal memes.
In this paper, we utilize the multi-modal pre-trained models ViLBERT and Visual BERT.
Our approach achieved an AUROC of 0.7439 and an accuracy of 0.7037 on the challenge's test set, demonstrating remarkable progress.
arXiv Detail & Related papers (2021-05-25T01:07:09Z)
- Detecting Hate Speech in Multi-modal Memes [14.036769355498546]
We focus on hate speech detection in multi-modal memes, which pose an interesting multi-modal fusion problem.
We aim to solve the Facebook Hateful Memes Challenge (Kiela et al., 2020), a binary classification problem of predicting whether a meme is hateful or not.
arXiv Detail & Related papers (2020-12-29T18:30:00Z)
- Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge [0.0]
The Hateful Memes Challenge is a first-of-its-kind competition which focuses on detecting hate speech in multimodal memes.
We utilize VisualBERT -- which is meant to be the BERT of vision and language -- trained multimodally on images and captions.
Our approach achieved an AUROC of 0.811 and an accuracy of 0.765 on the challenge test set, placing third out of 3,173 participants in the Hateful Memes Challenge.
arXiv Detail & Related papers (2020-12-23T21:09:52Z)
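As a rough illustration of the VisualBERT-style setup, here is a forward pass using the Hugging Face implementation; the checkpoint name, the 36 placeholder region features of dimension 2048 (e.g. from a Detectron-style detector), and the classification head are assumptions, not the prize-winning pipeline.

```python
import torch
from transformers import BertTokenizer, VisualBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = VisualBertModel.from_pretrained("uclanlp/visualbert-vqa-coco-pre")

# Meme text plus placeholder region features (a real system would use
# detector outputs, e.g. 36 regions x 2048 dims).
inputs = tokenizer("meme text here", return_tensors="pt")
inputs.update({
    "visual_embeds": torch.zeros(1, 36, 2048),
    "visual_attention_mask": torch.ones(1, 36, dtype=torch.long),
    "visual_token_type_ids": torch.ones(1, 36, dtype=torch.long),
})

outputs = model(**inputs)
pooled = outputs.pooler_output                  # (1, hidden_size)
# Hypothetical binary head for hateful / not-hateful.
logit = torch.nn.Linear(model.config.hidden_size, 1)(pooled)
```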
- The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes [43.778346545763654]
This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes.
It is constructed such that unimodal models struggle and only multimodal models can succeed.
We find that state-of-the-art methods perform poorly compared to humans.
arXiv Detail & Related papers (2020-05-10T21:31:00Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
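As a hedged illustration of an AvgOut-style measure (a simplification; the paper's exact definitions may differ): maintain an exponential average of the decoder's output distributions, and score a batch as more diverse the less its mean output distribution overlaps with that running average.

```python
import torch

def avg_out_diversity(token_probs: torch.Tensor,
                      running_avg: torch.Tensor,
                      momentum: float = 0.99):
    """token_probs: (batch, steps, vocab) decoder output distributions.
    running_avg: (vocab,) exponential average of past distributions.
    Returns a diversity score (higher = less dull) and the updated
    average. A simplified sketch, not the paper's exact formulation."""
    batch_avg = token_probs.mean(dim=(0, 1))                 # (vocab,)
    diversity = 1.0 - torch.dot(batch_avg, running_avg)      # low overlap
    new_avg = momentum * running_avg + (1 - momentum) * batch_avg
    return diversity, new_avg

# Usage: such a score could drive a MinAvgOut-style loss, scale LFT's
# label, or serve as an RL reward, per the three variants above.
probs = torch.softmax(torch.randn(4, 10, 1000), dim=-1)
score, avg = avg_out_diversity(probs, torch.full((1000,), 1 / 1000))
```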
This list is automatically generated from the titles and abstracts of the papers in this site.