Enhance Multimodal Model Performance with Data Augmentation: Facebook Hateful Meme Challenge Solution
- URL: http://arxiv.org/abs/2105.13132v1
- Date: Tue, 25 May 2021 01:07:09 GMT
- Title: Enhance Multimodal Model Performance with Data Augmentation: Facebook Hateful Meme Challenge Solution
- Authors: Yang Li, Zinc Zhang, Hutchin Huang
- Abstract summary: The Hateful Memes Challenge from Facebook challenges contestants to detect hateful speech in multi-modal memes using deep learning algorithms.
In this paper, we utilize the multi-modal, pre-trained models ViLBERT and VisualBERT.
Our approach achieved 0.7439 AUROC along with an accuracy of 0.7037 on the challenge's test set, which revealed remarkable progress.
- Score: 3.8325907381729496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hateful content detection is one of the areas where deep learning can and
should make a significant difference. The Hateful Memes Challenge from Facebook
helps fulfill such potential by challenging the contestants to detect hateful
speech in multi-modal memes using deep learning algorithms. In this paper, we
utilize the multi-modal, pre-trained models ViLBERT and VisualBERT. We improved
the models' performance by adding training data generated through data
augmentation. Enlarging the training set gave us a boost of more than 2%
in AUROC with the VisualBERT model. Our approach achieved
0.7439 AUROC along with an accuracy of 0.7037 on the challenge's test set,
which revealed remarkable progress.
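The two metrics reported in the abstract, AUROC and accuracy, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made only for illustration. AUROC is computed here from its pairwise definition: the fraction of (hateful, non-hateful) pairs in which the hateful example receives the higher score, with ties counted as one half.

```python
def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label.

    The 0.5 threshold is an assumption, not a value taken from the paper.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Toy data (hypothetical): 1 = hateful, 0 = not hateful.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.7, 0.1]

print(f"AUROC:    {auroc(labels, scores):.4f}")
print(f"Accuracy: {accuracy(labels, scores):.4f}")
```

Because AUROC depends only on how the scores rank the examples, it is unaffected by the choice of threshold, whereas accuracy changes with it. This is one reason the two numbers in the abstract can differ noticeably.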
Related papers
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and a shortage of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Leveraging Demonstrations to Improve Online Learning: Quality Matters [54.98983862640944]
We show that the degree of improvement must depend on the quality of the demonstration data.
We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes' rule.
arXiv Detail & Related papers (2023-02-07T08:49:12Z)
- Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of learning robust feature representations that generalize well across multiple datasets for action recognition.
Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss.
We verify the effectiveness of our method on five challenging datasets, Kinetics-400, Kinetics-700, Moments-in-Time, Activitynet and Something-something-v2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z)
- Dynamic Contrastive Distillation for Image-Text Retrieval [90.05345397400144]
We present a novel plug-in dynamic contrastive distillation (DCD) framework to compress image-text retrieval models.
We successfully apply our proposed DCD strategy to two state-of-the-art vision-language pretrained models, i.e. ViLT and METER.
Experiments on MS-COCO and Flickr30K benchmarks show the effectiveness and efficiency of our DCD framework.
arXiv Detail & Related papers (2022-07-04T14:08:59Z)
- Hateful Memes Challenge: An Enhanced Multimodal Framework [0.0]
The Hateful Meme Challenge proposed by Facebook AI has attracted contestants from around the world.
Various state-of-the-art deep learning models have been applied to this problem.
In this paper, we enhance the hate detection framework, including using Detectron for feature extraction.
arXiv Detail & Related papers (2021-12-20T07:47:17Z)
- Classification of Multimodal Hate Speech -- The Winning Solution of Hateful Memes Challenge [0.0]
Hateful Memes is a new challenge set for multimodal classification.
Difficult examples are added to the dataset to make it hard to rely on unimodal signals.
I propose a new model that combines a multimodal model with rules and ranks first in both accuracy and AUROC, at 86.8% and 0.923 respectively.
arXiv Detail & Related papers (2020-12-02T07:38:26Z)
- Recognizing More Emotions with Less Data Using Self-supervised Transfer Learning [0.0]
We propose a novel transfer learning method for speech emotion recognition.
With as few as 125 examples per emotion class, we were able to reach a higher accuracy than a strong baseline trained on 8 times more data.
arXiv Detail & Related papers (2020-11-11T06:18:31Z)
- The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes [43.778346545763654]
This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes.
It is constructed such that unimodal models struggle and only multimodal models can succeed.
We find that state-of-the-art methods perform poorly compared to humans.
arXiv Detail & Related papers (2020-05-10T21:31:00Z)