UPB at SemEval-2022 Task 5: Enhancing UNITER with Image Sentiment and
Graph Convolutional Networks for Multimedia Automatic Misogyny Identification
- URL: http://arxiv.org/abs/2205.14769v1
- Date: Sun, 29 May 2022 21:12:36 GMT
- Title: UPB at SemEval-2022 Task 5: Enhancing UNITER with Image Sentiment and
Graph Convolutional Networks for Multimedia Automatic Misogyny Identification
- Authors: Andrei Paraschiv, Mihai Dascalu, Dumitru-Clementin Cercel
- Abstract summary: We describe our classification systems submitted to the SemEval-2022 Task 5: MAMI - Multimedia Automatic Misogyny Identification.
Our best model reaches an F1-score of 71.4% in Sub-task A and 67.3% in Sub-task B, positioning our team in the upper third of the leaderboard.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, the detection of hate speech and offensive or
abusive language in online media has become an important topic in NLP research
due to the exponential growth of social media and the propagation of such
messages, as well as their impact. Misogyny detection, even though it plays an
important part in hate speech detection, has not received the same attention.
In this paper, we describe our classification systems submitted to SemEval-2022
Task 5: MAMI - Multimedia Automatic Misogyny Identification. The shared task
aimed to identify misogynous content in a multi-modal setting by analysing meme
images together with their textual captions. To this end, we propose two models
based on the pre-trained UNITER model, one enhanced with an image sentiment
classifier and the other leveraging a Vocabulary Graph Convolutional Network
(VGCN). Additionally, we explore an ensemble of the aforementioned models. Our
best model reaches an F1-score of 71.4% in Sub-task A and 67.3% in Sub-task B,
positioning our team in the upper third of the leaderboard. We release the code
and experiments for our models on GitHub.
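To illustrate the kind of architecture the abstract describes, the sketch below shows one way a pooled UNITER-style multimodal embedding could be fused with the output of an auxiliary image sentiment classifier before a final classification head. This is a hypothetical PyTorch sketch, not the authors' released code: the module names, the 768-dimensional pooled embedding, the 2048-dimensional visual features, and the three sentiment classes are illustrative assumptions.

```python
# Hypothetical sketch of sentiment-enhanced classification on top of a
# UNITER-style encoder. All dimensions and module names are assumptions.
import torch
import torch.nn as nn


class SentimentEnhancedClassifier(nn.Module):
    def __init__(self, uniter_dim: int = 768, sentiment_classes: int = 3,
                 num_labels: int = 2):
        super().__init__()
        # Stand-in for a pre-trained image sentiment head; in practice this
        # would be a separate classifier applied to global visual features.
        self.sentiment_head = nn.Linear(2048, sentiment_classes)
        # Classification head over the concatenated representation.
        self.classifier = nn.Sequential(
            nn.Linear(uniter_dim + sentiment_classes, 512),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(512, num_labels),
        )

    def forward(self, pooled_uniter: torch.Tensor,
                image_features: torch.Tensor) -> torch.Tensor:
        # pooled_uniter: (batch, uniter_dim) pooled text+image embedding
        # image_features: (batch, 2048) global visual features (assumed)
        sentiment_probs = self.sentiment_head(image_features).softmax(dim=-1)
        fused = torch.cat([pooled_uniter, sentiment_probs], dim=-1)
        return self.classifier(fused)


if __name__ == "__main__":
    model = SentimentEnhancedClassifier()
    pooled = torch.randn(4, 768)   # stand-in for UNITER pooled output
    visual = torch.randn(4, 2048)  # stand-in for image features
    print(model(pooled, visual).shape)  # torch.Size([4, 2])
```

The VGCN variant mentioned in the abstract would analogously append a vocabulary-graph embedding to the pooled representation before classification; the fusion pattern is the same.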
Related papers
- M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection [69.41274756177336]
Large Language Models (LLMs) have brought an unprecedented surge in machine-generated text (MGT) across diverse channels.
This raises legitimate concerns about its potential misuse and societal implications.
We introduce a new benchmark based on a multilingual, multi-domain, and multi-generator corpus of MGTs -- M4GT-Bench.
arXiv Detail & Related papers (2024-02-17T02:50:33Z)
- Lexical Squad@Multimodal Hate Speech Event Detection 2023: Multimodal Hate Speech Detection using Fused Ensemble Approach [0.23020018305241333]
We present our novel ensemble learning approach for detecting hate speech by classifying text-embedded images into two labels, namely "Hate Speech" and "No Hate Speech".
Our proposed ensemble model yielded promising results, with an accuracy of 75.21 and an F1 score of 74.96.
arXiv Detail & Related papers (2023-09-23T12:06:05Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- Deep Multi-Task Models for Misogyny Identification and Categorization on Arabic Social Media [6.6410040715586005]
In this paper, we present the systems submitted to the first Arabic Misogyny Identification shared task.
We investigate three multi-task learning models as well as their single-task counterparts.
In order to encode the input text, our models rely on the pre-trained MARBERT language model.
arXiv Detail & Related papers (2022-06-16T18:54:37Z)
- On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality, CLIP image representations and scaling of language models, do not consistently improve multimodal self-rationalization of tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z)
- TIB-VA at SemEval-2022 Task 5: A Multimodal Architecture for the Detection and Classification of Misogynous Memes [9.66022279280394]
We present a multimodal architecture that combines textual and visual features in order to detect misogynous meme content.
Our solution obtained the best result in Task B, where the challenge is to classify whether a given document is misogynous.
arXiv Detail & Related papers (2022-04-13T11:03:21Z)
- RubCSG at SemEval-2022 Task 5: Ensemble learning for identifying misogynous MEMEs [12.979213013465882]
This work presents an ensemble system based on various uni-modal and bi-modal model architectures developed for the SemEval 2022 Task 5: MAMI-Multimedia Automatic Misogyny Identification.
arXiv Detail & Related papers (2022-04-08T09:27:28Z)
- AMS_ADRN at SemEval-2022 Task 5: A Suitable Image-text Multimodal Joint Modeling Method for Multi-task Misogyny Identification [3.5382535469099436]
Women are influential online, especially in image-based social media such as Twitter and Instagram.
In this paper, we describe the system developed by our team for SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification.
arXiv Detail & Related papers (2022-02-18T09:41:37Z)
- Video Understanding as Machine Translation [53.59298393079866]
We tackle a wide variety of downstream video understanding tasks by means of a single unified framework.
We report performance gains over the state-of-the-art on several downstream tasks, including video classification (EPIC-Kitchens), question answering (TVQA), and captioning (TVC, YouCook2, and MSR-VTT).
arXiv Detail & Related papers (2020-06-12T14:07:04Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves a 91.51% F1 score in the English Sub-task A, comparable to the first-place result.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
- EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege's Principle [71.47160118286226]
We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images.
Motivated by Frege's Context Principle from psychology, our approach combines three interpretations of context for emotion recognition.
We report an Average Precision (AP) score of 35.48 across 26 classes, which is an improvement of 7-8 over prior methods.
arXiv Detail & Related papers (2020-03-14T19:55:21Z)