Deep Multi-Task Models for Misogyny Identification and Categorization on
Arabic Social Media
- URL: http://arxiv.org/abs/2206.08407v1
- Date: Thu, 16 Jun 2022 18:54:37 GMT
- Authors: Abdelkader El Mahdaouy, Abdellah El Mekki, Ahmed Oumar, Hajar
Mousannif, Ismail Berrada
- Abstract summary: In this paper, we present the systems we submitted to the first Arabic Misogyny Identification shared task.
We investigate three multi-task learning models as well as their single-task counterparts.
To encode the input text, our models rely on the pre-trained MARBERT language model.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The prevalence of toxic content on social media platforms, such as hate
speech, offensive language, and misogyny, presents serious challenges to our
interconnected society. These issues have attracted widespread
attention in the Natural Language Processing (NLP) community. In this paper, we
present the systems we submitted to the first Arabic Misogyny Identification
shared task. We investigate three multi-task learning models as well as their
single-task counterparts. To encode the input text, our models rely on
the pre-trained MARBERT language model. Overall, the results show that
all our submitted models achieved the best performance (the top three ranked
submissions) in both the misogyny identification and categorization tasks.
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
(2024-08-14T16:55:06Z)
- A multitask learning framework for leveraging subjectivity of annotators to identify misogyny [47.175010006458436]
We propose a multitask learning approach to enhance the performance of the misogyny identification systems.
We incorporated diverse perspectives from annotators in our model design, considering gender and age across six profile groups.
This research advances content moderation and highlights the importance of embracing diverse perspectives to build effective online moderation systems.
(2024-06-22T15:06:08Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
(2024-01-29T12:02:28Z)
- AI-UPV at EXIST 2023 -- Sexism Characterization Using Large Language Models Under The Learning with Disagreements Regime [2.4261434441245897]
This paper describes the AI-UPV team's participation in the EXIST (sEXism Identification in Social neTworks) Lab at CLEF 2023.
The proposed approach aims at addressing the task of sexism identification and characterization under the learning with disagreements paradigm.
The proposed system uses large language models (i.e., mBERT and XLM-RoBERTa) and ensemble strategies for sexism identification and classification in English and Spanish.
(2023-07-07T04:49:26Z)
- "I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation [69.25368160338043]
Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life.
We assess how the social reality surrounding experienced marginalization of TGNB persons contributes to and persists within Open Language Generation.
We introduce TANGO, a dataset of template-based real-world text curated from a TGNB-oriented community.
(2023-05-17T04:21:45Z)
- Solving Quantitative Reasoning Problems with Language Models [53.53969870599973]
We introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content.
The model achieves state-of-the-art performance on technical benchmarks without the use of external tools.
We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences.
(2022-06-29T18:54:49Z)
- UPB at SemEval-2022 Task 5: Enhancing UNITER with Image Sentiment and Graph Convolutional Networks for Multimedia Automatic Misogyny Identification [0.3437656066916039]
We describe our classification systems submitted to the SemEval-2022 Task 5: MAMI - Multimedia Automatic Misogyny Identification.
Our best model reaches an F1-score of 71.4% in Sub-task A and 67.3% in Sub-task B, positioning our team in the upper third of the leaderboard.
(2022-05-29T21:12:36Z)
- TIB-VA at SemEval-2022 Task 5: A Multimodal Architecture for the Detection and Classification of Misogynous Memes [9.66022279280394]
We present a multimodal architecture that combines textual and visual features in order to detect misogynous meme content.
Our solution obtained the best result in the Task-B where the challenge is to classify whether a given document is misogynous.
(2022-04-13T11:03:21Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
(2021-12-16T05:36:08Z)
- Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language [0.0]
We introduce an Arabic Levantine Twitter dataset for Misogynistic language (LeT-Mi) to be the first benchmark dataset for Arabic misogyny.
Let-Mi was used as an evaluation dataset through binary/multi-/target classification tasks conducted by several state-of-the-art machine learning systems.
(2021-03-18T12:01:13Z)