Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language
- URL: http://arxiv.org/abs/2103.10195v1
- Date: Thu, 18 Mar 2021 12:01:13 GMT
- Title: Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language
- Authors: Hala Mulki, Bilal Ghanem
- Abstract summary: We introduce an Arabic Levantine Twitter dataset for Misogynistic language (LeT-Mi) to be the first benchmark dataset for Arabic misogyny.
Let-Mi was used as an evaluation dataset through binary/multi-/target classification tasks conducted by several state-of-the-art machine learning systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online misogyny has become an increasing worry for Arab women who experience
gender-based online abuse on a daily basis. Automatic misogyny detection
systems can assist in the prohibition of anti-women Arabic toxic content.
Developing such systems is hindered by the lack of Arabic misogyny
benchmark datasets. In this paper, we introduce an Arabic Levantine Twitter
dataset for Misogynistic language (LeT-Mi) to be the first benchmark dataset
for Arabic misogyny. We further provide a detailed review of the dataset
creation and annotation phases. The consistency of the annotations for the
proposed dataset was emphasized through inter-rater agreement evaluation
measures. Moreover, Let-Mi was used as an evaluation dataset through
binary/multi-/target classification tasks conducted by several state-of-the-art
machine learning systems along with a Multi-Task Learning (MTL) configuration.
The obtained results indicated that the performances achieved by the used
systems are consistent with state-of-the-art results for languages other than
Arabic, while employing MTL improved the performance of the misogyny/target
classification tasks.
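The abstract says annotation consistency was checked with inter-rater agreement measures but does not name them. As a minimal sketch, assuming Cohen's kappa for two annotators (one common choice, not necessarily the one used for Let-Mi), the computation could look like this. The function name and the toy labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: this is an illustrative agreement measure only; the
    abstract does not state which measures the authors used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies,
    # summed over all labels either annotator used.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations (not from Let-Mi).
annotator_1 = ["misogyny", "none", "none", "misogyny", "none", "none"]
annotator_2 = ["misogyny", "none", "misogyny", "misogyny", "none", "none"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

On this toy input the two annotators agree on five of six items. Chance agreement is 0.5, so kappa comes out at about 0.67.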
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark, AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z) - Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets [0.27309692684728604]
We propose a novel approach that leverages ensemble learning and semi-supervised learning based on previously manually labeled.
We conducted experiments on a benchmark dataset by classifying Arabic tweets into 5 distinct classes: non-hate, general hate, racial, religious, or sexism.
arXiv Detail & Related papers (2024-07-02T17:26:26Z) - A multitask learning framework for leveraging subjectivity of annotators to identify misogyny [47.175010006458436]
We propose a multitask learning approach to enhance the performance of the misogyny identification systems.
We incorporated diverse perspectives from annotators in our model design, considering gender and age across six profile groups.
This research advances content moderation and highlights the importance of embracing diverse perspectives to build effective online moderation systems.
arXiv Detail & Related papers (2024-06-22T15:06:08Z) - Breaking the Silence Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces [0.6543929004971272]
Team CNLP-NITS-PP developed an ensemble approach combining CNN and BiLSTM networks.
CNN captures localized features indicative of abusive language through its convolution filters applied on embedded input text.
BiLSTM analyzes this sequence for dependencies among words and phrases.
Validation scores showed strong performance across F1 measures, especially for English (0.84).
arXiv Detail & Related papers (2024-04-02T14:55:47Z) - Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets [92.38654521870444]
We introduce ACES, a contrastive challenge set spanning 146 language pairs.
This dataset aims to discover whether metrics can identify 68 translation accuracy errors.
We conduct a large-scale study by benchmarking ACES on 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks.
arXiv Detail & Related papers (2024-01-29T17:17:42Z) - MiTTenS: A Dataset for Evaluating Misgendering in Translation [16.446952262028358]
Misgendering is the act of referring to someone in a way that does not reflect their gender identity.
We introduce a dataset, MiTTenS, covering 26 languages from a variety of language families and scripts.
arXiv Detail & Related papers (2024-01-13T00:08:23Z) - Subtle Misogyny Detection and Mitigation: An Expert-Annotated Dataset [5.528106559459623]
The Biasly dataset is built in collaboration with multi-disciplinary experts and annotators themselves.
The dataset can be used for a range of NLP tasks, including classification, severity score regression, and text generation for rewrites.
arXiv Detail & Related papers (2023-11-15T23:27:19Z) - AceGPT, Localizing Large Language Models in Arabic [73.39989503874634]
The paper proposes a comprehensive solution that includes pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic.
The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities.
arXiv Detail & Related papers (2023-09-21T13:20:13Z) - The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide-range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z) - Deep Multi-Task Models for Misogyny Identification and Categorization on Arabic Social Media [6.6410040715586005]
In this paper, we present the submitted systems to the first Arabic Misogyny Identification shared task.
We investigate three multi-task learning models as well as their single-task counterparts.
In order to encode the input text, our models rely on the pre-trained MARBERT language model.
arXiv Detail & Related papers (2022-06-16T18:54:37Z) - Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)