Related papers: Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language

URL: http://arxiv.org/abs/2103.10195v1
Date: Thu, 18 Mar 2021 12:01:13 GMT
Title: Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language
Authors: Hala Mulki, Bilal Ghanem
Abstract summary: We introduce an Arabic Levantine Twitter dataset for Misogynistic language (LeT-Mi) to be the first benchmark dataset for Arabic misogyny. Let-Mi was used as an evaluation dataset through binary/multi-/target classification tasks conducted by several state-of-the-art machine learning systems.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Online misogyny has become an increasing worry for Arab women who experience gender-based online abuse on a daily basis. Automatic misogyny detection systems can assist in the prohibition of anti-women Arabic toxic content. Developing such systems is hindered by the lack of Arabic misogyny benchmark datasets. In this paper, we introduce an Arabic Levantine Twitter dataset for Misogynistic language (LeT-Mi) to be the first benchmark dataset for Arabic misogyny. We further provide a detailed review of the dataset creation and annotation phases. The consistency of the annotations for the proposed dataset was emphasized through inter-rater agreement evaluation measures. Moreover, Let-Mi was used as an evaluation dataset through binary/multi-/target classification tasks conducted by several state-of-the-art machine learning systems along with a Multi-Task Learning (MTL) configuration. The obtained results indicated that the performance achieved by these systems is consistent with state-of-the-art results for languages other than Arabic, while employing MTL improved the performance of the misogyny/target classification tasks.

Related papers

FairTranslate: An English-French Dataset for Gender Bias Evaluation in Machine Translation by Overcoming Gender Binarity [0.6827423171182154]
Large Language Models (LLMs) are increasingly leveraged for translation tasks but often fall short when translating inclusive language. This paper presents a novel, fully human-annotated dataset designed to evaluate non-binary gender biases in machine translation systems from English to French.
arXiv Detail & Related papers (2025-04-22T14:35:16Z)
BiaSWE: An Expert Annotated Dataset for Misogyny Detection in Swedish [0.0]
BiaSWE is an expert-annotated dataset tailored for misogyny detection in the Swedish language. Our interdisciplinary team developed a rigorous annotation process, incorporating both domain knowledge and language expertise. The dataset, along with the annotation guidelines, is publicly available for further research.
arXiv Detail & Related papers (2025-02-11T15:25:10Z)
The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification [57.06913662622832]
Gender-fair language fosters inclusion by addressing all genders or using neutral forms. Gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns. While we offer initial insights on the effect on German text classification, the findings likely apply to other languages.
arXiv Detail & Related papers (2024-09-26T15:08:17Z)
Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders. This study presents a benchmark, AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words). We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets [0.27309692684728604]
We propose a novel approach that leverages ensemble learning and semi-supervised learning based on previously manually labeled. We conducted experiments on a benchmark dataset by classifying Arabic tweets into 5 distinct classes: non-hate, general hate, racial, religious, or sexism.
arXiv Detail & Related papers (2024-07-02T17:26:26Z)
A multitask learning framework for leveraging subjectivity of annotators to identify misogyny [47.175010006458436]
We propose a multitask learning approach to enhance the performance of the misogyny identification systems. We incorporated diverse perspectives from annotators in our model design, considering gender and age across six profile groups. This research advances content moderation and highlights the importance of embracing diverse perspectives to build effective online moderation systems.
arXiv Detail & Related papers (2024-06-22T15:06:08Z)
Breaking the Silence Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces [0.6543929004971272]
Team CNLP-NITS-PP developed an ensemble approach combining CNN and BiLSTM networks. CNN captures localized features indicative of abusive language through its convolution filters applied on embedded input text. BiLSTM analyzes this sequence for dependencies among words and phrases. Validation scores showed strong performance across F1-measures, especially for English (0.84).
arXiv Detail & Related papers (2024-04-02T14:55:47Z)
Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets [92.38654521870444]
We introduce ACES, a contrastive challenge set spanning 146 language pairs. This dataset aims to discover whether metrics can identify 68 translation accuracy errors. We conduct a large-scale study by benchmarking ACES on 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks.
arXiv Detail & Related papers (2024-01-29T17:17:42Z)
Subtle Misogyny Detection and Mitigation: An Expert-Annotated Dataset [5.528106559459623]
The Biasly dataset is built in collaboration with multi-disciplinary experts and annotators themselves. The dataset can be used for a range of NLP tasks, including classification, severity score regression, and text generation for rewrites.
arXiv Detail & Related papers (2023-11-15T23:27:19Z)
AceGPT, Localizing Large Language Models in Arabic [73.39989503874634]
The paper proposes a comprehensive solution that includes pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic. The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities.
arXiv Detail & Related papers (2023-09-21T13:20:13Z)
Deep Multi-Task Models for Misogyny Identification and Categorization on Arabic Social Media [6.6410040715586005]
In this paper, we present the systems submitted to the first Arabic Misogyny Identification shared task. We investigate three multi-task learning models as well as their single-task counterparts. In order to encode the input text, our models rely on the pre-trained MARBERT language model.
arXiv Detail & Related papers (2022-06-16T18:54:37Z)
Fine-Tuning Approach for Arabic Offensive Language Detection System: BERT-Based Model [0.0]
This study investigates the effects of fine-tuning across several Arabic offensive language datasets. We develop multiple classifiers that use four datasets individually and in combination to gain knowledge about online Arabic offensive content.
arXiv Detail & Related papers (2022-02-07T17:26:35Z)
Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages. We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language. We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.