The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses
- URL: http://arxiv.org/abs/2110.09216v1
- Date: Mon, 18 Oct 2021 12:06:17 GMT
- Title: The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses
- Authors: Bashar Alhafni, Nizar Habash, Houda Bouamor
- Abstract summary: We introduce a new corpus for gender identification and rewriting in contexts involving one or two target users.
We focus on Arabic, a gender-marking morphologically rich language.
- Score: 17.253633576291897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gender bias in natural language processing (NLP) applications, particularly
machine translation, has been receiving increasing attention. Much of the
research on this issue has focused on mitigating gender bias in English NLP
models and systems. Addressing the problem in poorly resourced and/or
morphologically rich languages has lagged behind, largely due to the lack of
datasets and resources. In this paper, we introduce a new corpus for gender
identification and rewriting in contexts involving one or two target users (I
and/or You) -- first and second grammatical persons with independent
grammatical gender preferences. We focus on Arabic, a gender-marking
morphologically rich language. The corpus has multiple parallel components:
four combinations of 1st and 2nd person in feminine and masculine grammatical
genders, as well as English, and English to Arabic machine translation output.
This corpus expands on Habash et al. (2019)'s Arabic Parallel Gender Corpus
(APGC v1.0) by adding second person targets as well as increasing the total
number of sentences over 6.5 times, reaching over 590K words. Our new dataset
will aid the research and development of gender identification, controlled text
generation, and post-editing rewrite systems that could be used to personalize
NLP applications and provide users with the correct outputs based on their
grammatical gender preferences. We make the Arabic Parallel Gender Corpus (APGC
v2.0) publicly available.
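The four-way parallel structure described in the abstract (each sentence paired with every combination of 1st- and 2nd-person grammatical gender) can be sketched as a toy data record. This is a minimal illustration only: the class, field names, and the diacritized Arabic strings below are hypothetical assumptions for exposition, not the corpus's actual schema or content.

```python
# Toy sketch of a parallel entry: one English sentence with four gendered
# Arabic variants, keyed by (1st-person gender, 2nd-person gender).
from dataclasses import dataclass


@dataclass
class ParallelEntry:
    """One sentence with its four gender-target variants.

    Keys are (first_person_gender, second_person_gender), each 'M' or 'F'.
    The Arabic strings are illustrative; diacritics mark the addressee suffix.
    """
    english: str
    variants: dict


def select_variant(entry: ParallelEntry, first: str, second: str) -> str:
    """Return the rewrite matching the users' grammatical gender preferences."""
    return entry.variants[(first, second)]


# Hypothetical example sentence: "I am happy to see you."
# The speaker's gender marks the adjective; the addressee's gender marks
# the pronominal suffix vowel.
entry = ParallelEntry(
    english="I am happy to see you.",
    variants={
        ("M", "M"): "أنا سعيد برؤيتكَ",
        ("M", "F"): "أنا سعيد برؤيتكِ",
        ("F", "M"): "أنا سعيدة برؤيتكَ",
        ("F", "F"): "أنا سعيدة برؤيتكِ",
    },
)
```

A gender-rewriting system, given a source sentence and the two users' gender preferences, would aim to produce exactly the variant that `select_variant` looks up here.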
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT, a benchmark for Gender-Inclusive Machine Translation with Ambiguous attitude words.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- Building Bridges: A Dataset for Evaluating Gender-Fair Machine Translation into German [17.924716793621627]
We study gender-fair language in English-to-German machine translation (MT).
We conduct the first benchmark study involving two commercial systems and six neural MT models.
Our findings show that most systems produce mainly masculine forms and rarely gender-neutral variants.
arXiv Detail & Related papers (2024-06-10T09:39:19Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation [21.125217707038356]
When translating from notional gender languages into grammatical gender languages, the generated translation requires explicit gender assignments for various words, including those referring to the speaker.
To avoid such biased and non-inclusive behavior, the gender assignment of speaker-related expressions should be guided by externally provided metadata about the speaker's gender.
This paper aims to achieve the same results by integrating the speaker's gender metadata into a single "multi-gender" neural ST model, which is easier to maintain.
arXiv Detail & Related papers (2023-10-23T17:21:32Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- User-Centric Gender Rewriting [12.519348416773553]
We define the task of gender rewriting in contexts involving two users (I and/or You).
We develop a multi-step system that combines the positive aspects of both rule-based and neural rewriting models.
Our results successfully demonstrate the viability of this approach on a recently created corpus for Arabic gender rewriting.
arXiv Detail & Related papers (2022-05-04T17:46:17Z)
- Generating Gender Augmented Data for NLP [3.5557219875516655]
Gender bias is a frequent occurrence in NLP-based applications, especially in gender-inflected languages.
This paper proposes an automatic and generalisable rewriting approach for short conversational sentences.
The proposed approach is based on a neural machine translation (NMT) system trained to 'translate' from one gender alternative to another.
arXiv Detail & Related papers (2021-07-13T11:13:21Z)
- Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models [104.41668491794974]
We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender.
We find that while some words such as dead, and designated are associated with both male and female politicians, a few specific words such as beautiful and divorced are predominantly associated with female politicians.
arXiv Detail & Related papers (2021-04-15T15:03:26Z)
- They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English.
We show how a model can be trained to produce gender-neutral English with 1% word error rate with no human-labeled data.
arXiv Detail & Related papers (2021-02-12T21:47:48Z)
- Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus [20.766890957411132]
Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines.
Can audio provide additional information to reduce gender bias?
We present the first thorough investigation of gender bias in speech translation, contributing with the release of a benchmark useful for future studies.
arXiv Detail & Related papers (2020-06-10T09:55:38Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.