The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender
Characterisation in 55 Languages
- URL: http://arxiv.org/abs/2308.16871v1
- Date: Thu, 31 Aug 2023 17:20:50 GMT
- Title: The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender
Characterisation in 55 Languages
- Authors: Benjamin Muller, Belen Alastruey, Prangthip Hansanti, Elahe Kalbassi,
Christophe Ropers, Eric Michael Smith, Adina Williams, Luke Zettlemoyer,
Pierre Andrews and Marta R. Costa-jussà
- Abstract summary: This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it by reporting gender representation in WMT training and development data for the News task, confirming that current data is skewed towards masculine representation.
- Score: 51.2321117760104
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Gender biases in language generation systems are challenging to mitigate. One
possible source for these biases is gender representation disparities in the
training and evaluation data. Despite recent progress in documenting this
problem and many attempts at mitigating it, we still lack shared methodology
and tooling to report gender representation in large datasets. Such
quantitative reporting will enable further mitigation, e.g., via data
augmentation. This paper describes the Gender-GAP Pipeline (for Gender-Aware
Polyglot Pipeline), an automatic pipeline to characterize gender representation
in large-scale datasets for 55 languages. The pipeline uses a multilingual
lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it by reporting gender representation in WMT training and
development data for the News task, confirming that current data is skewed
towards masculine representation. Unbalanced datasets may indirectly optimize
systems to perform better for one gender than for the others. We suggest
applying our gender quantification pipeline to current datasets and, ideally,
modifying them toward a balanced representation.
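
To make the method concrete, below is a minimal sketch of lexicon-based gender
quantification in the spirit of the pipeline described above. The class names,
word lists, and tokenizer are illustrative assumptions, not the released
Gender-GAP lexicon.

from collections import Counter
import re

# Hypothetical per-language lexicon of gendered person-nouns (toy subset;
# the real pipeline ships a curated lexicon for 55 languages).
LEXICON = {
    "eng": {
        "masculine": {"man", "men", "boy", "father", "he", "him", "his"},
        "feminine": {"woman", "women", "girl", "mother", "she", "her"},
        "unspecified": {"person", "people", "parent", "they", "them"},
    }
}

def gender_distribution(texts, lang="eng"):
    """Count gendered-lexicon matches and return percentages per class."""
    lexicon = LEXICON[lang]
    counts = Counter({cls: 0 for cls in lexicon})
    for text in texts:
        # Naive lowercase word tokenizer, for illustration only.
        tokens = re.findall(r"[a-z']+", text.lower())
        for token in tokens:
            for cls, words in lexicon.items():
                if token in words:
                    counts[cls] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return {cls: 100.0 * n / total for cls, n in counts.items()}

if __name__ == "__main__":
    corpus = ["The chairman thanked his colleagues.",
              "She is a doctor; her mother was a nurse."]
    print(gender_distribution(corpus))

The toy English word lists above only illustrate the counting logic; the
actual pipeline matches a curated multilingual lexicon of gendered
person-nouns to produce per-dataset gender representation reports.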
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents the AmbGIMT benchmark (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z) - VisoGender: A dataset for benchmarking gender bias in image-text pronoun
resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z) - Gender Bias in Transformer Models: A comprehensive survey [1.1011268090482573]
Gender bias in artificial intelligence (AI) has emerged as a pressing concern with profound implications for individuals' lives.
This paper presents a comprehensive survey that explores gender bias in Transformer models from a linguistic perspective.
arXiv Detail & Related papers (2023-06-18T11:40:47Z) - GATE: A Challenge Set for Gender-Ambiguous Translation Examples [0.31498833540989407]
When source gender is ambiguous, machine translation models typically default to stereotypical gender roles, perpetuating harmful bias.
Recent work has led to the development of "gender rewriters" that generate alternative gender translations on such ambiguous inputs, but such systems are plagued by poor linguistic coverage.
We present and release GATE, a linguistically diverse corpus of gender-ambiguous source sentences along with multiple alternative target language translations.
arXiv Detail & Related papers (2023-03-07T15:23:38Z) - GenderedNews: Une approche computationnelle des \'ecarts de
repr\'esentation des genres dans la presse fran\c{c}aise [0.0]
We present GenderedNews (https://gendered-news.imag.fr), an online dashboard which gives weekly measures of gender imbalance in French online press.
We use Natural Language Processing (NLP) methods to quantify gender inequalities in the media.
We describe the data collected daily (seven main titles of French online news media) and the methodology behind our metrics.
arXiv Detail & Related papers (2022-02-11T15:16:49Z) - Gendered Language in Resumes and its Implications for Algorithmic Bias
in Hiring [0.0]
We train a series of models to classify the gender of the applicant.
We investigate whether it is possible to obfuscate gender from resumes.
We find that there is a significant amount of gendered information in resumes even after obfuscation.
arXiv Detail & Related papers (2021-12-16T14:26:36Z) - Improving Gender Fairness of Pre-Trained Language Models without
Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z) - Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z) - Reducing Gender Bias in Neural Machine Translation as a Domain
Adaptation Problem [21.44025591721678]
Training data for NLP tasks often exhibits gender bias in that fewer sentences refer to women than to men.
The recent WinoMT challenge set allows us to measure this effect directly.
We use transfer learning on a small set of trusted, gender-balanced examples; a sketch of this adaptation step follows the list below.
arXiv Detail & Related papers (2020-04-09T11:55:13Z)