Multilingual Holistic Bias: Extending Descriptors and Patterns to Unveil
Demographic Biases in Languages at Scale
- URL: http://arxiv.org/abs/2305.13198v1
- Date: Mon, 22 May 2023 16:29:04 GMT
- Title: Multilingual Holistic Bias: Extending Descriptors and Patterns to Unveil
Demographic Biases in Languages at Scale
- Authors: Marta R. Costa-jussà, Pierre Andrews, Eric Smith, Prangthip
Hansanti, Christophe Ropers, Elahe Kalbassi, Cynthia Gao, Daniel Licht,
Carleigh Wood
- Abstract summary: This extension consists of 20,459 sentences in 50 languages distributed across all 13 demographic axes.
Our benchmark is intended to uncover demographic imbalances and to serve as a tool for quantifying their mitigation.
- Score: 0.21079694661943604
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce a multilingual extension of the HOLISTICBIAS dataset, the
largest English template-based taxonomy of textual people references:
MULTILINGUALHOLISTICBIAS. This extension consists of 20,459 sentences in 50
languages distributed across all 13 demographic axes. Source sentences are
built from combinations of 118 demographic descriptors and three patterns,
excluding nonsensical combinations. For gendered languages, the translations
include gendered alternatives wherever the English source is ambiguous. Our
benchmark is intended to uncover demographic imbalances and to serve as a tool
for quantifying their mitigation.
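As a rough illustration of the template-based construction described above, the following Python sketch generates sentences from descriptor and pattern lists while skipping excluded pairs. The descriptors, patterns, and exclusion set here are invented placeholders, not the actual HOLISTICBIAS inventory:

```python
from itertools import product

# Placeholder inventory: the real dataset combines 118 descriptors
# across 13 demographic axes with three sentence patterns.
DESCRIPTORS = ["left-handed", "Deaf", "middle-aged"]
PATTERNS = [
    "I am a {d} person.",
    "I have friends who are {d} people.",
    "What do you think about {d} people?",
]
# Descriptor-pattern pairs judged nonsensical are excluded (placeholder set).
NONSENSICAL = {("middle-aged", PATTERNS[1])}

def generate_sentences():
    """Yield every descriptor-pattern combination that is not excluded."""
    for d, pattern in product(DESCRIPTORS, PATTERNS):
        if (d, pattern) in NONSENSICAL:
            continue
        yield pattern.format(d=d)

for sentence in generate_sentences():
    print(sentence)
```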
Our initial findings show that translation quality for EN-to-XX translations
is on average 8 spBLEU higher when evaluated against the masculine human
reference than against the feminine one. In the opposite direction, XX-to-EN,
we compare the robustness of the model when the source input differs only in
gender (masculine or feminine); translations of masculine inputs score on
average almost 4 spBLEU higher than those of feminine inputs. When embedding
sentences into a joint multilingual sentence representation space, we find
that for most languages the masculine translations are significantly closer
to the English neutral sentences.
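The two evaluations reported above can be approximated along these lines. This is a minimal sketch, assuming sacrebleu's `flores101` SentencePiece tokenizer for spBLEU and using the public LaBSE encoder as a stand-in for the paper's joint multilingual embedding space; the file names and example sentences are hypothetical:

```python
import sacrebleu
from sentence_transformers import SentenceTransformer, util

# Hypothetical files: system outputs plus the masculine and feminine
# human references for one EN-to-XX direction.
hyps = [line.strip() for line in open("system_output.txt")]
refs_masc = [line.strip() for line in open("ref_masculine.txt")]
refs_fem = [line.strip() for line in open("ref_feminine.txt")]

# spBLEU is BLEU computed over the FLORES SentencePiece tokenization.
bleu_masc = sacrebleu.corpus_bleu(hyps, [refs_masc], tokenize="flores101")
bleu_fem = sacrebleu.corpus_bleu(hyps, [refs_fem], tokenize="flores101")
print(f"spBLEU vs masculine reference: {bleu_masc.score:.1f}")
print(f"spBLEU vs feminine reference:  {bleu_fem.score:.1f}")

# Embedding comparison: which gendered translation lies closer to the
# gender-neutral English sentence in a shared multilingual space?
encoder = SentenceTransformer("sentence-transformers/LaBSE")
en_neutral = encoder.encode(["I am Deaf."])
masc = encoder.encode(["Soy sordo."])  # masculine Spanish (placeholder)
fem = encoder.encode(["Soy sorda."])   # feminine Spanish (placeholder)
print("cosine(EN, masc):", util.cos_sim(en_neutral, masc).item())
print("cosine(EN, fem): ", util.cos_sim(en_neutral, fem).item())
```

Averaged over a benchmark, a consistently higher masculine cosine similarity would mirror the finding that masculine translations sit closer to the English neutral sentences.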
Related papers
- The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification [57.06913662622832]
Gender-fair language fosters inclusion by addressing all genders or using neutral forms.
Gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns.
While we offer initial insights on the effect on German text classification, the findings likely apply to other languages.
arXiv Detail & Related papers (2024-09-26T15:08:17Z)
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark, AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- Towards Massive Multilingual Holistic Bias [9.44611286329108]
We present the initial eight languages from the MASSIVE MULTILINGUAL HOLISTICBIAS dataset.
We propose an automatic construction methodology to further scale up MMHB sentences in terms of both language coverage and size.
arXiv Detail & Related papers (2024-06-29T16:26:27Z)
- GATE X-E: A Challenge Set for Gender-Fair Translations from Weakly-Gendered Languages [0.0]
We introduce GATE X-E, an extension to the GATE corpus, that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English.
The dataset features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena.
We present a translation gender rewriting solution built with GPT-4 and use GATE X-E to evaluate it.
arXiv Detail & Related papers (2024-02-22T04:36:14Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- Evaluating Gender Bias in the Translation of Gender-Neutral Languages into English [0.0]
We introduce GATE X-E, an extension to the GATE corpus, that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English.
The dataset features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena.
We present an English gender rewriting solution built on GPT-3.5 Turbo and use GATE X-E to evaluate it.
arXiv Detail & Related papers (2023-11-15T10:25:14Z)
- Gender Lost In Translation: How Bridging The Gap Between Languages Affects Gender Bias in Zero-Shot Multilingual Translation [12.376309678270275]
We investigate how bridging the gap between languages for which parallel data is not available affects gender bias in multilingual NMT.
We study the effect of encouraging language-agnostic hidden representations on models' ability to preserve gender.
We find that language-agnostic representations mitigate zero-shot models' masculine bias, and that as gender inflection in the bridge language increases, pivoting surpasses zero-shot translation in preserving gender for speaker-related agreement.
arXiv Detail & Related papers (2023-05-26T13:51:50Z)
- Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation [28.471506840241602]
Gender bias is a significant issue in machine translation, leading to ongoing research efforts in developing bias mitigation techniques.
We propose a bias mitigation method, Gender-Aware Contrastive Learning (GACL), which encodes contextual gender information into the representations of non-explicit gender words.
arXiv Detail & Related papers (2023-05-23T12:53:39Z)
- Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.