What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study
- URL: http://arxiv.org/abs/2410.00545v2
- Date: Mon, 7 Oct 2024 08:52:39 GMT
- Title: What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study
- Authors: Beatrice Savoldi, Sara Papi, Matteo Negri, Ana Guerberof, Luisa Bentivogli
- Abstract summary: Gender bias in machine translation (MT) is recognized as an issue that can harm people and society.
We conduct an extensive human-centered study to examine whether and to what extent bias in MT brings harms with tangible costs.
- Score: 18.464888281674806
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Gender bias in machine translation (MT) is recognized as an issue that can harm people and society. And yet, advancements in the field rarely involve people, the final MT users, or inform how they might be impacted by biased technologies. Current evaluations are often restricted to automatic methods, which offer an opaque estimate of what the downstream impact of gender disparities might be. We conduct an extensive human-centered study to examine whether and to what extent bias in MT brings harms with tangible costs, such as quality-of-service gaps between women and men. To this aim, we collect behavioral data from 90 participants, who post-edited MT outputs to ensure correct gender translation. Across multiple datasets, languages, and types of users, our study shows that feminine post-editing demands significantly more technical and temporal effort, which also translates into higher financial costs. Existing bias measurements, however, fail to reflect these disparities. Our findings advocate for human-centered approaches that can inform the societal impact of bias.
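The abstract does not specify how technical and temporal effort are operationalized; in post-editing research, technical effort is commonly measured as HTER (edit operations per post-edited word) and temporal effort as editing time, which converts directly into cost. A minimal sketch of such a gender-gap computation, where the data layout, function names, and the hourly rate are illustrative assumptions rather than the paper's actual protocol:

```python
from dataclasses import dataclass

def token_edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # delete from a
                        dp[j - 1] + 1,                      # insert into a
                        prev + (a[i - 1] != b[j - 1]))      # substitute (or match)
            prev = cur
    return dp[-1]

@dataclass
class PostEdit:
    mt_output: str    # raw MT hypothesis shown to the participant
    post_edited: str  # participant's gender-corrected version
    seconds: float    # editing time for this segment
    gender: str       # "feminine" or "masculine" target translation

def hter(item: PostEdit) -> float:
    """Technical effort: edit operations per post-edited word (HTER-style)."""
    hyp, ref = item.mt_output.split(), item.post_edited.split()
    return token_edit_distance(hyp, ref) / max(len(ref), 1)

def effort_by_gender(items: list[PostEdit], hourly_rate: float = 30.0) -> dict:
    """Average technical, temporal, and monetary effort per gender condition.

    `hourly_rate` is an illustrative figure, not the paper's.
    """
    report = {}
    for g in ("feminine", "masculine"):
        sub = [x for x in items if x.gender == g]
        secs = sum(x.seconds for x in sub) / len(sub)
        report[g] = {"hter": sum(map(hter, sub)) / len(sub),
                     "seconds": secs,
                     "cost": secs / 3600 * hourly_rate}
    return report
```

Under a scheme like this, a consistently higher average HTER or seconds-per-segment for the feminine condition is precisely the kind of quality-of-service gap the study quantifies.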
Related papers
- The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias.
Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning.
We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z) - Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) [82.57490175399693]
We study gender bias in 22 popular image-to-text vision-language assistants (VLAs).
Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances.
To eliminate the gender bias in these models, we find that finetuning-based debiasing methods achieve the best tradeoff between debiasing and retaining performance on downstream tasks.
arXiv Detail & Related papers (2024-10-25T05:59:44Z) - Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation [28.01631390361754]
Masculine-inflected translations score higher than feminine-inflected ones, and gender-neutral translations are penalized.
Context-aware QE metrics reduce errors for masculine-inflected references but fail to address feminine referents.
Our findings underscore the need to address gender bias in QE metrics to ensure equitable and unbiased machine translation systems.
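The summary above implies a contrastive protocol: score masculine- and feminine-inflected variants of otherwise equivalent translations and compare. A minimal sketch of that idea, where the Italian pairs are illustrative and `qe_score` stands in for any reference-free QE model (an assumed interface, not the paper's):

```python
from typing import Callable

# Illustrative Italian contrastive pairs: (source, masculine, feminine).
# Both variants are equally correct translations of the ambiguous source.
PAIRS = [
    ("The doctor is tired.", "Il dottore è stanco.", "La dottoressa è stanca."),
    ("The teacher has arrived.", "Il maestro è arrivato.", "La maestra è arrivata."),
]

def mean_gender_gap(qe_score: Callable[[str, str], float],
                    pairs=PAIRS) -> float:
    """Mean QE-score difference (masculine minus feminine) over contrastive pairs.

    A systematically positive value means the metric rewards masculine
    inflections for otherwise equivalent, equally correct translations.
    """
    gaps = [qe_score(src, masc) - qe_score(src, fem) for src, masc, fem in pairs]
    return sum(gaps) / len(gaps)
```

Any reference-free QE scorer with a (source, translation) -> score signature can be plugged in.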
arXiv Detail & Related papers (2024-10-14T18:24:52Z) - GOSt-MT: A Knowledge Graph for Occupation-related Gender Biases in Machine Translation [2.3154290513589784]
Gender bias in machine translation (MT) systems poses significant challenges that often result in the reinforcement of harmful stereotypes.
This paper introduces a novel approach to studying occupation-related gender bias through the creation of the GOSt-MT Knowledge Graph.
arXiv Detail & Related papers (2024-09-17T08:44:20Z) - GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z) - Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT, a benchmark for Gender-Inclusive Machine Translation with Ambiguous attitude words.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z) - A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation [35.44115368160656]
We investigate whether and to what extent machine translation models exhibit gender bias.
We find that IFT models default to male-inflected translations, even disregarding female occupational stereotypes.
We propose an easy-to-implement and effective bias mitigation solution.
arXiv Detail & Related papers (2023-10-18T17:36:55Z) - The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently *underestimated* across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z) - Gender Bias in Transformer Models: A comprehensive survey [1.1011268090482573]
Gender bias in artificial intelligence (AI) has emerged as a pressing concern with profound implications for individuals' lives.
This paper presents a comprehensive survey that explores gender bias in Transformer models from a linguistic perspective.
arXiv Detail & Related papers (2023-06-18T11:40:47Z) - Towards Understanding Gender-Seniority Compound Bias in Natural Language Generation [64.65911758042914]
We investigate how seniority impacts the degree of gender bias exhibited in pretrained neural generation models.
Our results show that GPT-2 amplifies bias by considering women as junior and men as senior more often than the ground truth in both domains.
These results suggest that NLP applications built using GPT-2 may harm women in professional capacities.
arXiv Detail & Related papers (2022-05-19T20:05:02Z) - Examining Covert Gender Bias: A Case Study in Turkish and English Machine Translation Models [7.648784748888186]
We examine cases of both overt and covert gender bias in Machine Translation models.
Specifically, we introduce a method to investigate asymmetrical gender markings.
We also assess bias in the attribution of personhood and examine occupational and personality stereotypes.
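For the overt-bias side mentioned above, a common probe exploits Turkish's gender-neutral pronoun "o": since the source carries no gender, the English pronoun is entirely the system's choice. A minimal sketch, with the sentences and the `translate` interface as illustrative assumptions rather than the paper's method:

```python
from typing import Callable

# Turkish "o" is gender-neutral, so the English pronoun below is entirely
# the MT system's choice. Sentences are illustrative, not the paper's data.
SENTENCES = ["O bir doktor.", "O bir hemşire.", "O bir mühendis."]  # doctor, nurse, engineer

def pronoun_choices(translate: Callable[[str], str]) -> dict[str, str]:
    """Tally which English pronoun an MT system picks for each neutral source."""
    choices = {}
    for src in SENTENCES:
        words = translate(src).lower().split()
        choices[src] = "he" if "he" in words else ("she" if "she" in words else "other")
    return choices
```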
arXiv Detail & Related papers (2021-08-23T19:25:56Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially harmful, as they may manifest undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z) - Evaluating Gender Bias in Natural Language Inference [5.034017602990175]
We propose an evaluation methodology to measure gender bias in natural language understanding through inference.
We use our challenge task to investigate state-of-the-art NLI models on the presence of gender stereotypes using occupations.
Our findings suggest that three models trained on MNLI and SNLI datasets are significantly prone to gender-induced prediction errors.
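A sketch of the kind of occupation-based inference probe this describes; the template, occupation list, and `nli` interface are illustrative assumptions, not the paper's actual challenge set:

```python
from typing import Callable

OCCUPATIONS = ["nurse", "engineer", "accountant", "librarian"]  # illustrative
TEMPLATE = "The {} bought a coffee."

def gendered_asymmetries(nli: Callable[[str, str], str]) -> list[str]:
    """Flag occupations where gendered hypotheses receive different NLI labels.

    The premise says nothing about gender, so an unbiased model should assign
    the same label (ideally "neutral") to both hypotheses; a split decision
    signals a gendered occupational prior.
    """
    flagged = []
    for occ in OCCUPATIONS:
        premise = TEMPLATE.format(occ)
        m = nli(premise, "The man bought a coffee.")
        f = nli(premise, "The woman bought a coffee.")
        if m != f:
            flagged.append(f"{occ}: man->{m}, woman->{f}")
    return flagged
```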
arXiv Detail & Related papers (2021-05-12T09:41:51Z)