Related papers: Generating Gender Alternatives in Machine Translation

Generating Gender Alternatives in Machine Translation

URL: http://arxiv.org/abs/2407.20438v1
Date: Mon, 29 Jul 2024 22:10:51 GMT
Title: Generating Gender Alternatives in Machine Translation
Authors: Sarthak Garg, Mozhdeh Gheini, Clara Emmanuel, Tatiana Likhomanenko, Qin Gao, Matthias Paulik,
Abstract summary: Machine translation systems often translate terms with ambiguous gender into the gendered form that is most prevalent in the systems' training data. This often reflects and perpetuates harmful stereotypes present in society. We study the problem of generating all grammatically correct gendered translation alternatives.
Score: 13.153018685139413
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine translation (MT) systems often translate terms with ambiguous gender (e.g., English term "the nurse") into the gendered form that is most prevalent in the systems' training data (e.g., "enfermera", the Spanish term for a female nurse). This often reflects and perpetuates harmful stereotypes present in society. With MT user interfaces in mind that allow for resolving gender ambiguity in a frictionless manner, we study the problem of generating all grammatically correct gendered translation alternatives. We open source train and test datasets for five language pairs and establish benchmarks for this task. Our key technical contribution is a novel semi-supervised solution for generating alternatives that integrates seamlessly with standard MT models and maintains high performance without requiring additional components or increasing inference overhead.

Related papers

Gender Bias in English-to-Greek Machine Translation [0.0]
We find persistent gender bias in translations by both Google Translate and DeepL.<n>GPT-4o shows promise, generating appropriate gendered and neutral alternatives for most ambiguous cases.
arXiv Detail & Related papers (2025-06-11T09:44:12Z)
EuroGEST: Investigating gender stereotypes in multilingual language models [53.88459905621724]
Large language models increasingly support multiple languages, yet most benchmarks for gender bias remain English-centric.<n>We introduce EuroGEST, a dataset designed to measure gender-stereotypical reasoning in LLMs across English and 29 European languages.
arXiv Detail & Related papers (2025-06-04T11:58:18Z)
Assumed Identities: Quantifying Gender Bias in Machine Translation of Gender-Ambiguous Occupational Terms [2.5764960393034615]
We introduce GRAPE, a probability-based metric designed to evaluate gender bias.<n>We present GAMBIT-MT, a benchmarking dataset in English with gender-ambiguous occupational terms.
arXiv Detail & Related papers (2025-03-06T12:16:14Z)
Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders. This study presents a benchmark AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words) We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
Gender Bias in Machine Translation and The Era of Large Language Models [0.8702432681310399]
This chapter examines the role of Machine Translation in perpetuating gender bias, highlighting the challenges posed by cross-linguistic settings and statistical dependencies. A comprehensive overview of relevant existing work related to gender bias in both conventional Neural Machine Translation approaches and Generative Pretrained Transformer models employed as Machine Translation systems is provided.
arXiv Detail & Related papers (2024-01-18T14:34:49Z)
A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation [35.44115368160656]
We investigate whether and to what extent machine translation models exhibit gender bias. We find that IFT models default to male-inflected translations, even disregarding female occupational stereotypes. We propose an easy-to-implement and effective bias mitigation solution.
arXiv Detail & Related papers (2023-10-18T17:36:55Z)
The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages [51.2321117760104]
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages. The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text. We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
arXiv Detail & Related papers (2023-08-31T17:20:50Z)
"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation [69.25368160338043]
Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life. We assess how the social reality surrounding experienced marginalization of TGNB persons contributes to and persists within Open Language Generation. We introduce TANGO, a dataset of template-based real-world text curated from a TGNB-oriented community.
arXiv Detail & Related papers (2023-05-17T04:21:45Z)
GATE: A Challenge Set for Gender-Ambiguous Translation Examples [0.31498833540989407]
When source gender is ambiguous, machine translation models typically default to stereotypical gender roles, perpetuating harmful bias. Recent work has led to the development of "gender rewriters" that generate alternative gender translations on such ambiguous inputs, but such systems are plagued by poor linguistic coverage. We present and release GATE, a linguistically diverse corpus of gender-ambiguous source sentences along with multiple alternative target language translations.
arXiv Detail & Related papers (2023-03-07T15:23:38Z)
Investigating Failures of Automatic Translation in the Case of Unambiguous Gender [13.58884863186619]
Transformer based models are the modern work horses for neural machine translation (NMT) We observe a systemic and rudimentary class of errors made by transformer based models with regards to translating from a language that doesn't mark gender on nouns into others that do. We release an evaluation scheme and dataset for measuring the ability of transformer based NMT models to translate gender correctly.
arXiv Detail & Related papers (2021-04-16T00:57:36Z)
They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English. We show how a model can be trained to produce gender-neutral English with 1% word error rate with no human-labeled data.
arXiv Detail & Related papers (2021-02-12T21:47:48Z)
Decoding and Diversity in Machine Translation [90.33636694717954]
We characterize differences between cost diversity paid for the BLEU scores enjoyed by NMT. Our study implicates search as a salient source of known bias when translating gender pronouns.
arXiv Detail & Related papers (2020-11-26T21:09:38Z)
Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text. We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions. Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
Reducing Gender Bias in Neural Machine Translation as a Domain Adaptation Problem [21.44025591721678]
Training data for NLP tasks often exhibits gender bias in that fewer sentences refer to women than to men. Recent WinoMT challenge set allows us to measure this effect directly. We use transfer learning on a small set of trusted, gender-balanced examples.
arXiv Detail & Related papers (2020-04-09T11:55:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.