Improving Gender Translation Accuracy with Filtered Self-Training
- URL: http://arxiv.org/abs/2104.07695v1
- Date: Thu, 15 Apr 2021 18:05:29 GMT
- Title: Improving Gender Translation Accuracy with Filtered Self-Training
- Authors: Prafulla Kumar Choubey, Anna Currey, Prashant Mathur, Georgiana Dinu
- Abstract summary: Machine translation systems often output incorrect gender, even when the gender is clear from context.
We propose a gender-filtered self-training technique to improve gender translation accuracy on unambiguously gendered inputs.
- Score: 14.938401898546548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Targeted evaluations have found that machine translation systems often output
incorrect gender, even when the gender is clear from context. Furthermore,
these incorrectly gendered translations have the potential to reflect or
amplify social biases. We propose a gender-filtered self-training technique to
improve gender translation accuracy on unambiguously gendered inputs. This
approach uses a source monolingual corpus and an initial model to generate
gender-specific pseudo-parallel corpora which are then added to the training
data. We filter the gender-specific corpora on the source and target sides to
ensure that sentence pairs contain and correctly translate the specified
gender. We evaluate our approach on translation from English into five
languages, finding that our models improve gender translation accuracy without
any cost to generic translation quality. In addition, we show the viability of
our approach on several settings, including re-training from scratch,
fine-tuning, controlling the balance of the training data, forward translation,
and back-translation.
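The filtering pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the word lists are toy stand-ins for real gender lexicons, and `translate` is a placeholder for the initial MT model.

```python
# Toy source-side (English) and target-side gender word lists.
# Real systems would use much larger curated lexicons.
MASC_EN = {"he", "him", "his", "man", "father"}
FEM_EN = {"she", "her", "hers", "woman", "mother"}
MASC_TGT = {"él", "hombre", "padre"}
FEM_TGT = {"ella", "mujer", "madre"}

def source_gender(sentence):
    """Source-side filter: return 'M' or 'F' only if the sentence is
    unambiguously gendered; None means the sentence is discarded."""
    tokens = set(sentence.lower().split())
    has_m, has_f = bool(tokens & MASC_EN), bool(tokens & FEM_EN)
    if has_m and not has_f:
        return "M"
    if has_f and not has_m:
        return "F"
    return None

def target_matches(translation, gender):
    """Target-side filter: keep only translations whose gendered words
    agree with the source gender and contain none of the other gender."""
    tokens = set(translation.lower().split())
    wanted, other = (MASC_TGT, FEM_TGT) if gender == "M" else (FEM_TGT, MASC_TGT)
    return bool(tokens & wanted) and not (tokens & other)

def build_pseudo_parallel(monolingual, translate):
    """Generate gender-specific pseudo-parallel corpora from source
    monolingual data and an initial model, filtering both sides."""
    corpora = {"M": [], "F": []}
    for src in monolingual:
        gender = source_gender(src)      # source-side filter
        if gender is None:
            continue
        hyp = translate(src)             # initial model's translation
        if target_matches(hyp, gender):  # target-side filter
            corpora[gender].append((src, hyp))
    return corpora

# Usage with a fake "initial model" (a lookup table standing in for NMT):
fake_model = {
    "she is a mother": "ella es madre",
    "he is a father": "él es madre",   # mistranslated gender -> filtered out
}
pairs = build_pseudo_parallel(fake_model.keys(), fake_model.get)
```

The surviving pairs in `pairs` would then be added back to the training data, per setting (re-training from scratch, fine-tuning, etc.); the `{"M": ..., "F": ...}` split is what allows controlling the gender balance of that added data.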
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words), a benchmark for this setting.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages [51.2321117760104]
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
arXiv Detail & Related papers (2023-08-31T17:20:50Z)
- Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation [28.471506840241602]
Gender bias is a significant issue in machine translation, leading to ongoing research efforts in developing bias mitigation techniques.
We propose a bias mitigation method based on a novel approach, Gender-Aware Contrastive Learning (GACL), which encodes contextual gender information into the representations of non-explicit gender words.
arXiv Detail & Related papers (2023-05-23T12:53:39Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- GATE: A Challenge Set for Gender-Ambiguous Translation Examples [0.31498833540989407]
When source gender is ambiguous, machine translation models typically default to stereotypical gender roles, perpetuating harmful bias.
Recent work has led to the development of "gender rewriters" that generate alternative gender translations on such ambiguous inputs, but such systems are plagued by poor linguistic coverage.
We present and release GATE, a linguistically diverse corpus of gender-ambiguous source sentences along with multiple alternative target language translations.
arXiv Detail & Related papers (2023-03-07T15:23:38Z)
- Mitigating Gender Bias in Machine Translation through Adversarial Learning [0.8883733362171032]
We present an adversarial learning framework that mitigates gender bias in seq2seq machine translation.
Our framework reduces the disparity in translation quality between sentences with male vs. female entities by 86% for English-German translation and by 91% for English-French translation.
arXiv Detail & Related papers (2022-03-20T23:35:09Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English.
We show how a model can be trained to produce gender-neutral English with a 1% word error rate, using no human-labeled data.
arXiv Detail & Related papers (2021-02-12T21:47:48Z)
- Mitigating Gender Bias in Machine Translation with Target Gender Annotations [3.3194866396158]
When translating "The secretary asked for details" into a language with grammatical gender, it may be necessary to determine the gender of the subject "secretary".
In such cases, machine translation systems select the most common translation option, which often corresponds to a stereotypical translation.
We argue that the information necessary for an adequate translation cannot always be deduced from the sentence being translated.
We present a method for training machine translation systems to use word-level annotations containing information about the subject's gender.
arXiv Detail & Related papers (2020-10-13T07:07:59Z)
- Neural Machine Translation Doesn't Translate Gender Coreference Right Unless You Make It [18.148675498274866]
We propose schemes for incorporating explicit word-level gender inflection tags into Neural Machine Translation.
We find that simple existing approaches can over-generalize a gender feature to multiple entities in a sentence.
We also propose an extension to assess translations of gender-neutral entities from English given a corresponding linguistic convention.
arXiv Detail & Related papers (2020-10-11T20:05:42Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.