Improving Gender Translation Accuracy with Filtered Self-Training
- URL: http://arxiv.org/abs/2104.07695v1
- Date: Thu, 15 Apr 2021 18:05:29 GMT
- Title: Improving Gender Translation Accuracy with Filtered Self-Training
- Authors: Prafulla Kumar Choubey, Anna Currey, Prashant Mathur, Georgiana Dinu
- Abstract summary: Machine translation systems often output incorrect gender, even when the gender is clear from context.
We propose a gender-filtered self-training technique to improve gender translation accuracy on unambiguously gendered inputs.
- Score: 14.938401898546548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Targeted evaluations have found that machine translation systems often output
incorrect gender, even when the gender is clear from context. Furthermore,
these incorrectly gendered translations have the potential to reflect or
amplify social biases. We propose a gender-filtered self-training technique to
improve gender translation accuracy on unambiguously gendered inputs. This
approach uses a source monolingual corpus and an initial model to generate
gender-specific pseudo-parallel corpora which are then added to the training
data. We filter the gender-specific corpora on the source and target sides to
ensure that sentence pairs contain and correctly translate the specified
gender. We evaluate our approach on translation from English into five
languages, finding that our models improve gender translation accuracy without
any cost to generic translation quality. In addition, we show the viability of
our approach on several settings, including re-training from scratch,
fine-tuning, controlling the balance of the training data, forward translation,
and back-translation.
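The filtering pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the word lists are toy stand-ins for real gender lexicons, and `translate` is a placeholder for the initial MT model.

```python
# Toy source-side (English) and target-side gender word lists.
# Real systems would use much larger curated lexicons.
MASC_EN = {"he", "him", "his", "man", "father"}
FEM_EN = {"she", "her", "hers", "woman", "mother"}
MASC_TGT = {"él", "hombre", "padre"}
FEM_TGT = {"ella", "mujer", "madre"}

def source_gender(sentence):
    """Source-side filter: return 'M' or 'F' only if the sentence is
    unambiguously gendered; None means the sentence is discarded."""
    tokens = set(sentence.lower().split())
    has_m, has_f = bool(tokens & MASC_EN), bool(tokens & FEM_EN)
    if has_m and not has_f:
        return "M"
    if has_f and not has_m:
        return "F"
    return None

def target_matches(translation, gender):
    """Target-side filter: keep only translations whose gendered words
    agree with the source gender and contain none of the other gender."""
    tokens = set(translation.lower().split())
    wanted, other = (MASC_TGT, FEM_TGT) if gender == "M" else (FEM_TGT, MASC_TGT)
    return bool(tokens & wanted) and not (tokens & other)

def build_pseudo_parallel(monolingual, translate):
    """Generate gender-specific pseudo-parallel corpora from source
    monolingual data and an initial model, filtering both sides."""
    corpora = {"M": [], "F": []}
    for src in monolingual:
        gender = source_gender(src)      # source-side filter
        if gender is None:
            continue
        hyp = translate(src)             # initial model's translation
        if target_matches(hyp, gender):  # target-side filter
            corpora[gender].append((src, hyp))
    return corpora

# Usage with a fake "initial model" (a lookup table standing in for NMT):
fake_model = {
    "she is a mother": "ella es madre",
    "he is a father": "él es madre",   # mistranslated gender -> filtered out
}
pairs = build_pseudo_parallel(fake_model.keys(), fake_model.get)
```

The surviving pairs in `pairs` would then be added back to the training data, per setting (re-training from scratch, fine-tuning, etc.); the `{"M": ..., "F": ...}` split is what allows controlling the gender balance of that added data.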
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words), a benchmark for this setting.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages [51.2321117760104]
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
arXiv Detail & Related papers (2023-08-31T17:20:50Z)
- Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation [28.471506840241602]
Gender bias is a significant issue in machine translation, leading to ongoing research efforts in developing bias mitigation techniques.
We propose a bias mitigation method based on a novel approach, Gender-Aware Contrastive Learning (GACL), which encodes contextual gender information into the representations of non-explicit gender words.
arXiv Detail & Related papers (2023-05-23T12:53:39Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- GATE: A Challenge Set for Gender-Ambiguous Translation Examples [0.31498833540989407]
When source gender is ambiguous, machine translation models typically default to stereotypical gender roles, perpetuating harmful bias.
Recent work has led to the development of "gender rewriters" that generate alternative gender translations on such ambiguous inputs, but such systems are plagued by poor linguistic coverage.
We present and release GATE, a linguistically diverse corpus of gender-ambiguous source sentences along with multiple alternative target language translations.
arXiv Detail & Related papers (2023-03-07T15:23:38Z)
- Mitigating Gender Bias in Machine Translation through Adversarial Learning [0.8883733362171032]
We present an adversarial learning framework that mitigates gender bias in seq2seq machine translation.
Our framework reduces the disparity in translation quality between sentences with male vs. female entities by 86% for English-German translation and by 91% for English-French translation.
arXiv Detail & Related papers (2022-03-20T23:35:09Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English.
We show how a model can be trained to produce gender-neutral English with a 1% word error rate, using no human-labeled data.
arXiv Detail & Related papers (2021-02-12T21:47:48Z)
- Mitigating Gender Bias in Machine Translation with Target Gender Annotations [3.3194866396158]
When translating "The secretary asked for details" into a language with grammatical gender, it may be necessary to determine the gender of the subject "secretary".
In such cases, machine translation systems select the most common translation option, which often corresponds to a stereotypical translation.
We argue that the information necessary for an adequate translation cannot always be deduced from the sentence being translated.
We present a method for training machine translation systems to use word-level annotations containing information about the subject's gender.
arXiv Detail & Related papers (2020-10-13T07:07:59Z)
- Neural Machine Translation Doesn't Translate Gender Coreference Right Unless You Make It [18.148675498274866]
We propose schemes for incorporating explicit word-level gender inflection tags into Neural Machine Translation.
We find that simple existing approaches can over-generalize a gender feature to multiple entities in a sentence.
We also propose an extension to assess translations of gender-neutral entities from English given a corresponding linguistic convention.
arXiv Detail & Related papers (2020-10-11T20:05:42Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.