MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating
Gender Accuracy in Machine Translation
- URL: http://arxiv.org/abs/2211.01355v1
- Date: Wed, 2 Nov 2022 17:55:43 GMT
- Title: MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating
Gender Accuracy in Machine Translation
- Authors: Anna Currey, Maria Nădejde, Raghavendra Pappagari, Mia Mayer,
Stanislas Lauly, Xing Niu, Benjamin Hsu, Georgiana Dinu
- Abstract summary: We introduce MT-GenEval, a benchmark for evaluating gender accuracy in translation from English into eight languages.
Our data and code are publicly available under a CC BY-SA 3.0 license.
- Score: 18.074541317458817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As generic machine translation (MT) quality has improved, the need for
targeted benchmarks that explore fine-grained aspects of quality has increased.
In particular, gender accuracy in translation can have implications in terms of
output fluency, translation accuracy, and ethics. In this paper, we introduce
MT-GenEval, a benchmark for evaluating gender accuracy in translation from
English into eight widely-spoken languages. MT-GenEval complements existing
benchmarks by providing realistic, gender-balanced, counterfactual data in
eight language pairs where the gender of individuals is unambiguous in the
input segment, including multi-sentence segments requiring inter-sentential
gender agreement. Our data and code are publicly available under a CC BY-SA 3.0
license.
Related papers
- FairTranslate: An English-French Dataset for Gender Bias Evaluation in Machine Translation by Overcoming Gender Binarity [0.6827423171182154]
Large Language Models (LLMs) are increasingly leveraged for translation tasks but often fall short when translating inclusive language.
This paper presents a novel, fully human-annotated dataset designed to evaluate non-binary gender biases in machine translation systems from English to French.
arXiv Detail & Related papers (2025-04-22T14:35:16Z)
- GFG -- Gender-Fair Generation: A CALAMITA Challenge [15.399739689743935]
Gender-fair language aims at promoting gender equality by using terms and expressions that include all identities.
The Gender-Fair Generation challenge aims to help shift written communication toward gender-fair language.
arXiv Detail & Related papers (2024-12-26T10:58:40Z)
- Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation [28.01631390361754]
Masculine-inflected translations score higher than feminine-inflected ones, and gender-neutral translations are penalized.
Context-aware QE metrics reduce errors for masculine-inflected references but fail to address feminine referents.
Our findings underscore the need to address gender bias in QE metrics to ensure equitable and unbiased machine translation systems.
arXiv Detail & Related papers (2024-10-14T18:24:52Z)
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z)
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents the AmbGIMT benchmark (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- Fine-grained Gender Control in Machine Translation with Large Language Models [15.63784352130237]
We tackle controlled translation in a more realistic setting of inputs with multiple entities.
Our proposed method instructs the model with fine-grained entity-level gender information to translate with correct gender inflections.
We observe the emergence of a gender interference phenomenon when controlling the gender of multiple entities.
arXiv Detail & Related papers (2024-07-21T13:15:00Z)
- M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection [69.41274756177336]
Large Language Models (LLMs) have brought an unprecedented surge in machine-generated text (MGT) across diverse channels.
This raises legitimate concerns about its potential misuse and societal implications.
We introduce a new benchmark based on a multilingual, multi-domain, and multi-generator corpus of MGTs -- M4GT-Bench.
arXiv Detail & Related papers (2024-02-17T02:50:33Z)
- The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages [51.2321117760104]
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
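The lexicon-based counting idea described above can be sketched in a few lines. This is an illustrative toy, not the Gender-GAP Pipeline's actual code or lexicon: the tiny word sets below are hypothetical stand-ins for its multilingual lexicon of gendered person-nouns.

```python
# Toy sketch of lexicon-based gender-representation counting.
# The LEXICON here is a hypothetical English stand-in for a real
# multilingual lexicon of gendered person-nouns.
import re
from collections import Counter

LEXICON = {
    "masculine": {"man", "men", "father", "son", "brother", "he", "him"},
    "feminine": {"woman", "women", "mother", "daughter", "sister", "she", "her"},
}

def gender_counts(text: str) -> Counter:
    """Count lexicon matches per gender class in lowercased, tokenized text."""
    tokens = re.findall(r"[a-z']+", text.lower())  # strip punctuation
    counts = Counter()
    for gender, words in LEXICON.items():
        counts[gender] = sum(tokens.count(w) for w in words)
    return counts

counts = gender_counts("The father told his daughter that he would meet the men.")
# masculine matches: father, he, men; feminine matches: daughter
```

Aggregating such counts over a corpus gives the skew statistic the paper reports for WMT data.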
arXiv Detail & Related papers (2023-08-31T17:20:50Z)
- Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
- Mitigating Gender Bias in Machine Translation through Adversarial Learning [0.8883733362171032]
We present an adversarial learning framework that mitigates gender bias in seq2seq machine translation.
Our framework improves the disparity in translation quality for sentences with male vs. female entities by 86% for English-German translation and 91% for English-French translation.
arXiv Detail & Related papers (2022-03-20T23:35:09Z)
- Investigating Failures of Automatic Translation in the Case of Unambiguous Gender [13.58884863186619]
Transformer-based models are the modern workhorses of neural machine translation (NMT).
We observe a systemic and rudimentary class of errors made by transformer-based models when translating from a language that does not mark gender on nouns into languages that do.
We release an evaluation scheme and dataset for measuring the ability of transformer-based NMT models to translate gender correctly.
arXiv Detail & Related papers (2021-04-16T00:57:36Z)
- Improving Gender Translation Accuracy with Filtered Self-Training [14.938401898546548]
Machine translation systems often output incorrect gender, even when the gender is clear from context.
We propose a gender-filtered self-training technique to improve gender translation accuracy on unambiguously gendered inputs.
arXiv Detail & Related papers (2021-04-15T18:05:29Z)
- Decoding and Diversity in Machine Translation [90.33636694717954]
We characterize the cost in diversity paid for the BLEU scores enjoyed by NMT.
Our study implicates search as a salient source of known bias when translating gender pronouns.
arXiv Detail & Related papers (2020-11-26T21:09:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.