Good, but not always Fair: An Evaluation of Gender Bias for three commercial Machine Translation Systems
- URL: http://arxiv.org/abs/2306.05882v2
- Date: Tue, 26 Mar 2024 22:54:48 GMT
- Title: Good, but not always Fair: An Evaluation of Gender Bias for three commercial Machine Translation Systems
- Authors: Silvia Alma Piazzolla, Beatrice Savoldi, Luisa Bentivogli
- Abstract summary: This paper offers a meticulous assessment of three commercial Machine Translation systems - Google Translate, DeepL, and Modern MT.
For three language pairs (English/Spanish, English/Italian, and English/French), we scrutinize the behavior of such systems at several levels of granularity and on a variety of naturally occurring gender phenomena in translation.
Our study takes stock of the current state of online MT tools, by revealing significant discrepancies in the gender translation of the three systems, with each system displaying varying degrees of bias despite their overall translation quality.
- Score: 4.802214389376064
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine Translation (MT) continues to make significant strides in quality and is increasingly adopted on a larger scale. Consequently, analyses have been redirected to more nuanced aspects, intricate phenomena, as well as potential risks that may arise from the widespread use of MT tools. Along this line, this paper offers a meticulous assessment of three commercial MT systems - Google Translate, DeepL, and Modern MT - with a specific focus on gender translation and bias. For three language pairs (English/Spanish, English/Italian, and English/French), we scrutinize the behavior of such systems at several levels of granularity and on a variety of naturally occurring gender phenomena in translation. Our study takes stock of the current state of online MT tools, by revealing significant discrepancies in the gender translation of the three systems, with each system displaying varying degrees of bias despite their overall translation quality.
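As a rough illustration of the kind of per-system gender-accuracy comparison described in the abstract, the sketch below tallies how often each MT system produces the expected gender-marked forms, split by feminine and masculine references. This is not the paper's released code or evaluation protocol; the data fields, the word-level matching rule, and the toy examples are all assumptions.
```python
# Minimal illustrative sketch (assumed data format, not the paper's method):
# given annotations that mark the expected gender-marked word forms for each
# source sentence, measure how often an MT system realizes the correct gender.
from collections import defaultdict

def gender_accuracy(annotations, system_outputs):
    """annotations: list of dicts with 'id', 'gender' ('F'/'M'), and
    'expected_forms' (gender-marked words that should appear in the output).
    system_outputs: dict mapping sentence id -> translated string."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ann in annotations:
        hyp = system_outputs.get(ann["id"], "").lower().split()
        total[ann["gender"]] += 1
        # Count the sentence as correct only if every expected
        # gender-marked form appears in the hypothesis (a crude proxy).
        if all(form.lower() in hyp for form in ann["expected_forms"]):
            correct[ann["gender"]] += 1
    return {g: correct[g] / total[g] for g in total if total[g]}

# Toy usage for one hypothetical system (English->Italian):
annotations = [
    {"id": 1, "gender": "F", "expected_forms": ["dottoressa"]},
    {"id": 2, "gender": "M", "expected_forms": ["dottore"]},
]
outputs = {1: "La dottoressa è arrivata.", 2: "Il dottore è arrivato."}
print(gender_accuracy(annotations, outputs))  # e.g. {'F': 1.0, 'M': 1.0}
```
Comparing the feminine and masculine accuracies per system (and the gap between them) is one simple way to surface the kind of discrepancies the paper reports across Google Translate, DeepL, and Modern MT.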
Related papers
- Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation [28.01631390361754]
This paper is the first to investigate gender bias in quality estimation (QE) metrics and its downstream impact on machine translation (MT).
Masculine-inflected translations score higher than feminine-inflected ones, and gender-neutral translations are penalized.
We show that QE metrics can perpetuate gender bias in MT systems when used in quality-aware decoding.
arXiv Detail & Related papers (2024-10-14T18:24:52Z) - Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents the AmbGIMT benchmark (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z) - Gender Bias in Machine Translation and The Era of Large Language Models [0.8702432681310399]
This chapter examines the role of Machine Translation in perpetuating gender bias, highlighting the challenges posed by cross-linguistic settings and statistical dependencies.
It provides a comprehensive overview of existing work on gender bias in both conventional Neural Machine Translation approaches and Generative Pretrained Transformer models employed as Machine Translation systems.
arXiv Detail & Related papers (2024-01-18T14:34:49Z) - BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training [64.37683359609308]
In this study, we analyze various mainstream and cutting-edge automatic metrics from the perspective of their guidance for training machine translation systems.
We find that certain metrics exhibit robustness defects, such as the presence of universal adversarial translations in BLEURT and BARTScore.
In-depth analysis suggests two main causes of these robustness deficits: distribution biases in the training datasets, and the tendency of the metric paradigm.
arXiv Detail & Related papers (2023-07-06T16:59:30Z) - Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z) - Supervised Visual Attention for Simultaneous Multimodal Machine Translation [47.18251159303909]
We propose the first Transformer-based simultaneous multimodal machine translation (MMT) architecture.
We extend this model with an auxiliary supervision signal that guides its visual attention mechanism using labelled phrase-region alignments.
Our results show that supervised visual attention consistently improves the translation quality of the MMT models.
arXiv Detail & Related papers (2022-01-23T17:25:57Z) - It is Not as Good as You Think! Evaluating Simultaneous Machine Translation on Interpretation Data [58.105938143865906]
We argue that simultaneous machine translation (SiMT) systems should be trained and tested on real interpretation data.
Our results highlight a difference of up to 13.83 BLEU when SiMT models are evaluated on translation vs. interpretation data.
arXiv Detail & Related papers (2021-10-11T12:27:07Z) - Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation [2.0625936401496237]
We go beyond the study of gender in Machine Translation and investigate how bias amplification might affect language in a broader sense.
We assess the linguistic richness (on a lexical and morphological level) of translations created by different data-driven MT paradigms.
arXiv Detail & Related papers (2021-01-30T18:49:11Z) - Decoding and Diversity in Machine Translation [90.33636694717954]
We characterize the cost in diversity that NMT pays for the BLEU scores it enjoys.
Our study implicates search as a salient source of known bias when translating gender pronouns.
arXiv Detail & Related papers (2020-11-26T21:09:38Z) - On the Integration of Linguistic Features into Statistical and Neural Machine Translation [2.132096006921048]
We investigate the discrepancies between the strengths of statistical approaches to machine translation and the way humans translate.
We identify the linguistic information that automatic translation systems lack in order to produce more accurate translations.
We identify overgeneralization, or 'algorithmic bias', as a potential drawback of neural MT and link it to many of the remaining linguistic issues.
arXiv Detail & Related papers (2020-03-31T16:03:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.