Machine Translationese: Effects of Algorithmic Bias on Linguistic
Complexity in Machine Translation
- URL: http://arxiv.org/abs/2102.00287v1
- Date: Sat, 30 Jan 2021 18:49:11 GMT
- Authors: Eva Vanmassenhove, Dimitar Shterionov, Matthew Gwilliam
- Abstract summary: We go beyond the study of gender in Machine Translation and investigate how bias amplification might affect language in a broader sense.
We assess the linguistic richness (on a lexical and morphological level) of translations created by different data-driven MT paradigms.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies in the field of Machine Translation (MT) and Natural Language
Processing (NLP) have shown that existing models amplify biases observed in the
training data. The amplification of biases in language technology has mainly
been examined with respect to specific phenomena, such as gender bias. In this
work, we go beyond the study of gender in MT and investigate how bias
amplification might affect language in a broader sense. We hypothesize that the
'algorithmic bias', i.e. an exacerbation of frequently observed patterns in
combination with a loss of less frequent ones, not only exacerbates societal
biases present in current datasets but could also lead to an artificially
impoverished language: 'machine translationese'. We assess the linguistic
richness (on a lexical and morphological level) of translations created by
different data-driven MT paradigms - phrase-based statistical (PB-SMT) and
neural MT (NMT). Our experiments show that there is a loss of lexical and
morphological richness in the translations produced by all investigated MT
paradigms for two language pairs (EN<=>FR and EN<=>ES).
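Lexical richness of the kind assessed here is commonly quantified with metrics such as the type-token ratio (TTR) and Yule's K. The sketch below is illustrative only, not the paper's implementation; the whitespace tokenization and the toy sentences are assumptions:

```python
from collections import Counter

def type_token_ratio(tokens):
    """Ratio of distinct word types to total tokens; lower values
    indicate a lexically poorer, more repetitive text."""
    return len(set(tokens)) / len(tokens)

def yules_k(tokens):
    """Yule's K: a repetition-based richness measure that is less
    sensitive to text length than raw TTR (higher K = less rich)."""
    n = len(tokens)
    m2 = sum(c * c for c in Counter(tokens).values())
    return 10_000 * (m2 - n) / (n * n)

# Toy contrast: a varied sentence vs. a repetitive "translationese" one.
human = "the cat sat on the mat while the dog dozed nearby".split()
machine = "the cat sat on the mat and the dog sat on the mat".split()
print(type_token_ratio(human), type_token_ratio(machine))
print(yules_k(human), yules_k(machine))
```

On this toy pair, the repetitive sentence scores a lower TTR and a higher Yule's K, which is the direction of the impoverishment the paper reports for MT output.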
Related papers
- Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach [1.6982207802596105]
This study investigates three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT.
arXiv Detail & Related papers (2023-12-17T15:56:05Z)
- Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
- Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo-parallel data with translated source sentences, but translates natural source sentences at inference.
This source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach that simultaneously uses the pseudo-parallel data (natural source, translated target) to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- When Does Translation Require Context? A Data-driven, Multilingual Exploration [71.43817945875433]
Proper handling of discourse significantly contributes to the quality of machine translation (MT).
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
- Examining Covert Gender Bias: A Case Study in Turkish and English Machine Translation Models [7.648784748888186]
We examine cases of both overt and covert gender bias in Machine Translation models.
Specifically, we introduce a method to investigate asymmetrical gender markings.
We also assess bias in the attribution of personhood and examine occupational and personality stereotypes.
arXiv Detail & Related papers (2021-08-23T19:25:56Z)
- Beyond Noise: Mitigating the Impact of Fine-grained Semantic Divergences on Neural Machine Translation [14.645468999921961]
We analyze the impact of different types of fine-grained semantic divergences on Transformer models.
We introduce a divergent-aware NMT framework that uses factors to help NMT recover from the degradation caused by naturally occurring divergences.
arXiv Detail & Related papers (2021-05-31T16:15:35Z)
- Decoding and Diversity in Machine Translation [90.33636694717954]
We characterize the cost in diversity paid for the BLEU scores enjoyed by NMT.
Our study implicates search as a salient source of known bias when translating gender pronouns.
arXiv Detail & Related papers (2020-11-26T21:09:38Z)
- On Long-Tailed Phenomena in Neural Machine Translation [50.65273145888896]
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
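The Anti-Focal loss mentioned above reweights token-level cross-entropy by prediction confidence. As background, the sketch below shows the standard focal-style reweighting this family of losses builds on; it is an illustrative assumption, not the paper's exact Anti-Focal formula, which should be taken from the paper itself:

```python
import math

def cross_entropy(p):
    """Standard token-level cross-entropy for the correct token's
    predicted probability p."""
    return -math.log(p)

def focal_style_loss(p, gamma=1.0):
    """Focal-style loss: the (1 - p)**gamma factor down-weights
    confident (typically high-frequency) tokens, shifting training
    emphasis toward hard, low-frequency tokens. gamma=0 recovers
    plain cross-entropy."""
    return ((1.0 - p) ** gamma) * -math.log(p)

# A frequent token the model already predicts well vs. a rare one.
for p in (0.9, 0.1):
    print(f"p={p}: CE={cross_entropy(p):.3f} "
          f"focal={focal_style_loss(p):.3f}")
```

Relative to plain cross-entropy, the focal factor shrinks the loss far more for the confident prediction (factor 0.1) than for the uncertain one (factor 0.9); the Anti-Focal loss adapts this weighting scheme to the structural dependencies of conditional text generation, as the summary above describes.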
arXiv Detail & Related papers (2020-10-10T07:00:57Z)
- Assessing the Bilingual Knowledge Learned by Neural Machine Translation Models [72.56058378313963]
We bridge the gap by assessing the bilingual knowledge learned by NMT models via phrase tables.
We find that NMT models learn patterns from simple to complex and distill essential bilingual knowledge from the training examples.
arXiv Detail & Related papers (2020-04-28T03:44:34Z)
- On the Integration of Linguistic Features into Statistical and Neural Machine Translation [2.132096006921048]
We investigate the discrepancies between the strengths of statistical approaches to machine translation and the way humans translate.
We identify linguistic information that is lacking in order for automatic translation systems to produce more accurate translations.
We identify overgeneralization or 'algorithmic bias' as a potential drawback of neural MT and link it to many of the remaining linguistic issues.
arXiv Detail & Related papers (2020-03-31T16:03:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.