It's not a Non-Issue: Negation as a Source of Error in Machine
Translation
- URL: http://arxiv.org/abs/2010.05432v1
- Date: Mon, 12 Oct 2020 03:34:44 GMT
- Title: It's not a Non-Issue: Negation as a Source of Error in Machine
Translation
- Authors: Md Mosharaf Hossain, Antonios Anastasopoulos, Eduardo Blanco, and
Alexis Palmer
- Abstract summary: We investigate whether translating negation is an issue for modern machine translation systems using 17 translation directions as a test bed.
We find that indeed the presence of negation can significantly impact downstream quality, in some cases resulting in quality reductions of more than 60%.
- Score: 33.991817055535854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As machine translation (MT) systems progress at a rapid pace, questions of
their adequacy linger. In this study we focus on negation, a universal, core
property of human language that significantly affects the semantics of an
utterance. We investigate whether translating negation is an issue for modern
MT systems using 17 translation directions as a test bed. Through thorough
analysis, we find that indeed the presence of negation can significantly impact
downstream quality, in some cases resulting in quality reductions of more than
60%. We also provide a linguistically motivated analysis that directly explains
the majority of our findings. We release our annotations and code to replicate
our analysis here: https://github.com/mosharafhossain/negation-mt.
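The released code is linked above; as a quick, hedged illustration of the kind of comparison the abstract describes, the sketch below splits a line-aligned test set by the presence of an English negation cue and scores each bucket separately with sacreBLEU. The file names and the cue list are illustrative assumptions, not taken from the paper's repository.

```python
# Minimal sketch (not the paper's pipeline): compare MT quality on source
# sentences with vs. without negation. Assumes sacrebleu is installed and
# that the three files are line-aligned; the cue regex is a rough heuristic.
import re
import sacrebleu

NEGATION_CUES = re.compile(r"\b(not|no|never|none|nothing|nobody|nowhere)\b|n't\b",
                           re.IGNORECASE)

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

src = read_lines("test.src.en")   # hypothetical English source file
hyp = read_lines("test.hyp.de")   # hypothetical system output file
ref = read_lines("test.ref.de")   # hypothetical reference file

# Split the test set by whether the source sentence contains a negation cue.
neg = [i for i, s in enumerate(src) if NEGATION_CUES.search(s)]
pos = [i for i, s in enumerate(src) if not NEGATION_CUES.search(s)]

for name, idx in [("with negation", neg), ("without negation", pos)]:
    bleu = sacrebleu.corpus_bleu([hyp[i] for i in idx], [[ref[i] for i in idx]])
    print(f"{name}: {len(idx)} sentences, BLEU = {bleu.score:.1f}")
```

A lower score in the "with negation" bucket would mirror the quality reductions the paper reports, though the paper's own analysis relies on manual annotation rather than a cue regex alone.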
Related papers
- Evaluating Automatic Metrics with Incremental Machine Translation Systems [55.78547133890403]
We introduce a dataset comprising commercial machine translations, gathered weekly over six years across 12 translation directions.
We assume commercial systems improve over time, which enables us to evaluate machine translation (MT) metrics based on their preference for more recent translations.
arXiv Detail & Related papers (2024-07-03T17:04:17Z)
- Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective [72.83966378613238]
Under-translation and over-translation remain two challenging problems in state-of-the-art Neural Machine Translation (NMT) systems.
We conduct an in-depth analysis on the underlying cause of under-translation in NMT, providing an explanation from the perspective of decoding objective.
We propose employing the confidence of predicting End Of Sentence (EOS) as a detector for under-translation, and strengthening the confidence-based penalty to penalize candidates with a high risk of under-translation (see the sketch after this list).
arXiv Detail & Related papers (2024-05-29T09:25:49Z)
- Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding [53.84948040596055]
We introduce two related methods to mitigate failure cases with a modified decoding objective.
Experiments on the massively multilingual models M2M-100 (418M) and SMaLL-100 show that these methods suppress hallucinations and off-target translations.
arXiv Detail & Related papers (2023-09-13T17:15:27Z)
- Evaluation of Chinese-English Machine Translation of Emotion-Loaded Microblog Texts: A Human Annotated Dataset for the Quality Assessment of Emotion Translation [7.858458986992082]
In this paper, we focus on how current Machine Translation (MT) tools perform on the translation of emotion-loaded texts.
We propose an evaluation framework based on the Multidimensional Quality Metrics (MQM) and perform a detailed error analysis of the MT outputs.
arXiv Detail & Related papers (2023-06-20T21:22:45Z)
- Do GPTs Produce Less Literal Translations? [20.095646048167612]
Large Language Models (LLMs) have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks.
We find that translations out of English (E-X) from GPTs tend to be less literal, while exhibiting similar or better scores on Machine Translation quality metrics.
arXiv Detail & Related papers (2023-05-26T10:38:31Z)
- Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failures that happen without the system being aware of them.
We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator.
We show that the proposed method delivers outstanding performance on quality estimation.
arXiv Detail & Related papers (2022-11-25T02:39:41Z)
- Automatic Evaluation and Analysis of Idioms in Neural Machine Translation [12.227312923011986]
We present a novel metric for measuring the frequency of literal translation errors without human involvement.
We explore the role of monolingual pretraining and find that it yields substantial targeted improvements.
We find that randomly initialized models are more local or "myopic", as they are relatively unaffected by variations of the context.
arXiv Detail & Related papers (2022-10-10T10:30:09Z)
- Revisiting Negation in Neural Machine Translation [26.694559863395877]
We show that the ability of neural machine translation (NMT) models to translate negation has improved with deeper and more advanced networks.
The accuracy of manual evaluation in English--German (EN--DE), German--English (DE--EN), English--Chinese (EN--ZH), and Chinese--English (ZH--EN) is 95.7%, 94.8%, 93.4%, and 91.7%, respectively.
arXiv Detail & Related papers (2021-07-26T13:19:57Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
- On the Integration of Linguistic Features into Statistical and Neural Machine Translation [2.132096006921048]
We investigate the discrepancies between the strengths of statistical approaches to machine translation and the way humans translate.
We identify linguistic information that is lacking in order for automatic translation systems to produce more accurate translations.
We identify overgeneralization or 'algorithmic bias' as a potential drawback of neural MT and link it to many of the remaining linguistic issues.
arXiv Detail & Related papers (2020-03-31T16:03:38Z)
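As referenced in the under-translation entry above, the sketch below shows one way the EOS-confidence idea could be read off a standard encoder-decoder model. It is a loose illustration only: the model name, the scoring procedure, and the 0.5 cutoff are assumptions for demonstration, not the authors' implementation.

```python
# Hedged sketch: treat a low probability of emitting EOS after a candidate
# translation as a sign of possible under-translation. Model choice and the
# cutoff below are illustrative assumptions, not the paper's method.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "Helsinki-NLP/opus-mt-en-de"  # any seq2seq MT model would do
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).eval()

def eos_confidence(source: str, candidate: str) -> float:
    """Probability the model assigns to EOS right after the candidate."""
    enc = tok(source, return_tensors="pt")
    labels = tok(text_target=candidate, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        # Passing labels makes the model build the shifted decoder inputs.
        logits = model(**enc, labels=labels).logits[0]
    # With EOS appended to the labels, the final position predicts the token
    # that follows the full candidate; its softmax value at eos_token_id is
    # the model's confidence that the sentence should end here.
    probs = torch.softmax(logits[-1], dim=-1)
    return probs[tok.eos_token_id].item()

# Low EOS confidence suggests the model "wants to keep translating", i.e.
# the candidate may be truncated; 0.5 is an arbitrary illustrative cutoff.
if eos_confidence("She did not sign the contract.", "Sie hat den Vertrag") < 0.5:
    print("possible under-translation")
```

In the paper itself this signal is folded into decoding as a confidence-based penalty on risky candidates; the function above only shows how the raw EOS confidence could be obtained.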