As Easy as 1, 2, 3: Behavioural Testing of NMT Systems for Numerical
Translation
- URL: http://arxiv.org/abs/2107.08357v1
- Date: Sun, 18 Jul 2021 04:09:47 GMT
- Title: As Easy as 1, 2, 3: Behavioural Testing of NMT Systems for Numerical
Translation
- Authors: Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Benjamin I. P.
Rubinstein, Trevor Cohn
- Abstract summary: Mistranslated numbers have the potential to cause serious harm, such as financial loss or medical misinformation.
We develop comprehensive assessments of the robustness of neural machine translation systems to numerical text via behavioural testing.
- Score: 51.20569527047729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mistranslated numbers have the potential to cause serious harm, such as
financial loss or medical misinformation. In this work we develop comprehensive
assessments of the robustness of neural machine translation systems to
numerical text via behavioural testing. We explore a variety of numerical
translation capabilities a system is expected to exhibit and design effective
test examples to expose system underperformance. We find that numerical
mistranslation is a general issue: major commercial systems and
state-of-the-art research models fail on many of our test examples, for high-
and low-resource languages. Our tests reveal novel errors that have not
previously been reported in NMT systems, to the best of our knowledge. Lastly,
we discuss strategies to mitigate numerical mistranslation.
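To make the behavioural-testing recipe concrete, here is a minimal sketch of one numerical-invariance check, assuming a generic `translate()` callable as a stand-in for any MT system's API; the templates and the digit-matching rule are illustrative only, not the paper's actual test suite.

    import re

    def numbers_in(text):
        """Extract integer and decimal literals from a string."""
        return set(re.findall(r"\d+(?:\.\d+)?", text))

    def numbers_preserved(source, translation):
        """Strict invariance check: every number in the source must
        reappear verbatim in the translation (order ignored).
        Locale-specific reformatting (e.g. 1,000 -> 1.000) would need
        normalisation first; this sketch deliberately ignores that."""
        return numbers_in(source) <= numbers_in(translation)

    # Illustrative templates; the paper covers far more phenomena
    # (separators, units, ranges, spelled-out numbers, ...).
    TEMPLATES = [
        "The invoice total is {n} dollars.",
        "Take {n} mg of the medication twice daily.",
    ]

    def run_tests(translate, values=("3", "17", "2.5", "90210")):
        """Return the (source, output) pairs that fail the check."""
        failures = []
        for tpl in TEMPLATES:
            for n in values:
                src = tpl.format(n=n)
                out = translate(src)
                if not numbers_preserved(src, out):
                    failures.append((src, out))
        return failures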
Related papers
- Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective [72.83966378613238]
Under-translation and over-translation remain two challenging problems in state-of-the-art Neural Machine Translation (NMT) systems.
We conduct an in-depth analysis of the underlying cause of under-translation in NMT, providing an explanation from the perspective of the decoding objective.
We propose employing the confidence of the End Of Sentence (EOS) prediction as a detector for under-translation, and strengthening the confidence-based penalty on candidates at high risk of under-translation.
arXiv Detail & Related papers (2024-05-29T09:25:49Z)
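A hypothetical re-ranking sketch of that detector-plus-penalty idea, under one plausible reading of the summary (this is not the authors' implementation): treat a hypothesis that terminated on a weakly confident EOS prediction as being at risk of under-translation, and demote it.

    def eos_penalty(final_eos_prob, tau=0.5, alpha=4.0):
        """Penalty grows as EOS confidence at the stopping step drops
        below tau; a confident stop incurs no penalty. tau and alpha
        are assumed hyperparameters, not values from the paper."""
        return alpha * max(0.0, tau - final_eos_prob)

    def rerank(candidates):
        """candidates: (translation, total_logprob, final_eos_prob) triples.
        Returns them best-first under the penalized score."""
        return sorted(
            ((text, lp - eos_penalty(p)) for text, lp, p in candidates),
            key=lambda c: c[1],
            reverse=True,
        )

    # The truncated hypothesis ended on a low-confidence EOS and is demoted.
    cands = [("Short output.", -4.0, 0.20), ("Full, faithful output.", -5.0, 0.90)]
    print(rerank(cands)[0][0])  # -> "Full, faithful output."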
- Towards General Error Diagnosis via Behavioral Testing in Machine Translation [48.108393938462974]
This paper proposes BTPGBT, a new framework for conducting behavioral testing of machine translation (MT) systems.
The core idea of BTPGBT is to employ a novel bilingual translation pair generation approach.
Experimental results on various MT systems demonstrate that BTPGBT could provide comprehensive and accurate behavioral testing results.
arXiv Detail & Related papers (2023-10-20T09:06:41Z)
- Automating Behavioral Testing in Machine Translation [9.151054827967933]
We propose to use Large Language Models to generate source sentences tailored to test the behavior of Machine Translation models.
We can then verify whether the MT model exhibits the expected behavior through matching candidate sets.
Our approach aims to make behavioral testing of MT systems practical while requiring only minimal human effort.
arXiv Detail & Related papers (2023-09-05T19:40:45Z)
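The verification half of that pipeline might look like the following hypothetical sketch (the test case and the `translate` callable are my stand-ins, not the paper's data or API): the MT output passes if it matches any member of a candidate set of acceptable realisations.

    def passes(translation, candidates):
        """Behavioural check: the output must contain at least one
        acceptable realisation of the probed phenomenon."""
        lowered = translation.lower()
        return any(c.lower() in lowered for c in candidates)

    # Hypothetical EN->DE test probing title/name preservation; in the
    # paper, source sentences like this come from a large language model.
    TEST = {
        "source": "Dr. Smith visited Berlin on Friday.",
        "candidates": {"Dr. Smith", "Doktor Smith"},
    }

    def run(translate):
        return passes(translate(TEST["source"]), TEST["candidates"])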
- SALTED: A Framework for SAlient Long-Tail Translation Error Detection [17.914521288548844]
We introduce SALTED, a specifications-based framework for behavioral testing of machine translation models.
At the core of our approach is the development of high-precision detectors that flag errors between a source sentence and a system output.
We demonstrate that such detectors could be used not just to identify salient long-tail errors in MT systems, but also for higher-recall filtering of the training data.
arXiv Detail & Related papers (2022-05-20T06:45:07Z)
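For flavour, here is one hypothetical detector in the spirit of that framework (my construction, not SALTED's code): a high-precision rule that flags year mismatches between source and output, which can double as a filter for noisy training pairs.

    import re

    YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")  # four-digit years

    def year_mismatch(source, target):
        """Fires only when the two sides clearly disagree on the years
        they mention: rare, but nearly always a real error when it
        fires (precision over recall)."""
        src, tgt = set(YEAR.findall(source)), set(YEAR.findall(target))
        return bool(src) and src != tgt

    def filter_corpus(pairs):
        """Higher-recall use: drop training pairs the detector flags."""
        return [(s, t) for s, t in pairs if not year_mismatch(s, t)]

    print(year_mismatch("Founded in 1998.", "Gegründet 1989."))  # True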
- Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale [52.663117551150954]
A few popular metrics remain the de facto standard for evaluating tasks such as image captioning and machine translation.
This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them.
In this paper, we urge the community to consider more carefully how they automatically evaluate their models.
arXiv Detail & Related papers (2020-10-26T13:57:20Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
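In symbols, as reconstructed from the abstract (the exact estimators and experimental details are in the paper itself), XMI(S -> T) = H_qLM(T) - H_qMT(T | S): the drop in target-side cross-entropy once the source is available. A toy computation:

    import math

    def cross_entropy(token_logprobs):
        """Average negative log-probability per token, in nats."""
        return -sum(token_logprobs) / len(token_logprobs)

    def xmi(lm_logprobs, mt_logprobs):
        """XMI(S -> T) = H_qLM(T) - H_qMT(T | S): positive when
        conditioning on the source makes the target easier to predict.
        Asymmetric by construction: S -> T need not equal T -> S."""
        return cross_entropy(lm_logprobs) - cross_entropy(mt_logprobs)

    # Toy per-token probabilities (assumed, not from the paper): the MT
    # model, given the source, predicts target tokens more confidently.
    lm = [math.log(p) for p in (0.10, 0.20, 0.15)]
    mt = [math.log(p) for p in (0.40, 0.60, 0.50)]
    print(f"XMI = {xmi(lm, mt):.2f} nats/token")  # positive: source helps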
- Robust Unsupervised Neural Machine Translation with Adversarial Denoising Training [66.39561682517741]
Unsupervised neural machine translation (UNMT) has attracted great interest in the machine translation community.
The main advantage of UNMT lies in the ease of collecting the large amounts of training text it requires.
In this paper, we explicitly take noisy data into consideration, for the first time, to improve the robustness of UNMT-based systems.
arXiv Detail & Related papers (2020-02-28T05:17:55Z)