SALTED: A Framework for SAlient Long-Tail Translation Error Detection
- URL: http://arxiv.org/abs/2205.09988v1
- Date: Fri, 20 May 2022 06:45:07 GMT
- Title: SALTED: A Framework for SAlient Long-Tail Translation Error Detection
- Authors: Vikas Raunak, Matt Post, Arul Menezes
- Abstract summary: We introduce SALTED, a specifications-based framework for behavioral testing of machine translation models.
At the core of our approach is the development of high-precision detectors that flag errors between a source sentence and a system output.
We demonstrate that such detectors can be used not only to identify salient long-tail errors in MT systems, but also for higher-recall filtering of the training data.
- Score: 17.914521288548844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional machine translation (MT) metrics provide an average
measure of translation quality that is insensitive to the long tail of
behavioral problems in MT. Examples include mistranslation of numbers and
physical units, dropped content, and hallucinations. These errors, which
occur rarely and unpredictably in Neural Machine Translation (NMT), greatly
undermine the reliability of state-of-the-art MT systems. Consequently, it is
important to have visibility into these problems during model development.
To this end, we introduce SALTED, a specifications-based framework for
behavioral testing of MT models that provides fine-grained views of salient
long-tail errors, permitting trustworthy visibility into previously invisible
problems. At the core of our approach is the development of high-precision
detectors that flag errors (or, alternatively, verify output correctness)
between a source sentence and a system output. We demonstrate that such
detectors can be used not only to identify salient long-tail errors in MT
systems, but also for higher-recall filtering of the training data, fixing
targeted errors with model fine-tuning in NMT, and generating novel data for
metamorphic testing to elicit further bugs in models.
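The abstract does not reproduce any detector code, so the following is only a minimal sketch of what one such high-precision check could look like: a number-mismatch detector between a source and a system output. The regex, the normalization rule, and the function names are our own illustration, not SALTED's actual implementation.

```python
import re

# Matches integers and decimals such as "3", "4.5", "1,200".
_NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")

def _normalize(num: str) -> str:
    # Strip thousands separators so "1,200" equals "1200". A production
    # detector must handle locale-specific formats (e.g. decimal commas);
    # this toy rule does not.
    return num.replace(",", "")

def numbers_preserved(source: str, hypothesis: str) -> bool:
    """Return False (i.e. flag the pair) when the numbers in the source do
    not reappear exactly in the hypothesis. Firing only on a clear mismatch
    is what keeps precision high."""
    src = sorted(_normalize(m) for m in _NUM_RE.findall(source))
    hyp = sorted(_normalize(m) for m in _NUM_RE.findall(hypothesis))
    return src == hyp

# A mistranslated quantity (45 -> 40) is flagged:
print(numbers_preserved("Bake for 45 minutes at 220 degrees.",
                        "Backen Sie 40 Minuten bei 220 Grad."))  # False
```

The same check runs unchanged over training bitext, which is how such a detector doubles as a higher-recall data filter.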
Related papers
- Cyber Risks of Machine Translation Critical Errors: Arabic Mental Health Tweets as a Case Study [3.8779763612314637]
We introduce an authentic dataset of machine translation critical errors to highlight the ethical and safety issues involved in the common use of MT.
The dataset comprises mistranslations of Arabic mental health postings manually annotated with critical error types.
We also show how commonly used quality metrics fail to penalise critical errors, an issue that merits further attention from researchers.
arXiv Detail & Related papers (2024-05-19T20:24:51Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and fine-tuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
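A minimal sketch of what an AutoMQM-style call could look like; the paper's exact prompt wording, few-shot examples, and PaLM-2 client are not reproduced here, so `call_llm` is a placeholder for any LLM API.

```python
# Sketch of AutoMQM-style error identification via prompting. The wording
# and label schema below are illustrative, not the paper's exact prompt.
PROMPT = """Identify the errors in the following translation and label each
with an MQM category (accuracy, fluency, terminology, style, ...) and a
severity (minor / major / critical).

Source: {source}
Translation: {translation}

Errors:"""

def auto_mqm_sketch(source: str, translation: str, call_llm) -> str:
    # `call_llm` is any function mapping a prompt string to a completion.
    return call_llm(PROMPT.format(source=source, translation=translation))
```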
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Perturbation-based QE: An Explainable, Unsupervised Word-level Quality Estimation Method for Blackbox Machine Translation [12.376309678270275]
Perturbation-based QE works simply by analyzing MT system output on perturbed input source sentences.
Our approach is better at detecting gender bias and word-sense-disambiguation errors in translation than supervised QE.
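The mechanism lends itself to a compact sketch: translate perturbed copies of the source and flag output words that behave suspiciously across the copies. The word-drop perturbation and the stability threshold below are our simplifications, not the paper's exact procedure.

```python
from collections import Counter

def suspect_words(source: str, translate) -> set[str]:
    """Flag output words that rarely survive small source perturbations.
    `translate` stands in for the black-box MT system (str -> str)."""
    base_words = translate(source).split()
    survival = Counter()
    tokens = source.split()
    for i in range(len(tokens)):
        perturbed = " ".join(tokens[:i] + tokens[i + 1:])  # drop token i
        out = set(translate(perturbed).split())
        survival.update(w for w in base_words if w in out)
    # Words that disappear under most perturbations get flagged.
    return {w for w in base_words if survival[w] < len(tokens) // 2}
```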
arXiv Detail & Related papers (2023-05-12T13:10:57Z)
- Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation [1.7616042687330642]
A key step in validating the robustness of NMT models is evaluating their performance on adversarial inputs.
In this work, we identify a set of perturbations and metrics tailored for the robustness assessment of such models.
We present a preliminary experimental evaluation, showing what type of perturbations affect the model the most.
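As a shape of such an evaluation (the paper defines its own perturbation and metric suite; the synonym table and exact-match comparison here are stand-ins):

```python
# Probe an NL-to-code model with a meaning-preserving rewrite of the intent
# and check whether the generated program changes. `generate` is a
# placeholder for the model under test.
SYNONYMS = {"create": "make", "compute": "calculate", "list": "sequence"}

def perturb_intent(intent: str) -> str:
    return " ".join(SYNONYMS.get(w, w) for w in intent.split())

def output_changes(intent: str, generate) -> bool:
    """True when a semantics-preserving rewrite flips the model's output."""
    return generate(intent) != generate(perturb_intent(intent))
```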
arXiv Detail & Related papers (2022-03-29T08:01:39Z)
- When Does Translation Require Context? A Data-driven, Multilingual Exploration [71.43817945875433]
Proper handling of discourse significantly contributes to the quality of machine translation (MT).
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
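A toy version of one such tagger follows; the benchmark's taggers are linguistically informed and multilingual, whereas the pronoun list here is only illustrative.

```python
# Mark tokens whose translation typically needs context, here anaphoric
# pronouns (English "it" may map to a gendered pronoun elsewhere).
AMBIGUOUS = {"it", "they", "them", "this", "that"}

def tag_context_dependent(sentence: str) -> list[tuple[str, bool]]:
    return [(tok, tok.lower() in AMBIGUOUS) for tok in sentence.split()]

print(tag_context_dependent("I bought a lamp but it was broken"))
# [('I', False), ..., ('it', True), ('was', False), ('broken', False)]
```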
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
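The contrast is easy to show on a token list: masking punches MASK-shaped holes, while the alternative objectives keep the input looking like a real sentence. The local-window shuffle below is one illustrative reordering, not the paper's exact scheme.

```python
import random

def mask_input(tokens, p=0.35, mask="<mask>"):
    # MLM-style corruption: the decoder must reconstruct the masked spans.
    return [mask if random.random() < p else t for t in tokens]

def reorder_input(tokens, window=3):
    # Shuffle within small windows: the input stays a "full" sentence with
    # no artificial MASK symbols, only scrambled word order.
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        random.shuffle(chunk)
        out.extend(chunk)
    return out
```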
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Detecting Hallucinated Content in Conditional Neural Sequence Generation [165.68948078624499]
We propose a task to predict whether each token in the output sequence is hallucinated (not contained in the input).
We also introduce a method for learning to detect hallucinations using pretrained language models fine-tuned on synthetic data.
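A minimal sketch of the synthetic-data step; the paper corrupts references with pretrained-LM infilling, whereas the random insertions below are a deliberate simplification.

```python
import random

def synthesize(reference: str, fillers=("apparently", "yesterday", "huge")):
    """Corrupt a reference translation by inserting unsupported words and
    keep token-level labels: 1 = hallucinated, 0 = original."""
    tokens, labels = [], []
    for tok in reference.split():
        if random.random() < 0.15:
            tokens.append(random.choice(fillers))
            labels.append(1)
        tokens.append(tok)
        labels.append(0)
    return tokens, labels
```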
arXiv Detail & Related papers (2020-11-05T00:18:53Z)
- Sentence Boundary Augmentation For Neural Machine Translation Robustness [11.290581889247983]
We show that sentence boundary segmentation has the largest impact on quality, and we develop a simple data augmentation strategy to improve segmentation robustness.
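One simple way to realize such an augmentation (the paper's exact policy may differ) is to also train on concatenations of neighboring sentence pairs, so the model sees inputs whose boundaries do not fall on true sentence ends.

```python
def augment_boundaries(src_sents, tgt_sents):
    """Add concatenations of adjacent pairs to the training data."""
    base = list(zip(src_sents, tgt_sents))
    augmented = list(base)
    for (s1, t1), (s2, t2) in zip(base, base[1:]):
        augmented.append((s1 + " " + s2, t1 + " " + t2))
    return augmented
```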
arXiv Detail & Related papers (2020-10-21T16:44:48Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
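For reference, the metric has a compact definition (notation ours, following the paper's setup): cross-entropies under a language model and the translation model replace true entropies, which is also what makes the quantity asymmetric in S and T.

```latex
% Cross-mutual information for translating source S into target T:
% LM cross-entropy of T minus MT cross-entropy of T given S.
\mathrm{XMI}(S \rightarrow T) = H_{q_{\mathrm{LM}}}(T) - H_{q_{\mathrm{MT}}}(T \mid S)
```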
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
- On the Inference Calibration of Neural Machine Translation [54.48932804996506]
We study the correlation between calibration and translation performance, as well as the linguistic properties of miscalibration.
We propose a new graduated label smoothing method that can improve both inference calibration and translation performance.
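The "graduated" part is a confidence-dependent smoothing weight: more smoothing where the model tends to be over-confident, none where it is under-confident. The thresholds and weights in this sketch are illustrative values, not necessarily those chosen in the paper.

```python
def smoothing_weight(confidence: float) -> float:
    """Label-smoothing weight as a step function of model confidence.
    Values here are illustrative."""
    if confidence > 0.7:
        return 0.3   # over-confident region: smooth hard
    if confidence < 0.3:
        return 0.0   # under-confident region: no smoothing
    return 0.1       # middle region: mild smoothing
```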
arXiv Detail & Related papers (2020-05-03T02:03:56Z)