SALTED: A Framework for SAlient Long-Tail Translation Error Detection
- URL: http://arxiv.org/abs/2205.09988v1
- Date: Fri, 20 May 2022 06:45:07 GMT
- Title: SALTED: A Framework for SAlient Long-Tail Translation Error Detection
- Authors: Vikas Raunak, Matt Post, Arul Menezes
- Abstract summary: We introduce SALTED, a specifications-based framework for behavioral testing of machine translation models.
At the core of our approach is the development of high-precision detectors that flag errors between a source sentence and a system output.
We demonstrate that such detectors can be used not only to identify salient long-tail errors in MT systems, but also for higher-recall filtering of the training data.
- Score: 17.914521288548844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional machine translation (MT) metrics provide an average
measure of translation quality that is insensitive to the long tail of
behavioral problems in MT. Examples include mistranslation of numbers and
physical units, dropped content, and hallucinations. These errors, which
occur rarely and unpredictably in Neural Machine Translation (NMT), greatly
undermine the reliability of state-of-the-art MT systems. Consequently, it is
important to have visibility into these problems during model development.
To this end, we introduce SALTED, a specifications-based framework for
behavioral testing of MT models that provides fine-grained views of salient
long-tail errors, permitting trustworthy visibility into previously invisible
problems. At the core of our approach is the development of high-precision
detectors that flag errors (or, alternatively, verify output correctness)
between a source sentence and a system output. We demonstrate that such
detectors can be used not only to identify salient long-tail errors in MT
systems, but also for higher-recall filtering of the training data, fixing
targeted errors with model fine-tuning in NMT, and generating novel data for
metamorphic testing to elicit further bugs in models.
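The abstract does not reproduce any detector code, so the following is only a minimal sketch of what one such high-precision check could look like: a number-mismatch detector between a source and a system output. The regex, the normalization rule, and the function names are our own illustration, not SALTED's actual implementation.

```python
import re

# Matches integers and decimals such as "3", "4.5", "1,200".
_NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")

def _normalize(num: str) -> str:
    # Strip thousands separators so "1,200" equals "1200". A production
    # detector must handle locale-specific formats (e.g. decimal commas);
    # this toy rule does not.
    return num.replace(",", "")

def numbers_preserved(source: str, hypothesis: str) -> bool:
    """Return False (i.e. flag the pair) when the numbers in the source do
    not reappear exactly in the hypothesis. Firing only on a clear mismatch
    is what keeps precision high."""
    src = sorted(_normalize(m) for m in _NUM_RE.findall(source))
    hyp = sorted(_normalize(m) for m in _NUM_RE.findall(hypothesis))
    return src == hyp

# A mistranslated quantity (45 -> 40) is flagged:
print(numbers_preserved("Bake for 45 minutes at 220 degrees.",
                        "Backen Sie 40 Minuten bei 220 Grad."))  # False
```

The same check runs unchanged over training bitext, which is how such a detector doubles as a higher-recall data filter.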
Related papers
- Cyber Risks of Machine Translation Critical Errors: Arabic Mental Health Tweets as a Case Study [3.8779763612314637]
We introduce an authentic dataset of machine translation critical errors to highlight the ethical and safety issues involved in the common use of MT.
The dataset comprises mistranslations of Arabic mental health postings manually annotated with critical error types.
We also show how commonly used quality metrics fail to penalise critical errors, an issue that merits further attention from researchers.
arXiv Detail & Related papers (2024-05-19T20:24:51Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and fine-tuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
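A minimal sketch of what an AutoMQM-style call could look like; the paper's exact prompt wording, few-shot examples, and PaLM-2 client are not reproduced here, so `call_llm` is a placeholder for any LLM API.

```python
# Sketch of AutoMQM-style error identification via prompting. The wording
# and label schema below are illustrative, not the paper's exact prompt.
PROMPT = """Identify the errors in the following translation and label each
with an MQM category (accuracy, fluency, terminology, style, ...) and a
severity (minor / major / critical).

Source: {source}
Translation: {translation}

Errors:"""

def auto_mqm_sketch(source: str, translation: str, call_llm) -> str:
    # `call_llm` is any function mapping a prompt string to a completion.
    return call_llm(PROMPT.format(source=source, translation=translation))
```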
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Perturbation-based QE: An Explainable, Unsupervised Word-level Quality Estimation Method for Blackbox Machine Translation [12.376309678270275]
Perturbation-based QE works simply by analyzing MT system output on perturbed input source sentences.
Our approach is better at detecting gender bias and word-sense-disambiguation errors in translation than supervised QE.
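The mechanism lends itself to a compact sketch: translate perturbed copies of the source and flag output words that behave suspiciously across the copies. The word-drop perturbation and the stability threshold below are our simplifications, not the paper's exact procedure.

```python
from collections import Counter

def suspect_words(source: str, translate) -> set[str]:
    """Flag output words that rarely survive small source perturbations.
    `translate` stands in for the black-box MT system (str -> str)."""
    base_words = translate(source).split()
    survival = Counter()
    tokens = source.split()
    for i in range(len(tokens)):
        perturbed = " ".join(tokens[:i] + tokens[i + 1:])  # drop token i
        out = set(translate(perturbed).split())
        survival.update(w for w in base_words if w in out)
    # Words that disappear under most perturbations get flagged.
    return {w for w in base_words if survival[w] < len(tokens) // 2}
```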
arXiv Detail & Related papers (2023-05-12T13:10:57Z)
- Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation [1.7616042687330642]
A key step in validating the robustness of NMT models is evaluating their performance on adversarial inputs.
In this work, we identify a set of perturbations and metrics tailored for the robustness assessment of such models.
We present a preliminary experimental evaluation, showing what type of perturbations affect the model the most.
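As a shape of such an evaluation (the paper defines its own perturbation and metric suite; the synonym table and exact-match comparison here are stand-ins):

```python
# Probe an NL-to-code model with a meaning-preserving rewrite of the intent
# and check whether the generated program changes. `generate` is a
# placeholder for the model under test.
SYNONYMS = {"create": "make", "compute": "calculate", "list": "sequence"}

def perturb_intent(intent: str) -> str:
    return " ".join(SYNONYMS.get(w, w) for w in intent.split())

def output_changes(intent: str, generate) -> bool:
    """True when a semantics-preserving rewrite flips the model's output."""
    return generate(intent) != generate(perturb_intent(intent))
```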
arXiv Detail & Related papers (2022-03-29T08:01:39Z)
- When Does Translation Require Context? A Data-driven, Multilingual Exploration [71.43817945875433]
Proper handling of discourse significantly contributes to the quality of machine translation (MT).
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
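A toy version of one such tagger follows; the benchmark's taggers are linguistically informed and multilingual, whereas the pronoun list here is only illustrative.

```python
# Mark tokens whose translation typically needs context, here anaphoric
# pronouns (English "it" may map to a gendered pronoun elsewhere).
AMBIGUOUS = {"it", "they", "them", "this", "that"}

def tag_context_dependent(sentence: str) -> list[tuple[str, bool]]:
    return [(tok, tok.lower() in AMBIGUOUS) for tok in sentence.split()]

print(tag_context_dependent("I bought a lamp but it was broken"))
# [('I', False), ..., ('it', True), ('was', False), ('broken', False)]
```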
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
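The contrast is easy to show on a token list: masking punches MASK-shaped holes, while the alternative objectives keep the input looking like a real sentence. The local-window shuffle below is one illustrative reordering, not the paper's exact scheme.

```python
import random

def mask_input(tokens, p=0.35, mask="<mask>"):
    # MLM-style corruption: the decoder must reconstruct the masked spans.
    return [mask if random.random() < p else t for t in tokens]

def reorder_input(tokens, window=3):
    # Shuffle within small windows: the input stays a "full" sentence with
    # no artificial MASK symbols, only scrambled word order.
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        random.shuffle(chunk)
        out.extend(chunk)
    return out
```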
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Detecting Hallucinated Content in Conditional Neural Sequence Generation [165.68948078624499]
We propose a task to predict whether each token in the output sequence is hallucinated (not contained in the input).
We also introduce a method for learning to detect hallucinations using pretrained language models fine-tuned on synthetic data.
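A minimal sketch of the synthetic-data step; the paper corrupts references with pretrained-LM infilling, whereas the random insertions below are a deliberate simplification.

```python
import random

def synthesize(reference: str, fillers=("apparently", "yesterday", "huge")):
    """Corrupt a reference translation by inserting unsupported words and
    keep token-level labels: 1 = hallucinated, 0 = original."""
    tokens, labels = [], []
    for tok in reference.split():
        if random.random() < 0.15:
            tokens.append(random.choice(fillers))
            labels.append(1)
        tokens.append(tok)
        labels.append(0)
    return tokens, labels
```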
arXiv Detail & Related papers (2020-11-05T00:18:53Z)
- Sentence Boundary Augmentation For Neural Machine Translation Robustness [11.290581889247983]
We show that sentence boundary segmentation has the largest impact on quality, and we develop a simple data augmentation strategy to improve segmentation robustness.
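One simple way to realize such an augmentation (the paper's exact policy may differ) is to also train on concatenations of neighboring sentence pairs, so the model sees inputs whose boundaries do not fall on true sentence ends.

```python
def augment_boundaries(src_sents, tgt_sents):
    """Add concatenations of adjacent pairs to the training data."""
    base = list(zip(src_sents, tgt_sents))
    augmented = list(base)
    for (s1, t1), (s2, t2) in zip(base, base[1:]):
        augmented.append((s1 + " " + s2, t1 + " " + t2))
    return augmented
```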
arXiv Detail & Related papers (2020-10-21T16:44:48Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
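For reference, the metric has a compact definition (notation ours, following the paper's setup): cross-entropies under a language model and the translation model replace true entropies, which is also what makes the quantity asymmetric in S and T.

```latex
% Cross-mutual information for translating source S into target T:
% LM cross-entropy of T minus MT cross-entropy of T given S.
\mathrm{XMI}(S \rightarrow T) = H_{q_{\mathrm{LM}}}(T) - H_{q_{\mathrm{MT}}}(T \mid S)
```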
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
- On the Inference Calibration of Neural Machine Translation [54.48932804996506]
We study the correlation between calibration and translation performance, as well as the linguistic properties of miscalibration.
We propose a new graduated label smoothing method that can improve both inference calibration and translation performance.
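The "graduated" part is a confidence-dependent smoothing weight: more smoothing where the model tends to be over-confident, none where it is under-confident. The thresholds and weights in this sketch are illustrative values, not necessarily those chosen in the paper.

```python
def smoothing_weight(confidence: float) -> float:
    """Label-smoothing weight as a step function of model confidence.
    Values here are illustrative."""
    if confidence > 0.7:
        return 0.3   # over-confident region: smooth hard
    if confidence < 0.3:
        return 0.0   # under-confident region: no smoothing
    return 0.1       # middle region: mild smoothing
```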
arXiv Detail & Related papers (2020-05-03T02:03:56Z)