Machine Translation Testing via Syntactic Tree Pruning
- URL: http://arxiv.org/abs/2401.00751v1
- Date: Mon, 1 Jan 2024 13:28:46 GMT
- Title: Machine Translation Testing via Syntactic Tree Pruning
- Authors: Quanjun Zhang, Juan Zhai, Chunrong Fang, Jiawei Liu, Weisong Sun,
Haichuan Hu, Qingyu Wang
- Abstract summary: Erroneous translations may result in severe consequences, such as financial losses.
It is challenging to test machine translation systems because of the complexity and intractability of the underlying neural models.
- Score: 19.023809217746955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine translation systems have been widely adopted in our daily life,
making life easier and more convenient. Unfortunately, erroneous translations
may result in severe consequences, such as financial losses. This calls for
improving the accuracy and reliability of machine translation systems.
However, it is challenging to test machine translation systems because of the
complexity and intractability of the underlying neural models. To tackle these
challenges, we propose a novel metamorphic testing approach by syntactic tree
pruning (STP) to validate machine translation systems. Our key insight is that
a pruned sentence should have similar crucial semantics compared with the
original sentence. Specifically, STP (1) proposes a core semantics-preserving
pruning strategy by basic sentence structure and dependency relations on the
level of syntactic tree representation; (2) generates source sentence pairs
based on the metamorphic relation; (3) reports suspicious issues whose
translations break the consistency property by a bag-of-words model. We further
evaluate STP on two state-of-the-art machine translation systems (i.e., Google
Translate and Bing Microsoft Translator) with 1,200 source sentences as inputs.
The results show that STP can accurately find 5,073 unique erroneous
translations in Google Translate and 5,100 unique erroneous translations in
Bing Microsoft Translator (400% more than state-of-the-art techniques), with
64.5% and 65.4% precision, respectively. The reported erroneous translations
vary in types and more than 90% of them cannot be found by state-of-the-art
techniques. There are 9,393 erroneous translations unique to STP, which is
711.9% more than state-of-the-art techniques. Moreover, STP is quite effective
to detect translation errors for the original sentences with a recall reaching
74.0%, improving state-of-the-art techniques by 55.1% on average.
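The consistency check in step (3) can be sketched as a bag-of-words overlap test. This is a minimal illustration under assumed helper names and an assumed overlap threshold, not the paper's exact implementation:

```python
from collections import Counter

def bag_of_words(sentence: str) -> Counter:
    """Lowercased token multiset used for the consistency comparison."""
    return Counter(sentence.lower().split())

def consistency_violated(orig_translation: str,
                         pruned_translation: str,
                         threshold: float = 0.5) -> bool:
    """Report a suspicious issue when the pruned sentence's translation
    shares too few words with the original sentence's translation."""
    orig_bow = bag_of_words(orig_translation)
    pruned_bow = bag_of_words(pruned_translation)
    # Since pruning preserves the core semantics, most words of the pruned
    # translation should also appear in the original translation.
    overlap = sum((orig_bow & pruned_bow).values())
    total = sum(pruned_bow.values())
    return total > 0 and overlap / total < threshold
```

A pair whose translations diverge sharply in vocabulary would be flagged for manual inspection; the threshold trades precision against recall.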
Related papers
- Machine Translation Models are Zero-Shot Detectors of Translation Direction [46.41883195574249]
Detecting the translation direction of parallel text has applications for machine translation training and evaluation, but also has forensic applications such as resolving plagiarism or forgery allegations.
In this work, we explore an unsupervised approach to translation direction detection based on the simple hypothesis that $p(\text{translation} \mid \text{original}) > p(\text{original} \mid \text{translation})$, motivated by the well-known simplification effect in translationese or machine-translationese.
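The hypothesis amounts to comparing two model scores. A minimal sketch, where the scorer is a hypothetical stand-in for an NMT model's log-probability:

```python
def detect_direction(score, sent_a: str, sent_b: str) -> str:
    """Guess which of two parallel sentences is the original, given a
    scorer where score(target, source) ~ log p(target | source)."""
    # Hypothesis: p(translation | original) > p(original | translation),
    # i.e. a translation is easier to predict from its source than
    # the other way around.
    if score(sent_b, sent_a) > score(sent_a, sent_b):
        return "a is original"
    return "b is original"

# Toy scorer: a lookup table standing in for a real model's log-probabilities.
toy_scores = {("b", "a"): -1.0, ("a", "b"): -5.0}
direction = detect_direction(lambda t, s: toy_scores[(t, s)], "a", "b")
```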
arXiv Detail & Related papers (2024-01-12T18:59:02Z)
- Word Closure-Based Metamorphic Testing for Machine Translation [8.009584342926646]
We propose a word closure-based output comparison method to address the limitations of existing metamorphic testing methods for Machine Translation Systems (MTS).
Our method significantly outperforms existing works in violation identification by improving both precision and recall.
It also helps to increase the F1 score of translation error localization by 35.9%.
arXiv Detail & Related papers (2023-12-19T11:19:40Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding [53.84948040596055]
We introduce two related methods to mitigate failure cases with a modified decoding objective.
Experiments on the massively multilingual models M2M-100 (418M) and SMaLL-100 show that these methods suppress hallucinations and off-target translations.
arXiv Detail & Related papers (2023-09-13T17:15:27Z)
- Extrinsic Evaluation of Machine Translation Metrics [78.75776477562087]
It is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level.
We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks.
Our experiments demonstrate that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes.
arXiv Detail & Related papers (2022-12-20T14:39:58Z)
- Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement [57.72846454929923]
We create a benchmark dataset, HJQE, in which expert translators directly annotate poorly translated words.
We propose two tag-correcting strategies, namely a tag refinement strategy and a tree-based annotation strategy, to make the TER-based artificial QE corpus closer to HJQE.
The results show our proposed dataset is more consistent with human judgement and also confirm the effectiveness of the proposed tag correcting strategies.
arXiv Detail & Related papers (2022-09-13T02:37:12Z)
- Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
arXiv Detail & Related papers (2021-07-23T17:11:27Z)
- SemMT: A Semantic-based Testing Approach for Machine Translation Systems [11.166336490280749]
We propose SemMT, an automatic testing approach for machine translation systems based on semantic similarity checking.
SemMT applies round-trip translation and measures the semantic similarity between the original and translated sentences.
We show SemMT can achieve higher effectiveness compared with state-of-the-art works.
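Round-trip checking of this kind can be sketched as follows, where `translate`, `back_translate`, and `similarity` are hypothetical stand-ins for a real MT API and a semantic similarity model:

```python
def round_trip_suspicious(source: str,
                          translate,
                          back_translate,
                          similarity,
                          threshold: float = 0.8) -> bool:
    """Flag a translation when the back-translated sentence drifts
    semantically from the original source sentence."""
    translated = translate(source)
    restored = back_translate(translated)
    # A faithful translation should survive the round trip with high
    # semantic similarity to the source.
    return similarity(source, restored) < threshold
```

In practice the similarity function would be a learned semantic metric rather than exact string equality, and the threshold would be tuned on labeled data.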
arXiv Detail & Related papers (2020-12-03T10:42:56Z)
- Testing Machine Translation via Referential Transparency [28.931196266344926]
We introduce referentially transparent inputs (RTIs), a simple, widely applicable methodology for validating machine translation software.
Our practical implementation, Purity, detects when this property is broken by a translation.
To evaluate RTI, we use Purity to test Google Translate and Bing Microsoft Translator with 200 unlabeled sentences.
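The property being checked can be sketched as follows: a referentially transparent phrase should keep its translation when embedded in a larger sentence. Here `translate` is a hypothetical stand-in for a real MT system:

```python
def rti_violation(phrase: str, sentence: str, translate) -> bool:
    """Flag a suspected bug when the phrase's standalone translation does
    not appear verbatim in the translation of its containing sentence."""
    assert phrase in sentence, "the phrase must occur in the sentence"
    return translate(phrase) not in translate(sentence)

# Toy translator: a lookup table standing in for a real MT system.
toy_mt = {
    "the red car": "das rote Auto",
    "he sold the red car": "er verkaufte das rote Auto",
}
```

A real checker would compare translations more tolerantly (e.g. up to inflection), since verbatim containment can produce false positives.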
arXiv Detail & Related papers (2020-04-22T01:37:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.