Rethinking Round-Trip Translation for Machine Translation Evaluation
- URL: http://arxiv.org/abs/2209.07351v3
- Date: Mon, 15 May 2023 11:33:20 GMT
- Title: Rethinking Round-Trip Translation for Machine Translation Evaluation
- Authors: Terry Yue Zhuo, Qiongkai Xu, Xuanli He, Trevor Cohn
- Abstract summary: We report the surprising finding that round-trip translation can be used for automatic evaluation without references.
We demonstrate that this rectification is overdue, as round-trip translation can benefit multiple machine translation evaluation tasks.
- Score: 44.83568796515321
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Automatic evaluation of low-resource language translation suffers from a
shortage of parallel corpora. Round-trip translation could serve as a clever and
straightforward technique to alleviate the need for a parallel evaluation corpus.
However, in the era of statistical machine translation (SMT), only obscure
correlations were observed between the evaluation scores of forward and round-trip
translations. In this paper, we report the surprising finding that round-trip
translation can be used for automatic evaluation without references. First, our
revisiting of round-trip translation in SMT evaluation reveals that the
long-standing misunderstanding is essentially caused by the copying mechanism.
After removing the copying mechanism from SMT, round-trip translation scores
appropriately reflect forward translation performance. We then demonstrate that
this rectification is overdue, as round-trip translation can benefit multiple
machine translation evaluation tasks. Specifically, round-trip translation can be
used (i) to predict the corresponding forward translation scores; (ii) to improve
the performance of recently advanced quality estimation models; and (iii) to
identify adversarial competitors in shared tasks via cross-system verification.
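To make the core idea concrete, here is a minimal sketch of reference-free round-trip evaluation: translate each source sentence into the target language and back, then score the round-trip output against the original source. The translate_forward and translate_backward functions are hypothetical stand-ins for any src-to-tgt and tgt-to-src MT systems, and sacreBLEU is used only as an example metric.

```python
import sacrebleu

def rtt_score(sources, translate_forward, translate_backward):
    """Reference-free round-trip translation (RTT) score.

    translate_forward / translate_backward are hypothetical stand-ins
    for src->tgt and tgt->src MT systems (ideally without a copying
    mechanism, which the paper identifies as the confound in SMT).
    """
    forward = [translate_forward(s) for s in sources]   # src -> tgt
    back = [translate_backward(t) for t in forward]     # tgt -> src
    # Score the round-trip output against the original sources: no
    # target-side reference translations are needed.
    return sacrebleu.corpus_bleu(back, [sources]).score
```

Under the paper's finding, once copying is removed, a score like this tracks forward translation quality, which is what makes it usable in low-resource settings.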
Related papers
- Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective [72.83966378613238]
Under-translation and over-translation remain two challenging problems in state-of-the-art Neural Machine Translation (NMT) systems.
We conduct an in-depth analysis of the underlying cause of under-translation in NMT, providing an explanation from the perspective of the decoding objective.
We propose employing the confidence of predicting End Of Sentence (EOS) as a detector for under-translation, and strengthening the confidence-based penalty to penalize candidates with a high risk of under-translation.
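As a rough illustration of the detector-plus-penalty idea (the hyperparameters below are ours, not the paper's):

```python
def penalized_beam_score(candidate_logprob: float, eos_prob: float,
                         alpha: float = 1.0, tau: float = 0.5) -> float:
    """Down-weight a finished beam candidate whose EOS confidence is low.

    A low P(EOS) at the stopping step is treated as an under-translation
    signal; alpha (strength) and tau (threshold) are illustrative values.
    """
    penalty = alpha * max(0.0, tau - eos_prob)  # fires only when P(EOS) < tau
    return candidate_logprob - penalty
```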
arXiv Detail & Related papers (2024-05-29T09:25:49Z)
- BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine Translation [4.651581292181871]
We propose a bidirectional semantic-based evaluation method designed to assess the sense distance of the translation from the source text.
This approach employs the comprehensive multilingual encyclopedic dictionary BabelNet.
Factual analysis shows a strong correlation between the average evaluation scores generated by our method and the human assessments across various machine translation systems for the English-German language pair.
arXiv Detail & Related papers (2024-03-06T08:02:21Z)
- Extrinsic Evaluation of Machine Translation Metrics [78.75776477562087]
It is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level.
We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks.
Our experiments demonstrate that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes.
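For context, segment-level agreement of this kind is usually measured with a rank correlation; a minimal sketch with hypothetical per-segment data:

```python
from scipy.stats import kendalltau

# Hypothetical inputs: one metric score and one downstream outcome per segment.
metric_scores = [0.71, 0.42, 0.88, 0.35]  # e.g., COMET segment scores
task_outcomes = [1, 0, 1, 1]              # e.g., downstream success flags
tau, p_value = kendalltau(metric_scores, task_outcomes)
print(f"Kendall tau = {tau:.3f} (p = {p_value:.3f})")
```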
arXiv Detail & Related papers (2022-12-20T14:39:58Z)
- Principled Paraphrase Generation with Parallel Corpora [52.78059089341062]
We formalize the implicit similarity function induced by round-trip Machine Translation.
We show that it is susceptible to non-paraphrase pairs sharing a single ambiguous translation.
We design an alternative similarity metric that mitigates this issue.
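Our reconstruction of that implicit similarity (notation assumed, not the paper's exact formulation): round-trip MT effectively marginalises over pivot translations,

```latex
% x, x' are source-language sentences; z ranges over pivot translations.
\mathrm{sim}(x, x') \;=\; \sum_{z} P(z \mid x)\, P(x' \mid z)
```

A single high-probability ambiguous pivot z can therefore make two non-paraphrases x and x' appear similar, which is the failure mode the alternative metric is designed to mitigate.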
arXiv Detail & Related papers (2022-05-24T17:22:42Z)
- A Bayesian approach to translators' reliability assessment [0.0]
We treat Translation Quality Assessment (TQA) as a complex process, viewed from the perspective of the physics of complex systems.
We build two Bayesian models that parameterise the features involved in the TQA process, namely the translation difficulty and the characteristics of the translators involved in producing the translation and of the reviewers assessing its quality.
We show that reviewers' reliability cannot be taken for granted even if they are expert translators.
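The abstract does not spell out the model structure, but a generic hierarchical sketch of this kind of parameterisation (reviewer reliability, text difficulty) might look as follows in PyMC; all variable names and priors here are assumptions, not the paper's models:

```python
import pymc as pm

def build_tqa_model(scores, reviewer_idx, text_idx, n_reviewers, n_texts):
    """Hypothetical hierarchical model: noisier scores from less
    reliable reviewers, lower scores for more difficult texts."""
    with pm.Model() as model:
        reliability = pm.HalfNormal("reliability", sigma=1.0, shape=n_reviewers)
        difficulty = pm.Normal("difficulty", mu=0.0, sigma=1.0, shape=n_texts)
        baseline = pm.Normal("baseline", mu=0.0, sigma=1.0)
        mu = baseline - difficulty[text_idx]
        # Observation noise shrinks as a reviewer's reliability grows.
        pm.Normal("obs", mu=mu, sigma=1.0 / reliability[reviewer_idx],
                  observed=scores)
    return model
```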
arXiv Detail & Related papers (2022-03-14T14:29:45Z)
- It is Not as Good as You Think! Evaluating Simultaneous Machine Translation on Interpretation Data [58.105938143865906]
We argue that SiMT systems should be trained and tested on real interpretation data.
Our results highlight a difference of up to 13.83 BLEU when SiMT models are evaluated on translation versus interpretation data.
arXiv Detail & Related papers (2021-10-11T12:27:07Z)
- The Impact of Indirect Machine Translation on Sentiment Classification [6.719549885077474]
We propose employing a machine translation (MT) system to translate customer feedback into another language.
As performing a direct translation is not always possible, we explore the performance of automatic classifiers on sentences that have been translated indirectly, via an intermediate language.
We conduct several experiments to analyse the performance of our proposed sentiment classification system and discuss the advantages and drawbacks of classifying translated sentences.
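A bare-bones sketch of the indirect (pivot) pipeline, where both translate functions and the classifier are hypothetical placeholders:

```python
def classify_via_pivot(feedback, translate_to_pivot, translate_to_target,
                       sentiment_classifier):
    """Indirect route when no direct src->tgt MT system exists:
    source -> pivot -> target, then classify the target-language text."""
    pivot = [translate_to_pivot(s) for s in feedback]
    target = [translate_to_target(p) for p in pivot]
    return [sentiment_classifier(t) for t in target]
```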
arXiv Detail & Related papers (2020-08-25T20:30:21Z)
- On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation [55.02832094101173]
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual similarity.
This paper concerns itself with reference-free machine translation (MT) evaluation, where we directly compare source texts to (sometimes low-quality) system translations.
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations.
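A bare-bones version of such a reference-free metric, with a hypothetical embed() standing in for an M-BERT or LASER sentence encoder:

```python
import numpy as np

def reference_free_score(source: str, translation: str, embed) -> float:
    """Cosine similarity between source and system output in a shared
    multilingual embedding space; embed() is a hypothetical stand-in
    for an M-BERT or LASER sentence encoder."""
    u, v = embed(source), embed(translation)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

This is precisely the setup the paper finds to perform poorly out of the box.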
arXiv Detail & Related papers (2020-05-03T22:10:23Z)