Machine Translation Models are Zero-Shot Detectors of Translation Direction
- URL: http://arxiv.org/abs/2401.06769v2
- Date: Wed, 22 May 2024 17:10:39 GMT
- Title: Machine Translation Models are Zero-Shot Detectors of Translation Direction
- Authors: Michelle Wastl, Jannis Vamvas, Rico Sennrich
- Abstract summary: Detecting the translation direction of parallel text has applications for machine translation training and evaluation, but also has forensic applications such as resolving plagiarism or forgery allegations.
In this work, we explore an unsupervised approach to translation direction detection based on the simple hypothesis that $p(\text{translation}|\text{original})>p(\text{original}|\text{translation})$, motivated by the well-known simplification effect in translationese or machine-translationese.
- Score: 46.41883195574249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting the translation direction of parallel text has applications for machine translation training and evaluation, but also has forensic applications such as resolving plagiarism or forgery allegations. In this work, we explore an unsupervised approach to translation direction detection based on the simple hypothesis that $p(\text{translation}|\text{original})>p(\text{original}|\text{translation})$, motivated by the well-known simplification effect in translationese or machine-translationese. In experiments with massively multilingual machine translation models across 20 translation directions, we confirm the effectiveness of the approach for high-resource language pairs, achieving document-level accuracies of 82--96% for NMT-produced translations, and 60--81% for human translations, depending on the model used. Code and demo are available at https://github.com/ZurichNLP/translation-direction-detection
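The hypothesis lends itself to a compact zero-shot implementation: score the sentence pair in both directions with a multilingual NMT model and predict the direction with the higher conditional probability. Below is a minimal sketch assuming a Hugging Face M2M-100 checkpoint; the checkpoint name, the token-averaged normalization, and the sentence-level granularity are illustrative assumptions rather than the paper's exact setup (see the linked repository for the authors' implementation).

```python
# Minimal sketch of the p(translation|original) > p(original|translation) test.
# Assumptions: Hugging Face M2M-100 checkpoint, token-averaged log-probs.
import torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

MODEL_NAME = "facebook/m2m100_418M"  # illustrative checkpoint
model = M2M100ForConditionalGeneration.from_pretrained(MODEL_NAME).eval()
tokenizer = M2M100Tokenizer.from_pretrained(MODEL_NAME)

def avg_log_prob(src: str, tgt: str, src_lang: str, tgt_lang: str) -> float:
    """Token-averaged log p(tgt | src) under the MT model."""
    tokenizer.src_lang, tokenizer.tgt_lang = src_lang, tgt_lang
    batch = tokenizer(src, text_target=tgt, return_tensors="pt")
    with torch.no_grad():
        loss = model(**batch).loss  # mean cross-entropy over target tokens
    return -loss.item()

def detect_direction(a: str, b: str, lang_a: str, lang_b: str) -> str:
    """Translating original -> translation should outscore the reverse."""
    if avg_log_prob(a, b, lang_a, lang_b) > avg_log_prob(b, a, lang_b, lang_a):
        return f"{lang_a}->{lang_b}"
    return f"{lang_b}->{lang_a}"

print(detect_direction("Das ist ein Test.", "This is a test.", "de", "en"))
```

Since the accuracies reported above are document-level, per-sentence scores would presumably be aggregated over a whole document before deciding.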
Related papers
- Prediction of Translation Techniques for the Translation Process [6.30737834823321]
The study differentiates between two scenarios of the translation process: from-scratch translation and post-editing.
The findings indicate that the predictive accuracy for from-scratch translation reaches 82%, while the post-editing process exhibits even greater potential, achieving an accuracy rate of 93%.
arXiv Detail & Related papers (2024-03-21T15:02:03Z)
- Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding [53.84948040596055]
We introduce two related methods to mitigate failure cases with a modified decoding objective.
Experiments on the massively multilingual models M2M-100 (418M) and SMaLL-100 show that these methods suppress hallucinations and off-target translations.
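As a rough illustration of the contrastive idea: a translation should be likely given its true source but unlikely given an unrelated distractor source. The paper modifies the per-token objective inside decoding; the sketch below applies it only as sequence-level rescoring, reuses the `avg_log_prob` helper from the sketch above, and treats `lam` and the distractor as illustrative assumptions.

```python
# Hedged sketch of source-contrastive scoring as rescoring (not decoding):
# subtract the likelihood of the hypothesis under a random distractor source.
def source_contrastive_score(hyp: str, src: str, distractor: str,
                             src_lang: str, tgt_lang: str,
                             lam: float = 0.3) -> float:
    return (avg_log_prob(src, hyp, src_lang, tgt_lang)
            - lam * avg_log_prob(distractor, hyp, src_lang, tgt_lang))
```

A hallucinated hypothesis scores similarly under any source, so the subtraction penalizes it.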
arXiv Detail & Related papers (2023-09-13T17:15:27Z)
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel few-shot prompting approach that decomposes the translation process into a sequence of word-chunk translations.
We show that DecoMT outperforms the strong few-shot prompted BLOOM model with an average improvement of 8 chrF++ points across the examined languages.
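A deliberately simplified sketch of the decomposition idea follows; the chunking policy, prompt template, and the `llm_complete` callable are hypothetical placeholders, and the paper's method additionally incorporates context rather than translating chunks fully independently.

```python
# Illustrative sketch of decomposed prompting: translate fixed-size word
# chunks with a few-shot prompt and stitch the results. `llm_complete` is a
# hypothetical stand-in for any LLM completion call.
from typing import Callable, List

FEW_SHOT = (
    "Translate the chunk from the source language to the target language.\n"
    "Chunk: <example source chunk>\nTranslation: <example target chunk>\n"
)

def chunked(words: List[str], size: int = 4) -> List[List[str]]:
    return [words[i:i + size] for i in range(0, len(words), size)]

def decomposed_translate(sentence: str,
                         llm_complete: Callable[[str], str]) -> str:
    pieces = []
    for chunk in chunked(sentence.split()):
        prompt = f"{FEW_SHOT}Chunk: {' '.join(chunk)}\nTranslation:"
        pieces.append(llm_complete(prompt).strip())
    return " ".join(pieces)
```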
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- Towards Debiasing Translation Artifacts [15.991970288297443]
We propose a novel approach to reducing translationese by extending an established bias-removal technique.
We use the Iterative Null-space Projection (INLP) algorithm and show, by measuring classification accuracy before and after debiasing, that translationese is reduced at both the sentence and word level.
To the best of our knowledge, this is the first study to debias translationese as represented in latent embedding space.
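For orientation, here is a compact sketch of the INLP loop; the probe classifier, iteration count, and running-product construction of the projection are simplifications of the published algorithm.

```python
# Compact sketch of Iterative Null-space Projection (INLP): repeatedly fit a
# linear probe for the attribute (here: translationese vs. original) and
# project embeddings onto the probe's null space.
import numpy as np
from sklearn.linear_model import LogisticRegression

def inlp(X: np.ndarray, y: np.ndarray, n_iters: int = 10) -> np.ndarray:
    """Return a projection matrix P such that X @ P hides linear label info."""
    d = X.shape[1]
    P = np.eye(d)
    for _ in range(n_iters):
        probe = LogisticRegression(max_iter=1000).fit(X @ P, y)
        W = probe.coef_  # (1, d) probe direction for a binary attribute
        # Null-space projector of W: I - W^T (W W^T)^{-1} W
        P_w = np.eye(d) - W.T @ np.linalg.inv(W @ W.T) @ W
        P = P @ P_w  # compose with the projections found so far
    return P
```

The before/after accuracy check described above then amounts to training a fresh probe on `X` versus `X @ P`.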
arXiv Detail & Related papers (2022-05-16T21:46:51Z)
- As Little as Possible, as Much as Necessary: Detecting Over- and Undertranslations with Contrastive Conditioning [42.46681912294797]
We propose a method for detecting superfluous words in neural machine translation.
We compare the likelihood of a full sequence under a translation model to the likelihood of its parts, given the corresponding source or target sequence.
This makes it possible to pinpoint superfluous words in the translation and untranslated words in the source, even in the absence of a reference translation.
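A hedged, leave-one-word-out reading of that comparison is sketched below; the word-level granularity and the margin are illustrative assumptions, and `score` can be any sequence-level log-probability such as the `avg_log_prob` helper sketched earlier.

```python
# Illustrative leave-one-out variant of contrastive conditioning: a target
# word is suspect if the translation scores at least as well without it.
from typing import Callable, List

def superfluous_words(src: str, tgt: str,
                      score: Callable[[str, str], float],
                      margin: float = 0.0) -> List[str]:
    full_score = score(src, tgt)
    words = tgt.split()
    flagged = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        if score(src, reduced) >= full_score + margin:
            flagged.append(word)
    return flagged
```

Swapping the roles of source and target gives the corresponding check for untranslated source words.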
arXiv Detail & Related papers (2022-03-03T18:59:02Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach that regularizes NMT models at both the representation and gradient levels.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Detecting over/under-translation errors for determining adequacy in human translations [0.0]
We present a novel approach to detecting over- and under-translations (OT/UT) as part of adequacy error checks in translation evaluation.
We do not restrict ourselves to machine translation (MT) outputs, and specifically target applications with a human-generated translation pipeline.
The goal of our system is to identify OT/UT errors from human translated video subtitles with high error recall.
arXiv Detail & Related papers (2021-04-01T06:06:36Z)
- Simplify-then-Translate: Automatic Preprocessing for Black-Box Machine Translation [5.480070710278571]
We introduce a method to improve black-box machine translation systems via automatic pre-processing (APP) using sentence simplification.
We first propose a method to automatically generate a large in-domain paraphrase corpus through back-translation with a black-box MT system.
We show that this preprocessing leads to better translation performance than translating the non-preprocessed source sentences.
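A hedged sketch of the corpus-construction step is given below; `mt_translate` is a hypothetical placeholder for any black-box MT API, and the round-trip pivot recipe is an illustrative reading of the back-translation setup.

```python
# Illustrative round-trip back-translation for building an in-domain
# paraphrase corpus with a black-box MT system.
from typing import Callable, List, Tuple

def build_paraphrase_corpus(sentences: List[str],
                            mt_translate: Callable[[str, str, str], str],
                            src_lang: str = "en",
                            pivot_lang: str = "de") -> List[Tuple[str, str]]:
    pairs = []
    for sent in sentences:
        pivot = mt_translate(sent, src_lang, pivot_lang)        # en -> de
        round_trip = mt_translate(pivot, pivot_lang, src_lang)  # de -> en
        pairs.append((sent, round_trip))  # (original, paraphrase) pair
    return pairs
```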
arXiv Detail & Related papers (2020-05-22T14:15:53Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.