Predicting Human Translation Difficulty with Neural Machine Translation
- URL: http://arxiv.org/abs/2312.11852v1
- Date: Tue, 19 Dec 2023 04:42:56 GMT
- Title: Predicting Human Translation Difficulty with Neural Machine Translation
- Authors: Zheng Wei Lim, Ekaterina Vylomova, Charles Kemp, and Trevor Cohn
- Abstract summary: We evaluate the extent to which surprisal and attentional features derived from a Neural Machine Translation (NMT) model account for reading and production times of human translators.
We find that surprisal and attention are complementary predictors of translation difficulty, and that surprisal derived from a NMT model is the single most successful predictor of production duration.
- Score: 30.036747251603668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human translators linger on some words and phrases more than others, and
predicting this variation is a step towards explaining the underlying cognitive
processes. Using data from the CRITT Translation Process Research Database, we
evaluate the extent to which surprisal and attentional features derived from a
Neural Machine Translation (NMT) model account for reading and production times
of human translators. We find that surprisal and attention are complementary
predictors of translation difficulty, and that surprisal derived from a NMT
model is the single most successful predictor of production duration. Our
analyses draw on data from hundreds of translators operating across 13 language
pairs, and represent the most comprehensive investigation of human translation
difficulty to date.
Related papers
- Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective [72.83966378613238]
Under-translation and over-translation remain two challenging problems in state-of-the-art Neural Machine Translation (NMT) systems.
We conduct an in-depth analysis on the underlying cause of under-translation in NMT, providing an explanation from the perspective of decoding objective.
We propose employing the confidence of predicting End Of Sentence (EOS) as a detector for under-translation, and strengthening the confidence-based penalty to penalize candidates with a high risk of under-translation.
arXiv Detail & Related papers (2024-05-29T09:25:49Z) - An Empirical Study on the Robustness of Massively Multilingual Neural Machine Translation [40.08063412966712]
Massively multilingual neural machine translation (MMNMT) has been proven to enhance the translation quality of low-resource languages.
We create a robustness evaluation benchmark dataset for Indonesian-Chinese translation.
This dataset is automatically translated into Chinese using four NLLB-200 models of different sizes.
arXiv Detail & Related papers (2024-05-13T12:01:54Z) - An Empirical study of Unsupervised Neural Machine Translation: analyzing
NMT output, model's behavior and sentences' contribution [5.691028372215281]
Unsupervised Neural Machine Translation (UNMT) focuses on improving NMT results under the assumption there is no human translated parallel data.
We focus on three very diverse languages, French, Gujarati, and Kazakh, and train bilingual NMT models, to and from English, with various levels of supervision.
arXiv Detail & Related papers (2023-12-19T20:35:08Z) - Is Robustness Transferable across Languages in Multilingual Neural
Machine Translation? [45.04661608619081]
We investigate the transferability of robustness across different languages in multilingual neural machine translation.
Our findings demonstrate that the robustness gained in one translation direction can indeed transfer to other translation directions.
arXiv Detail & Related papers (2023-10-31T04:10:31Z) - The Best of Both Worlds: Combining Human and Machine Translations for
Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z) - Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a versatile'' model, i.e., the Unified Model Learning for NMT (UMLNMT) that works with data from different tasks.
OurNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z) - Competency-Aware Neural Machine Translation: Can Machine Translation
Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failures that happen without awareness.
We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator.
We show that the proposed method delivers outstanding performance on quality estimation.
arXiv Detail & Related papers (2022-11-25T02:39:41Z) - DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z) - It's Easier to Translate out of English than into it: Measuring Neural
Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z) - Towards Multimodal Simultaneous Neural Machine Translation [28.536262015508722]
Simultaneous translation involves translating a sentence before the speaker's utterance is completed in order to realize real-time understanding.
This task is significantly more challenging than the general full sentence translation because of the shortage of input information during decoding.
We propose multimodal simultaneous neural machine translation (MSNMT), which leverages visual information as an additional modality.
arXiv Detail & Related papers (2020-04-07T08:02:21Z) - On the Integration of LinguisticFeatures into Statistical and Neural
Machine Translation [2.132096006921048]
We investigate the discrepancies between the strengths of statistical approaches to machine translation and the way humans translate.
We identify linguistic information that is lacking in order for automatic translation systems to produce more accurate translations.
We identify overgeneralization or 'algomic bias' as a potential drawback of neural MT and link it to many of the remaining linguistic issues.
arXiv Detail & Related papers (2020-03-31T16:03:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.