Crossing the Threshold: Idiomatic Machine Translation through Retrieval
Augmentation and Loss Weighting
- URL: http://arxiv.org/abs/2310.07081v2
- Date: Fri, 20 Oct 2023 19:24:14 GMT
- Title: Crossing the Threshold: Idiomatic Machine Translation through Retrieval
Augmentation and Loss Weighting
- Authors: Emmy Liu, Aditi Chaudhary, Graham Neubig
- Abstract summary: We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
- Score: 66.02718577386426
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Idioms are common in everyday language, but often pose a challenge to
translators because their meanings do not follow from the meanings of their
parts. Despite significant advances, machine translation systems still struggle
to translate idiomatic expressions. We provide a simple characterization of
idiomatic translation and related issues. This allows us to conduct a synthetic
experiment revealing a tipping point at which transformer-based machine
translation models correctly default to idiomatic translations. To expand
multilingual resources, we compile a dataset of ~4k natural sentences
containing idiomatic expressions in French, Finnish, and Japanese. To improve
translation of natural idioms, we introduce two straightforward yet effective
techniques: the strategic upweighting of training loss on potentially idiomatic
sentences, and the use of retrieval-augmented models. These techniques not only
improve the accuracy of a strong pretrained MT model on idiomatic sentences by
up to 13% absolute, but also show potential benefits for non-idiomatic
sentences.
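The first technique, upweighting the training loss on potentially idiomatic sentences, can be illustrated with a minimal sketch. The function below, the `idiom_weight` value, and the boolean flagging of sentences are all illustrative assumptions, not the paper's actual implementation or hyperparameters:

```python
# Illustrative sketch (not the authors' code): scale the per-sentence
# training loss for sentences flagged as potentially idiomatic, so the
# model pays more attention to them during fine-tuning.

def weighted_batch_loss(per_sentence_losses, is_idiomatic_flags, idiom_weight=2.0):
    """Weighted average of per-sentence losses.

    Sentences flagged as potentially idiomatic are scaled by `idiom_weight`
    (an assumed value here); all others keep weight 1.0.
    """
    weights = [idiom_weight if flag else 1.0 for flag in is_idiomatic_flags]
    total = sum(w * l for w, l in zip(weights, per_sentence_losses))
    return total / sum(weights)

# Example batch: two ordinary sentences and one flagged idiomatic sentence.
losses = [0.8, 1.2, 2.0]
flags = [False, False, True]
print(weighted_batch_loss(losses, flags))  # prints 1.5: the flagged loss counts double
```

In a real training loop the per-sentence losses would come from the MT model's cross-entropy with per-example reduction, and the flags from an idiom-detection heuristic or lexicon lookup.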
Related papers
- The Best of Both Worlds: Combining Human and Machine Translations for
Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation [55.52888815590317]
Unlike literal expressions, idioms' meanings do not directly follow from their parts.
NMT models are often unable to translate idioms accurately and over-generate compositional, literal translations.
We investigate whether the non-compositionality of idioms is reflected in the mechanics of the dominant NMT model, Transformer.
arXiv Detail & Related papers (2022-05-30T17:59:32Z)
- Towards Debiasing Translation Artifacts [15.991970288297443]
We propose a novel approach to reducing translationese by extending an established bias-removal technique.
We use the Iterative Null-space Projection (INLP) algorithm and show, by measuring classification accuracy before and after debiasing, that translationese is reduced at both the sentence and word levels.
To the best of our knowledge, this is the first study to debias translationese as represented in latent embedding space.
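The core of INLP is to repeatedly find a linear direction predictive of the unwanted attribute and project the embeddings onto its null space. The toy sketch below uses the difference of class means as a stand-in for the trained linear classifier of the actual algorithm; the data and iteration count are illustrative assumptions:

```python
# Toy sketch of Iterative Null-space Projection (INLP), not the authors' code.
# Real INLP trains a linear classifier each round; here the class-mean
# difference serves as a simplified stand-in for the classifier direction.
import numpy as np

def inlp_projection(X, y, n_iters=3):
    """Iteratively remove the linear direction along which label y
    (e.g. translationese vs. original text) is most recoverable."""
    X = X.astype(float).copy()
    for _ in range(n_iters):
        w = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
        norm = np.linalg.norm(w)
        if norm < 1e-9:              # label no longer linearly recoverable
            break
        w = w / norm
        X = X - np.outer(X @ w, w)   # project onto the null space of w
    return X

# Tiny 2-D example: two points per class, clearly separable before projection.
X = np.array([[1.0, 0.0], [2.0, 0.5], [-1.0, 3.0], [-2.0, 2.5]])
y = np.array([1, 1, 0, 0])
Xp = inlp_projection(X, y)
# After projection the class means coincide, so no linear direction
# separates the two classes by their means anymore.
```

Each projection is guaranteed not to reintroduce previously removed directions, which is why iterating drives the classifier's accuracy toward chance.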
arXiv Detail & Related papers (2022-05-16T21:46:51Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
arXiv Detail & Related papers (2021-07-23T17:11:27Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
- Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation [5.958653653305609]
We incorporate widely available bilingual dictionaries that yield word-by-word translations to generate synthetic sentences.
This automatically expands the model's vocabulary while maintaining high-quality content.
arXiv Detail & Related papers (2020-04-05T02:14:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.