Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation
- URL: http://arxiv.org/abs/2408.17308v1
- Date: Fri, 30 Aug 2024 14:12:04 GMT
- Title: Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation
- Authors: Esther Ploeger, Huiyuan Lai, Rik van Noord, Antonio Toral
- Abstract summary: Machine translations are found to be lexically poorer than human translations.
We propose a novel approach that consists of reranking translation candidates with a classifier that distinguishes between original and translated text.
We evaluate our approach on 31 English-to-Dutch book translations, and find that, for certain books, our approach retrieves lexical diversity scores that are close to human translation.
- Score: 11.875491080062233
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Machine translations are found to be lexically poorer than human translations. The loss of lexical diversity through MT poses an issue in the automatic translation of literature, where it matters not only what is written, but also how it is written. Current methods for increasing lexical diversity in MT are rigid. Yet, as we demonstrate, the degree of lexical diversity can vary considerably across different novels. Thus, rather than aiming for the rigid increase of lexical diversity, we reframe the task as recovering what is lost in the machine translation process. We propose a novel approach that consists of reranking translation candidates with a classifier that distinguishes between original and translated text. We evaluate our approach on 31 English-to-Dutch book translations, and find that, for certain books, our approach retrieves lexical diversity scores that are close to human translation.
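The reranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, how candidates are generated, or which lexical diversity metrics are used. Here the type-token ratio (TTR) is assumed as a simple diversity measure, and `toy_originality_score` is a hypothetical stand-in for a trained classifier that scores how much a text resembles original (non-translated) writing.

```python
from typing import Callable, List, Sequence


def type_token_ratio(text: str) -> float:
    """Unique tokens divided by total tokens (whitespace tokenization, lowercased)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def rerank(
    candidates: Sequence[str],
    originality_score: Callable[[str], float],
) -> List[str]:
    """Order candidates so the one scored most 'original-like' comes first."""
    return sorted(candidates, key=originality_score, reverse=True)


def toy_originality_score(text: str) -> float:
    # Hypothetical stand-in for a trained original-vs-translated classifier.
    # Assumption for illustration only: reuse TTR as the score.
    return type_token_ratio(text)


# Two made-up Dutch candidate translations of the same source sentence.
candidates = [
    "de man zag de man en de man liep weg",
    "de man zag zijn buurman en wandelde weg",
]

ranked = rerank(candidates, toy_originality_score)
best = ranked[0]
print(best)
```

In a real setup, the scoring function would wrap a classifier's predicted probability that a text is original rather than translated, and the candidates would come from the MT system's n-best list or sampled outputs.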
Related papers
- (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [52.18246881218829]
We introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents.
To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP).
arXiv Detail & Related papers (2024-05-20T05:55:08Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- Lost in Translationese? Reducing Translation Effect Using Abstract Meaning Representation [11.358350306918027]
We argue that Abstract Meaning Representation (AMR) can be used as an interlingua to reduce the amount of translationese in translated texts.
By parsing English translations into an AMR and then generating text from that AMR, the result more closely resembles originally English text.
This work makes strides towards reducing translationese in text and highlights the utility of AMR as an interlingua.
arXiv Detail & Related papers (2023-04-23T00:04:14Z)
- Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature [35.1398797683712]
We show that literary translators prefer reference human translations over machine-translated paragraphs at a rate of 84%.
We train a post-editing model whose output is preferred over normal MT output at a rate of 69% by experts.
arXiv Detail & Related papers (2022-10-25T18:03:34Z)
- Exploring Diversity in Back Translation for Low-Resource Machine Translation [85.03257601325183]
Back translation is one of the most widely used methods for improving the performance of neural machine translation systems.
Recent research has sought to enhance the effectiveness of this method by increasing the 'diversity' of the generated translations.
This work puts forward a more nuanced framework for understanding diversity in training data, splitting it into lexical diversity and syntactic diversity.
arXiv Detail & Related papers (2022-06-01T15:21:16Z)
- Quantitative Evaluation of Alternative Translations in a Corpus of Highly Dissimilar Finnish Paraphrases [1.8748036062767652]
We present a quantitative evaluation of differences between alternative translations in a large recently released Finnish paraphrase corpus.
We combine a series of automatic steps detecting systematic variation with manual analysis to reveal regularities and identify categories of translation differences.
arXiv Detail & Related papers (2021-05-06T07:22:16Z)
- Decoding and Diversity in Machine Translation [90.33636694717954]
We characterize distributional differences between generated and real translations, examining the cost in diversity paid for the BLEU scores enjoyed by NMT.
Our study implicates search as a salient source of known bias when translating gender pronouns.
arXiv Detail & Related papers (2020-11-26T21:09:38Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)