Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature
- URL: http://arxiv.org/abs/2210.14250v1
- Date: Tue, 25 Oct 2022 18:03:34 GMT
- Title: Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature
- Authors: Katherine Thai and Marzena Karpinska and Kalpesh Krishna and Bill Ray
and Moira Inghilleri and John Wieting and Mohit Iyyer
- Abstract summary: We show that literary translators prefer reference human translations over machine-translated paragraphs at a rate of 84%.
We train a post-editing model whose output is preferred over normal MT output at a rate of 69% by experts.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Literary translation is a culturally significant task, but it is bottlenecked
by the small number of qualified literary translators relative to the many
untranslated works published around the world. Machine translation (MT) holds
the potential to complement the work of human translators by improving both
training procedures and their overall efficiency. Literary translation is less
constrained than more traditional MT settings since translators must balance
meaning equivalence, readability, and critical interpretability in the target
language. This property, along with the complex discourse-level context present
in literary texts, also makes literary MT more challenging to computationally
model and evaluate. To explore this task, we collect a dataset (Par3) of
non-English language novels in the public domain, each aligned at the paragraph
level to both human and automatic English translations. Using Par3, we discover
that expert literary translators prefer reference human translations over
machine-translated paragraphs at a rate of 84%, while state-of-the-art
automatic MT metrics do not correlate with those preferences. The experts note
that MT outputs contain not only mistranslations, but also discourse-disrupting
errors and stylistic inconsistencies. To address these problems, we train a
post-editing model whose output is preferred over normal MT output at a rate of
69% by experts. We publicly release Par3 at
https://github.com/katherinethai/par3/ to spur future research into literary
MT.
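To make the paragraph-level alignment concrete, here is a minimal Python sketch of iterating over Par3-style data; the record layout below is a hypothetical stand-in for illustration, not the repository's actual schema (consult https://github.com/katherinethai/par3/ for the real format):
```python
# Minimal sketch of consuming Par3-style paragraph-aligned data.
# NOTE: this record layout is hypothetical; see the Par3 repository
# (https://github.com/katherinethai/par3/) for the actual file format.

# One record pairs a source paragraph with aligned human and machine translations.
example_record = {
    "source_para": "Un paragraphe du roman original...",
    "human_translations": ["A paragraph from the original novel..."],
    "machine_translation": "A paragraph of the original novel...",
}

def iter_alignments(records):
    """Yield (source, human_references, mt_output) triples, one per paragraph."""
    for rec in records:
        yield (
            rec["source_para"],
            rec["human_translations"],
            rec["machine_translation"],
        )

for src, refs, mt in iter_alignments([example_record]):
    print(f"{len(refs)} human reference(s) aligned to source paragraph: {src[:30]}...")
```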
Related papers
- (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [52.18246881218829]
We introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents.
To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP).
arXiv Detail & Related papers (2024-05-20T05:55:08Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- Do GPTs Produce Less Literal Translations? [20.095646048167612]
Large Language Models (LLMs) have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks.
We find that translations out of English (E-X) from GPTs tend to be less literal, while exhibiting similar or better scores on Machine Translation quality metrics.
arXiv Detail & Related papers (2023-05-26T10:38:31Z)
- Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al. (2022).
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
- Large language models effectively leverage document-level context for literary translation, but critical errors persist [32.54546652197316]
Large language models (LLMs) are competitive with the state of the art on a wide range of sentence-level translation datasets.
We show through a rigorous human evaluation that asking the GPT-3.5 (text-davinci-003) LLM to translate an entire literary paragraph results in higher-quality translations than standard sentence-by-sentence translation.
arXiv Detail & Related papers (2023-04-06T17:27:45Z)
- A Bilingual Parallel Corpus with Discourse Annotations [82.07304301996562]
This paper describes BWB, a large parallel corpus first introduced in Jiang et al. (2022), along with an annotated test set.
The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena.
arXiv Detail & Related papers (2022-10-26T12:33:53Z)
- Creativity in translation: machine translation as a constraint for literary texts [3.3453601632404073]
This article presents the results of a study involving the translation of a short story by Kurt Vonnegut from English to Catalan and Dutch using three modalities: machine translation (MT), post-editing (PE), and translation without aid (HT).
A neural MT system trained on literary data does not currently have the necessary capabilities for a creative translation; it renders literal solutions to translation problems.
More importantly, post-editing raw MT output constrains the creativity of translators, resulting in a poorer translation that is often not fit for publication, according to experts.
arXiv Detail & Related papers (2022-04-12T09:27:00Z)
- It is Not as Good as You Think! Evaluating Simultaneous Machine Translation on Interpretation Data [58.105938143865906]
We argue that SiMT systems should be trained and tested on real interpretation data.
Our results highlight a difference of up to 13.83 BLEU when SiMT models are evaluated on translation vs. interpretation data (a scoring sketch follows this entry).
arXiv Detail & Related papers (2021-10-11T12:27:07Z)
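To make the translation-vs.-interpretation evaluation gap concrete, here is a small sketch using the sacrebleu library (an assumption; the paper does not specify its tooling) that scores the same system output against the two reference styles. All sentences are invented for illustration:
```python
# Sketch: scoring one SiMT output against two reference styles with sacrebleu.
# (pip install sacrebleu; the example sentences below are invented.)
import sacrebleu

system_output = ["the meeting will start at nine tomorrow morning"]

# A written-translation reference vs. a typically more compressed
# interpretation reference for the same source sentence.
translation_refs = [["the meeting will start at nine o'clock tomorrow morning"]]
interpretation_refs = [["meeting starts nine a.m. tomorrow"]]

# corpus_bleu takes the hypotheses and a list of reference streams.
bleu_translation = sacrebleu.corpus_bleu(system_output, translation_refs)
bleu_interpretation = sacrebleu.corpus_bleu(system_output, interpretation_refs)

# The two scores can diverge sharply, mirroring the gap the paper reports.
print(f"vs. translation refs:    {bleu_translation.score:.2f}")
print(f"vs. interpretation refs: {bleu_interpretation.score:.2f}")
```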