Large language models effectively leverage document-level context for
literary translation, but critical errors persist
- URL: http://arxiv.org/abs/2304.03245v3
- Date: Mon, 22 May 2023 22:00:51 GMT
- Title: Large language models effectively leverage document-level context for
literary translation, but critical errors persist
- Authors: Marzena Karpinska and Mohit Iyyer
- Abstract summary: Large language models (LLMs) are competitive with the state of the art on a wide range of sentence-level translation datasets.
We show through a rigorous human evaluation that asking the Gpt-3.5 (text-davinci-003) LLM to translate an entire literary paragraph results in higher-quality translations.
- Score: 32.54546652197316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are competitive with the state of the art on a
wide range of sentence-level translation datasets. However, their ability to
translate paragraphs and documents remains unexplored because evaluation in
these settings is costly and difficult. We show through a rigorous human
evaluation that asking the Gpt-3.5 (text-davinci-003) LLM to translate an
entire literary paragraph (e.g., from a novel) at once results in
higher-quality translations than standard sentence-by-sentence translation
across 18 linguistically-diverse language pairs (e.g., translating into and out
of Japanese, Polish, and English). Our evaluation, which took approximately 350
hours of effort for annotation and analysis, is conducted by hiring translators
fluent in both the source and target language and asking them to provide both
span-level error annotations as well as preference judgments of which system's
translations are better. We observe that discourse-level LLM translators commit
fewer mistranslations, grammar errors, and stylistic inconsistencies than
sentence-level approaches. With that said, critical errors still abound,
including occasional content omissions, and a human translator's intervention
remains necessary to ensure that the author's voice remains intact. We publicly
release our dataset and error annotations to spur future research on evaluation
of document-level literary translation.
Related papers
- A Data Selection Approach for Enhancing Low Resource Machine Translation Using Cross-Lingual Sentence Representations [0.4499833362998489]
This study focuses on the case of English-Marathi language pairs, where existing datasets are notably noisy.
To mitigate the impact of data quality issues, we propose a data filtering approach based on cross-lingual sentence representations.
Results demonstrate a significant improvement in translation quality over the baseline post-filtering with IndicSBERT.
arXiv Detail & Related papers (2024-09-04T13:49:45Z) - GPT-4 vs. Human Translators: A Comprehensive Evaluation of Translation Quality Across Languages, Domains, and Expertise Levels [18.835573312027265]
This study comprehensively evaluates the translation quality of Large Language Models (LLMs) against human translators.
We find that GPT-4 performs comparably to junior translators in terms of total errors made but lags behind medium and senior translators.
arXiv Detail & Related papers (2024-07-04T05:58:04Z) - (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [52.18246881218829]
We introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents.
To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP)
arXiv Detail & Related papers (2024-05-20T05:55:08Z) - Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z) - Translation Errors Significantly Impact Low-Resource Languages in
Cross-Lingual Learning [26.49647954587193]
In this work, we find that translation inconsistencies do exist and they disproportionally impact low-resource languages in XNLI.
To identify such inconsistencies, we propose measuring the gap in performance between zero-shot evaluations on the human-translated and machine-translated target text.
We also corroborate that translation errors exist for two target languages, namely Hindi and Urdu, by doing a manual reannotation of human-translated test instances.
arXiv Detail & Related papers (2024-02-03T08:22:51Z) - Enhancing Document-level Translation of Large Language Model via
Translation Mixed-instructions [24.025242477280983]
Existing large language models (LLMs) for machine translation are typically fine-tuned on sentence-level translation instructions.
This challenge arises from the issue of sentence-level coverage, where subsequent sentences in the document remain untranslated.
We propose an approach that combines sentence-level and document-level translation instructions of varying lengths to fine-tune LLMs.
arXiv Detail & Related papers (2024-01-16T03:28:26Z) - The Best of Both Worlds: Combining Human and Machine Translations for
Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z) - Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (MS) aims at generating a concise summary in a different target language.
To collect large-scale CLS data, existing datasets typically involve translation in their creation.
In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z) - Exploring Document-Level Literary Machine Translation with Parallel
Paragraphs from World Literature [35.1398797683712]
We show that literary translators prefer reference human translations over machine-translated paragraphs at a rate of 84%.
We train a post-editing model whose output is preferred over normal MT output at a rate of 69% by experts.
arXiv Detail & Related papers (2022-10-25T18:03:34Z) - Rethink about the Word-level Quality Estimation for Machine Translation
from Human Judgement [57.72846454929923]
We create a benchmark dataset, emphHJQE, where the expert translators directly annotate poorly translated words.
We propose two tag correcting strategies, namely tag refinement strategy and tree-based annotation strategy, to make the TER-based artificial QE corpus closer to emphHJQE.
The results show our proposed dataset is more consistent with human judgement and also confirm the effectiveness of the proposed tag correcting strategies.
arXiv Detail & Related papers (2022-09-13T02:37:12Z) - ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality
Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and an endangered language Cherokee.
It supports both statistical and neural translation models as well as provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.