Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts
- URL: http://arxiv.org/abs/2409.06790v1
- Date: Tue, 10 Sep 2024 18:02:21 GMT
- Title: Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts
- Authors: Eleftheria Briakou, Jiaming Luo, Colin Cherry, Markus Freitag
- Abstract summary: We propose a framework that engages language models in a multi-turn interaction, encompassing pre-translation research, drafting, refining, and proofreading.
We show that translating step-by-step yields large translation quality improvements over conventional zero-shot prompting approaches.
- Score: 43.68711076100652
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we present a step-by-step approach to long-form text translation, drawing on established processes in translation studies. Instead of viewing machine translation as a single, monolithic task, we propose a framework that engages language models in a multi-turn interaction, encompassing pre-translation research, drafting, refining, and proofreading, resulting in progressively improved translations. Extensive automatic evaluations using Gemini 1.5 Pro across ten language pairs show that translating step-by-step yields large translation quality improvements over conventional zero-shot prompting approaches and earlier human-like baseline strategies, resulting in state-of-the-art results on WMT2024.
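To make the decomposition concrete, the sketch below shows one way such a multi-turn loop could be wired up. It is a minimal illustration under assumptions of our own: the `chat` callable stands in for any chat-completions LLM client, and the stage prompts and function name are invented for this sketch rather than taken from the paper.

```python
from typing import Callable

Message = dict[str, str]  # {"role": ..., "content": ...}

def translate_step_by_step(
    source_text: str,
    src_lang: str,
    tgt_lang: str,
    chat: Callable[[list[Message]], str],  # any chat-completions-style client
) -> str:
    """Run the four stages as one multi-turn conversation and return the final text."""
    history: list[Message] = [{
        "role": "system",
        "content": f"You are an expert {src_lang}-to-{tgt_lang} translator.",
    }]
    stages = [
        # 1) Pre-translation research: terminology, named entities, register.
        f"Before translating, list key terms, named entities, idioms, and the register "
        f"of the following {src_lang} text:\n\n{source_text}",
        # 2) Drafting: first full translation, conditioned on the research notes.
        f"Using your notes, produce a first-draft {tgt_lang} translation of the text.",
        # 3) Refining: document-level revision for fluency and consistency.
        "Revise the draft for fluency, terminology consistency, and faithfulness. "
        "Return only the revised translation.",
        # 4) Proofreading: final monolingual pass over the target text.
        f"Proofread the revised {tgt_lang} translation for grammar, typos, and formatting. "
        "Return only the final translation.",
    ]
    reply = ""
    for prompt in stages:
        history.append({"role": "user", "content": prompt})
        reply = chat(history)                          # one turn per stage
        history.append({"role": "assistant", "content": reply})
    return reply                                        # output of the proofreading turn
```

Keeping every stage in the same conversation lets later turns condition on the research notes and earlier drafts, which is what distinguishes this multi-turn decomposition from a single zero-shot prompt.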
Related papers
- Towards Chapter-to-Chapter Context-Aware Literary Translation via Large Language Models [16.96647110733261]
Discourse phenomena in existing document-level translation datasets are sparse.
Most existing document-level corpora and context-aware machine translation methods rely on an unrealistic assumption of sentence-level alignments.
We propose a more pragmatic and challenging setting for context-aware translation, termed chapter-to-chapter (Ch2Ch) translation.
arXiv Detail & Related papers (2024-07-12T04:18:22Z) - (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [52.18246881218829]
We introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents.
To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP).
arXiv Detail & Related papers (2024-05-20T05:55:08Z) - BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine
Translation [4.651581292181871]
We propose a bidirectional semantic-based evaluation method designed to assess the sense distance of the translation from the source text.
This approach employs the comprehensive multilingual encyclopedic dictionary BabelNet.
Factual analysis shows a strong correlation between the average evaluation scores generated by our method and the human assessments across various machine translation systems for the English-German language pair.
arXiv Detail & Related papers (2024-03-06T08:02:21Z) - Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing [12.843274390224853]
Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks.
We show that they have yet to attain state-of-the-art performance in Neural Machine Translation.
We propose adapting LLMs as Automatic Post-Editors (APE) rather than direct translators.
arXiv Detail & Related papers (2023-10-23T12:22:15Z) - Decomposed Prompting for Machine Translation Between Related Languages
using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ points across the examined languages (see the illustrative sketch after this entry).
arXiv Detail & Related papers (2023-05-22T14:52:47Z) - Meta Back-translation [111.87397401837286]
- Meta Back-translation [111.87397401837286]
We propose a novel method to generate pseudo-parallel data from a pre-trained back-translation model.
Our method is a meta-learning algorithm which adapts a pre-trained back-translation model so that the pseudo-parallel data it generates would train a forward-translation model to do well on a validation set.
arXiv Detail & Related papers (2021-02-15T20:58:32Z) - Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of work: the zero-shot approach and the translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z) - Cross-lingual Retrieval for Iterative Self-Supervised Training [66.3329263451598]
Cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs.
We develop a new approach, cross-lingual retrieval for iterative self-supervised training (see the illustrative mining sketch after this entry).
arXiv Detail & Related papers (2020-06-16T21:30:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.