Automatic Translation Alignment Pipeline for Multilingual Digital Editions of Literary Works
- URL: http://arxiv.org/abs/2410.13255v1
- Date: Thu, 17 Oct 2024 06:21:38 GMT
- Title: Automatic Translation Alignment Pipeline for Multilingual Digital Editions of Literary Works
- Authors: Maria Levchenko,
- Abstract summary: This paper investigates the application of translation alignment algorithms in the creation of a Multilingual Digital Edition (MDE) of Alessandro Manzoni's Italian novel "I promessi sposi"
We identify key requirements for the MDE to improve both the reader experience and support for translation studies.
We propose new metrics for evaluating the alignment of literary translations and suggest visualization techniques for future analysis.
- Score: 0.0
- License:
- Abstract: This paper investigates the application of translation alignment algorithms in the creation of a Multilingual Digital Edition (MDE) of Alessandro Manzoni's Italian novel "I promessi sposi" ("The Betrothed"), with translations in eight languages (English, Spanish, French, German, Dutch, Polish, Russian and Chinese) from the 19th and 20th centuries. We identify key requirements for the MDE to improve both the reader experience and support for translation studies. Our research highlights the limitations of current state-of-the-art algorithms when applied to the translation of literary texts and outlines an automated pipeline for MDE creation. This pipeline transforms raw texts into web-based, side-by-side representations of original and translated texts with different rendering options. In addition, we propose new metrics for evaluating the alignment of literary translations and suggest visualization techniques for future analysis.
Related papers
- Towards Visual Text Design Transfer Across Languages [49.78504488452978]
We introduce a novel task of Multimodal Style Translation (MuST-Bench)
MuST-Bench is a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems.
In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions.
arXiv Detail & Related papers (2024-10-24T15:15:01Z) - Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts [43.68711076100652]
We propose a framework that engages language models in a multi-turn interaction, encompassing pre-translation research, drafting, refining, and proofreading.
We show that translating step-by-step yields large translation quality improvements over conventional zero-shot prompting approaches.
arXiv Detail & Related papers (2024-09-10T18:02:21Z) - (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [52.18246881218829]
We introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents.
To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP)
arXiv Detail & Related papers (2024-05-20T05:55:08Z) - Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing [12.843274390224853]
Large Language Models (LLM's) have demonstrated considerable success in various Natural Language Processing tasks.
We show that they have yet to attain state-of-the-art performance in Neural Machine Translation.
We propose adapting LLM's as Automatic Post-Editors (APE) rather than direct translators.
arXiv Detail & Related papers (2023-10-23T12:22:15Z) - Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation [1.9085074258303771]
We study the task of visually'' translating scene text from a source language to a target language.
Visual translation involves not just the recognition and translation of scene text but also the generation of the translated image.
We present a cascaded framework for visual translation that combines state-of-the-art modules for scene text recognition, machine translation, and scene text synthesis.
arXiv Detail & Related papers (2023-08-06T05:23:25Z) - Decomposed Prompting for Machine Translation Between Related Languages
using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z) - Discourse Centric Evaluation of Machine Translation with a Densely
Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z) - A Bilingual Parallel Corpus with Discourse Annotations [82.07304301996562]
This paper describes BWB, a large parallel corpus first introduced in Jiang et al. (2022), along with an annotated test set.
The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena.
arXiv Detail & Related papers (2022-10-26T12:33:53Z) - Exploring Document-Level Literary Machine Translation with Parallel
Paragraphs from World Literature [35.1398797683712]
We show that literary translators prefer reference human translations over machine-translated paragraphs at a rate of 84%.
We train a post-editing model whose output is preferred over normal MT output at a rate of 69% by experts.
arXiv Detail & Related papers (2022-10-25T18:03:34Z) - DivEMT: Neural Machine Translation Post-Editing Effort Across
Typologically Diverse Languages [5.367993194110256]
DivEMT is the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages.
We assess the impact on translation productivity of two state-of-the-art NMT systems, namely: Google Translate and the open-source multilingual model mBART50.
arXiv Detail & Related papers (2022-05-24T17:22:52Z) - A High-Quality Multilingual Dataset for Structured Documentation
Translation [101.41835967142521]
This paper presents a high-quality multilingual dataset for the documentation domain.
We collect XML-structured parallel text segments from the online documentation for an enterprise software platform.
arXiv Detail & Related papers (2020-06-24T02:08:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.