Characterizing the Effects of Translation on Intertextuality using Multilingual Embedding Spaces
- URL: http://arxiv.org/abs/2501.10731v1
- Date: Sat, 18 Jan 2025 11:36:17 GMT
- Title: Characterizing the Effects of Translation on Intertextuality using Multilingual Embedding Spaces
- Authors: Hope McGovern, Hale Sirin, Tom Lippincott
- Abstract: Rhetorical devices are difficult to translate, but they are crucial to the translation of literary documents. We investigate the use of multilingual embedding spaces to characterize the preservation of intertextuality, one common rhetorical device, across human and machine translation. To do so, we use Biblical texts, which are both full of intertextual references and are highly translated works. We provide a metric to characterize intertextuality at the corpus level and provide a quantitative analysis of the preservation of this rhetorical device across extant human translations and machine-generated counterparts. We go on to provide qualitative analysis of cases wherein human translations over- or underemphasize the intertextuality present in the text, whereas machine translations provide a neutral baseline. This provides support for established scholarship proposing that human translators have a propensity to amplify certain literary characteristics of the original manuscripts.
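The abstract mentions a corpus-level intertextuality metric but does not define it here. As a rough illustration only, the sketch below compares the mean cosine similarity of passage pairs that are known cross-references against a baseline set of unrelated pairs, using precomputed multilingual embeddings. The function names, the "gap" formulation, and the toy vectors are all assumptions made for this example and are not the authors' metric.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[str, str]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pair_similarity(embeddings: Dict[str, Vector], pairs: List[Pair]) -> float:
    """Average cosine similarity over a list of (passage_id, passage_id) pairs."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    total = sum(cosine(embeddings[a], embeddings[b]) for a, b in pairs)
    return total / len(pairs)


def intertextuality_gap(
    embeddings: Dict[str, Vector],
    linked_pairs: List[Pair],
    baseline_pairs: List[Pair],
) -> float:
    """Hypothetical corpus-level score (an assumption, not the paper's metric):
    how much more similar cross-referenced passages are than baseline pairs."""
    linked = mean_pair_similarity(embeddings, linked_pairs)
    baseline = mean_pair_similarity(embeddings, baseline_pairs)
    return linked - baseline


# Toy two-dimensional vectors standing in for real multilingual embeddings.
toy_embeddings = {
    "passage_a": [1.0, 0.0],
    "passage_b": [1.0, 0.0],
    "passage_c": [0.0, 1.0],
}
toy_gap = intertextuality_gap(
    toy_embeddings,
    linked_pairs=[("passage_a", "passage_b")],
    baseline_pairs=[("passage_a", "passage_c")],
)
```

Under this assumed formulation, computing the same score for the source text, a human translation, and a machine translation would give three numbers to compare: a translation whose gap exceeds the source's would be amplifying intertextuality, and one whose gap is smaller would be muting it.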
Related papers
- Multi-perspective Alignment for Increasing Naturalness in Neural Machine Translation [11.875491080062233]
Neural machine translation (NMT) systems amplify lexical biases present in their training data, leading to artificially impoverished language in output translations.
We introduce a novel method that rewards both naturalness and content preservation.
We evaluate our method on English-to-Dutch literary translation, and find that our best model produces translations that are lexically richer and exhibit more properties of human-written language, without loss in translation accuracy.
arXiv Detail & Related papers (2024-12-11T15:42:22Z)
- Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation [11.875491080062233]
Machine translations are found to be lexically poorer than human translations.
We propose a novel approach that consists of reranking translation candidates with a classifier that distinguishes between original and translated text.
We evaluate our approach on 31 English-to-Dutch book translations, and find that, for certain books, our approach retrieves lexical diversity scores that are close to human translation.
arXiv Detail & Related papers (2024-08-30T14:12:04Z)
- (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [52.18246881218829]
We introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents.
To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP).
arXiv Detail & Related papers (2024-05-20T05:55:08Z)
- Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation [66.33340583035374]
We present a comprehensive study on the robustness of current text adversarial attacks to round-trip translation.
We demonstrate that 6 state-of-the-art text-based adversarial attacks do not maintain their efficacy after round-trip translation.
We introduce an intervention-based solution to this problem, by integrating Machine Translation into the process of adversarial example generation.
arXiv Detail & Related papers (2023-07-24T04:29:43Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
- Lost in Translationese? Reducing Translation Effect Using Abstract Meaning Representation [11.358350306918027]
We argue that Abstract Meaning Representation (AMR) can be used as an interlingua to reduce the amount of translationese in translated texts.
By parsing English translations into an AMR and then generating text from that AMR, the result more closely resembles originally English text.
This work makes strides towards reducing translationese in text and highlights the utility of AMR as an interlingua.
arXiv Detail & Related papers (2023-04-23T00:04:14Z)
- A Bilingual Parallel Corpus with Discourse Annotations [82.07304301996562]
This paper describes BWB, a large parallel corpus first introduced in Jiang et al. (2022), along with an annotated test set.
The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena.
arXiv Detail & Related papers (2022-10-26T12:33:53Z)
- Time-Aware Ancient Chinese Text Translation and Inference [6.787414471399024]
We aim to address the challenges surrounding the translation of ancient Chinese text.
The linguistic gap due to the difference in eras results in translations that are poor in quality.
Most translations are missing the contextual information that is often very crucial to understanding the text.
arXiv Detail & Related papers (2021-07-07T12:23:52Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)