Can Peter Pan Survive MT? A Stylometric Study of LLMs, NMTs, and HTs in Children's Literature Translation
- URL: http://arxiv.org/abs/2506.22038v1
- Date: Fri, 27 Jun 2025 09:34:40 GMT
- Title: Can Peter Pan Survive MT? A Stylometric Study of LLMs, NMTs, and HTs in Children's Literature Translation
- Authors: Delu Kong, Lieve Macken
- Abstract summary: The research constructs a Peter Pan corpus, comprising 21 translations: 7 human translations (HTs), 7 large language model translations (LLMs), and 7 neural machine translation outputs (NMTs). The analysis employs a generic feature set (including lexical, syntactic, readability, and n-gram features) and a creative text translation (CTT-specific) feature set, which captures repetition, rhythm, translatability, and miscellaneous levels, yielding 447 linguistic features in total. Results reveal that in generic features, HTs and MTs exhibit significant differences in conjunction word distributions and the ratio of 1-word-gram-YiYang.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This study focuses on evaluating the performance of machine translations (MTs) compared to human translations (HTs) in English-to-Chinese children's literature translation (CLT) from a stylometric perspective. The research constructs a Peter Pan corpus, comprising 21 translations: 7 human translations (HTs), 7 large language model translations (LLMs), and 7 neural machine translation outputs (NMTs). The analysis employs a generic feature set (including lexical, syntactic, readability, and n-gram features) and a creative text translation (CTT-specific) feature set, which captures repetition, rhythm, translatability, and miscellaneous levels, yielding 447 linguistic features in total. Using classification and clustering techniques in machine learning, we conduct a stylometric analysis of these translations. Results reveal that in generic features, HTs and MTs exhibit significant differences in conjunction word distributions and the ratio of 1-word-gram-YiYang, while NMTs and LLMs show significant variation in descriptive word usage and adverb ratios. Regarding CTT-specific features, LLMs outperform NMTs in distribution, aligning more closely with HTs in stylistic characteristics and demonstrating the potential of LLMs in CLT.
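To make the pipeline concrete, here is a minimal, self-contained sketch of stylometric classification and clustering. The three toy features (type-token ratio, mean sentence length, function-word ratio) and the snippet-sized "translations" are illustrative assumptions; they stand in for the paper's 447-feature set and full corpus texts.

```python
# Toy stylometric analysis: extract generic features, then cluster and classify.
# NOTE: features and texts are placeholders, not the paper's actual pipeline.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

FUNCTION_WORDS = {"the", "and", "so", "a", "of", "with"}

def generic_features(text: str) -> list[float]:
    """Type-token ratio (lexical), mean sentence length (syntactic),
    and function-word ratio, as stand-ins for richer feature sets."""
    sentences = [s for s in text.split(".") if s.strip()]
    tokens = text.lower().replace(".", "").replace(",", "").split()
    ttr = len(set(tokens)) / len(tokens)
    mean_sent_len = len(tokens) / len(sentences)
    fw_ratio = sum(t in FUNCTION_WORDS for t in tokens) / len(tokens)
    return [ttr, mean_sent_len, fw_ratio]

# Hypothetical snippets standing in for HT / NMT / LLM translations.
translations = {
    "HT_1":  "The boy flew away. And the children laughed with him.",
    "HT_2":  "So the boy flew. The children laughed and laughed.",
    "NMT_1": "The boy flew. The children laughed.",
    "NMT_2": "The boy flew away. The children laughed.",
    "LLM_1": "So the boy flew away, and the children laughed.",
    "LLM_2": "The boy flew, and the children laughed with him.",
}

X = StandardScaler().fit_transform(
    np.array([generic_features(t) for t in translations.values()]))

# Unsupervised view: do the translations group by translator type?
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(translations, clusters)))

# Supervised view: can the features separate human from machine output?
y = [not name.startswith("HT") for name in translations]  # True = machine
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```

A real study would use far richer features (n-grams, readability indices, the CTT-specific measures) and cross-validated classifiers, but the overall extract-features-then-cluster/classify structure is the same.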
Related papers
- Decoding Machine Translationese in English-Chinese News: LLMs vs. NMTs [0.0]
This study explores Machine Translationese (MTese) -- the linguistic peculiarities of machine translation outputs. We construct a large dataset consisting of 4 sub-corpora and employ a comprehensive five-layer feature set. Our findings confirm the presence of MTese in both Neural Machine Translation systems (NMTs) and Large Language Models (LLMs).
arXiv Detail & Related papers (2025-06-27T09:45:37Z)
- Missing the human touch? A computational stylometry analysis of GPT-4 translations of online Chinese literature [2.3861843983281625]
This study examines the stylistic features of large language models (LLMs). Computational stylometry analysis shows that GPT-4 translations closely align with human translations in lexical, syntactic, and content features. These findings offer insights into AI's impact on literary translation from a posthuman perspective.
arXiv Detail & Related papers (2025-06-16T00:48:09Z)
- Contextual Cues in Machine Translation: Investigating the Potential of Multi-Source Input Strategies in LLMs and NMT Systems [2.512491726995032]
We compare GPT-4o, a large language model, with a traditional multilingual neural machine translation (NMT) system. Using intermediate language translations as contextual cues, we evaluate their effectiveness in enhancing English and Chinese translations into Portuguese. Results suggest that contextual information significantly improves translation quality for domain-specific datasets and potentially for linguistically distant language pairs.
arXiv Detail & Related papers (2025-03-10T11:23:44Z)
- Retrieval-Augmented Machine Translation with Unstructured Knowledge [74.84236945680503]
Retrieval-augmented generation (RAG) introduces additional information to enhance large language models (LLMs). In machine translation (MT), previous work typically retrieves in-context examples from paired MT corpora, or domain-specific knowledge from knowledge graphs. In this paper, we study retrieval-augmented MT using unstructured documents.
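As a rough illustration of this retrieval-augmented setup, the sketch below scores a small store of unstructured documents against the source sentence with TF-IDF similarity and prepends the best matches to a translation prompt. The documents, source sentence, and prompt template are invented placeholders, and the final LLM call is omitted; the paper's actual retriever and models are not shown.

```python
# Sketch of retrieval-augmented MT over unstructured documents (hypothetical data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Peter Pan is the boy who would not grow up.",
    "Neverland is reached by flying past the second star to the right.",
    "Tinker Bell is a fairy who accompanies Peter.",
]
source = "Peter Pan flew toward the second star to the right."

# Retrieve the top-k most similar documents as unstructured context.
vectorizer = TfidfVectorizer().fit(documents + [source])
sims = cosine_similarity(vectorizer.transform([source]),
                         vectorizer.transform(documents))[0]
top_k = sims.argsort()[::-1][:2]
context = "\n".join(documents[i] for i in top_k)

# Ground the translation prompt in the retrieved text; the actual LLM call
# is omitted here.
prompt = (f"Background documents:\n{context}\n\n"
          f"Translate into Chinese, consistent with the background:\n{source}")
print(prompt)
```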
arXiv Detail & Related papers (2024-12-05T17:00:32Z)
- The Comparison of Translationese in Machine Translation and Human Translation in terms of Translation Relations [7.776258153133857]
The research employs two parallel corpora, each spanning nine genres and sharing the same source texts, with one translated by NMT and the other by humans.
The results indicate that NMT relies on literal translation significantly more than HT across genres.
arXiv Detail & Related papers (2024-03-27T19:12:20Z)
- Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach [1.6982207802596105]
This study investigates three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT.
arXiv Detail & Related papers (2023-12-17T15:56:05Z)
- Translation-Enhanced Multilingual Text-to-Image Generation [61.41730893884428]
Research on text-to-image generation (TTI) still predominantly focuses on the English language.
In this work, we thus investigate multilingual TTI and the current potential of neural machine translation (NMT) to bootstrap mTTI systems.
We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate the multilingual text knowledge within the mTTI framework.
arXiv Detail & Related papers (2023-05-30T17:03:52Z)
- Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
- Exploring Human-Like Translation Strategy with Large Language Models [93.49333173279508]
Large language models (LLMs) have demonstrated impressive capabilities in general scenarios.
This work proposes the MAPS framework, which stands for Multi-Aspect Prompting and Selection.
We employ a selection mechanism based on quality estimation to filter out noisy and unhelpful knowledge.
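To make the idea concrete, here is a highly simplified sketch of the multi-aspect-then-select loop: one candidate translation per "aspect" hint, with a quality-estimation score picking the winner. The aspect hints, the translate function, and the QE scorer below are all stand-ins; the framework itself elicits keyword, topic, and demonstration knowledge from an LLM and selects with a learned QE model.

```python
# Simplified sketch of Multi-Aspect Prompting and Selection (MAPS).
# All components below are placeholders for LLM-backed ones.
from typing import Callable

def maps_translate(source: str,
                   aspect_hints: dict[str, str],
                   translate: Callable[[str, str], str],
                   qe_score: Callable[[str, str], float]) -> str:
    """Generate one candidate per aspect hint, keep the highest-QE candidate."""
    candidates = [translate(source, hint) for hint in aspect_hints.values()]
    return max(candidates, key=lambda cand: qe_score(source, cand))

# Toy stand-ins so the sketch runs end to end.
hints = {
    "keywords": "key terms: Peter Pan -> 彼得·潘",
    "topic": "topic: children's fantasy literature",
    "demo": "similar example: 'Wendy flew.' -> '温迪飞了。'",
}
fake_translate = lambda src, hint: f"彼得·潘飞了。(guided by {hint.split(':')[0]})"
fake_qe = lambda src, cand: float(len(cand))  # placeholder for a real QE model

print(maps_translate("Peter Pan flew.", hints, fake_translate, fake_qe))
```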
arXiv Detail & Related papers (2023-05-06T19:03:12Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
- mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs [51.67970832510462]
We improve the multilingual text-to-text transfer Transformer with translation pairs (mT6).
We explore three cross-lingual text-to-text pre-training tasks, namely, machine translation, translation pair span corruption, and translation span corruption.
Experimental results show that the proposed mT6 improves cross-lingual transferability over mT5.
arXiv Detail & Related papers (2021-04-18T03:24:07Z)