Related papers: The Paradox of Poetic Intent in Back-Translation: Evaluating the Quality of Large Language Models in Chinese Translation

The Paradox of Poetic Intent in Back-Translation: Evaluating the Quality of Large Language Models in Chinese Translation

URL: http://arxiv.org/abs/2504.16286v2
Date: Mon, 28 Apr 2025 11:53:26 GMT
Title: The Paradox of Poetic Intent in Back-Translation: Evaluating the Quality of Large Language Models in Chinese Translation
Authors: Li Weigang, Pedro Carvalho Brom,
Abstract summary: This study constructs a diverse corpus encompassing Chinese scientific terminology, historical translation paradoxes, and literary metaphors.<n>We evaluate BLEU, CHRF, TER, and semantic similarity metrics across six major large language models (LLMs) and three traditional translation tools.
Score: 2.685668802278156
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid advancement of large language models (LLMs) has reshaped the landscape of machine translation, yet challenges persist in preserving poetic intent, cultural heritage, and handling specialized terminology in Chinese-English translation. This study constructs a diverse corpus encompassing Chinese scientific terminology, historical translation paradoxes, and literary metaphors. Utilizing a back-translation and Friedman test-based evaluation system (BT-Fried), we evaluate BLEU, CHRF, TER, and semantic similarity metrics across six major LLMs (e.g., GPT-4.5, DeepSeek V3) and three traditional translation tools. Key findings include: (1) Scientific abstracts often benefit from back-translation, while traditional tools outperform LLMs in linguistically distinct texts; (2) LLMs struggle with cultural and literary retention, exemplifying the "paradox of poetic intent"; (3) Some models exhibit "verbatim back-translation", reflecting emergent memory behavior; (4) A novel BLEU variant using Jieba segmentation and n-gram weighting is proposed. The study contributes to the empirical evaluation of Chinese NLP performance and advances understanding of cultural fidelity in AI-mediated translation.

Related papers

Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings [0.0]
This research proposes a novel approach for translating puns from English to French by combining state-of-the-art large language models with specialized techniques for wordplay generation.<n>Our methodology's primary objective is to capture the linguistic creativity and humor of the source text wordplay, rather than simply duplicating its vocabulary.
arXiv Detail & Related papers (2025-07-09T03:09:14Z)
MAS-LitEval : Multi-Agent System for Literary Translation Quality Assessment [5.703909513367545]
Literary translation requires preserving cultural nuances and stylistic elements.<n>Traditional metrics like BLEU and METEOR fail to assess due to their focus on lexical overlap.<n>We propose MAS-LitEval, a multi-agent system using Large Language Models (LLMs) to evaluate translations based on terminology, narrative, and style.
arXiv Detail & Related papers (2025-06-17T05:33:40Z)
How does a Multilingual LM Handle Multiple Languages? [0.0]
This study critically examines capabilities in multilingual understanding, semantic representation, and cross-lingual knowledge transfer.<n>It assesses semantic similarity by analyzing multilingual word embeddings for consistency using cosine similarity.<n>It examines BLOOM-1.7B and Qwen2 through Named Entity Recognition and sentence similarity tasks to understand their linguistic structures.
arXiv Detail & Related papers (2025-02-06T18:08:14Z)
A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls [15.50296318831118]
We propose and evaluate the feasibility of a two-stage pipeline to evaluate literary machine translation. Our framework provides fine-grained, interpretable metrics suited for literary translation.
arXiv Detail & Related papers (2024-12-02T10:07:01Z)
Transcending Language Boundaries: Harnessing LLMs for Low-Resource Language Translation [38.81102126876936]
This paper introduces a novel retrieval-based method that enhances translation quality for low-resource languages by focusing on key terms. To evaluate the effectiveness of this method, we conducted experiments translating from English into three low-resource languages: Cherokee, a critically endangered indigenous language of North America; Tibetan, a historically and culturally significant language in Asia; and Manchu, a language with few remaining speakers. Our comparison with the zero-shot performance of GPT-4o and LLaMA 3.1 405B, highlights the significant challenges these models face when translating into low-resource languages.
arXiv Detail & Related papers (2024-11-18T05:41:27Z)
LLM-based Translation Inference with Iterative Bilingual Understanding [52.46978502902928]
We propose a novel Iterative Bilingual Understanding Translation method based on the cross-lingual capabilities of large language models (LLMs)<n>The cross-lingual capability of LLMs enables the generation of contextual understanding for both the source and target languages separately.<n>The proposed IBUT outperforms several strong comparison methods.
arXiv Detail & Related papers (2024-10-16T13:21:46Z)
Cultural Adaptation of Menus: A Fine-Grained Approach [58.08115795037042]
Machine Translation of Culture-Specific Items (CSIs) poses significant challenges. Recent work on CSI translation has shown some success using Large Language Models (LLMs) to adapt to different languages and cultures. We introduce the ChineseMenuCSI dataset, the largest for Chinese-English menu corpora, annotated with CSI vs Non-CSI labels. We develop a novel methodology for automatic CSI identification, which outperforms GPT-based prompts in most categories.
arXiv Detail & Related papers (2024-08-24T09:25:18Z)
Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving [43.148203559785095]
Large language models (LLMs) with impressive multilingual capabilities may bring a ray of hope to achieve this extreme translation demand.<n>This paper first introduces a suitable benchmark (PoetMT) where each Chinese poetry has a recognized elegant translation.<n>We propose a new metric based on GPT-4 to evaluate the extent to which current LLMs can meet these demands.
arXiv Detail & Related papers (2024-08-19T12:34:31Z)
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [52.18246881218829]
We introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents. To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP)
arXiv Detail & Related papers (2024-05-20T05:55:08Z)
Adapting Large Language Models for Document-Level Machine Translation [46.370862171452444]
Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks. Recent research indicates that moderately-sized LLMs often outperform larger ones after task-specific fine-tuning. This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs.
arXiv Detail & Related papers (2024-01-12T09:29:13Z)
Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences" Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
Benchmarking Machine Translation with Cultural Awareness [50.183458829028226]
Translating culture-related content is vital for effective cross-cultural communication. Many culture-specific items (CSIs) often lack viable translations across languages. This difficulty hinders the analysis of cultural awareness of machine translation systems.
arXiv Detail & Related papers (2023-05-23T17:56:33Z)
Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models. In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them. We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.