MAS-LitEval : Multi-Agent System for Literary Translation Quality Assessment
- URL: http://arxiv.org/abs/2506.14199v1
- Date: Tue, 17 Jun 2025 05:33:40 GMT
- Title: MAS-LitEval : Multi-Agent System for Literary Translation Quality Assessment
- Authors: Junghwan Kim, Kieun Park, Sohee Park, Hyunggug Kim, Bongwon Suh
- Abstract summary: Literary translation requires preserving cultural nuances and stylistic elements. Traditional metrics like BLEU and METEOR fail to assess these qualities because they focus on lexical overlap. We propose MAS-LitEval, a multi-agent system using Large Language Models (LLMs) to evaluate translations based on terminology, narrative, and style.
- Score: 5.703909513367545
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Literary translation requires preserving cultural nuances and stylistic elements, which traditional metrics like BLEU and METEOR fail to assess due to their focus on lexical overlap. This oversight neglects the narrative consistency and stylistic fidelity that are crucial for literary works. To address this, we propose MAS-LitEval, a multi-agent system using Large Language Models (LLMs) to evaluate translations based on terminology, narrative, and style. We tested MAS-LitEval on translations of The Little Prince and A Connecticut Yankee in King Arthur's Court, generated by various LLMs, and compared it to traditional metrics. MAS-LitEval outperformed these metrics, with top models scoring up to 0.890 in capturing literary nuances. This work introduces a scalable, nuanced framework for Translation Quality Assessment (TQA), offering a practical tool for translators and researchers.
Related papers
- LiTransProQA: an LLM-based Literary Translation evaluation metric with Professional Question Answering [21.28047224832753]
LiTransProQA is a novel, reference-free, LLM-based question-answering framework designed for literary translation evaluation. It integrates insights from professional literary translators and researchers, focusing on literary devices, cultural understanding, and authorial voice. LiTransProQA substantially outperforms current metrics, achieving up to 0.07 gain in correlation and surpassing the best state-of-the-art metrics by over 15 points in adequacy assessments.
arXiv Detail & Related papers (2025-05-08T17:12:56Z)
- The Paradox of Poetic Intent in Back-Translation: Evaluating the Quality of Large Language Models in Chinese Translation [2.685668802278156]
This study constructs a diverse corpus encompassing Chinese scientific terminology, historical translation paradoxes, and literary metaphors. We evaluate BLEU, CHRF, TER, and semantic similarity metrics across six major large language models (LLMs) and three traditional translation tools.
arXiv Detail & Related papers (2025-04-22T21:48:05Z)
- DRT: Deep Reasoning Translation via Long Chain-of-Thought [89.48208612476068]
In this paper, we introduce DRT, an attempt to bring the success of long CoT to neural machine translation (MT). We first mine sentences containing similes or metaphors from existing literature books, and then develop a multi-agent framework to translate these sentences via long thought. Using Qwen2.5 and Llama-3.1 as the backbones, DRT models can learn the thought process during machine translation.
arXiv Detail & Related papers (2024-12-23T11:55:33Z)
- A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls [15.50296318831118]
We propose and evaluate the feasibility of a two-stage pipeline to evaluate literary machine translation. Our framework provides fine-grained, interpretable metrics suited for literary translation.
arXiv Detail & Related papers (2024-12-02T10:07:01Z)
- Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving [43.148203559785095]
Large language models (LLMs) with impressive multilingual capabilities may bring a ray of hope to meet this extreme translation demand. This paper first introduces a suitable benchmark (PoetMT) where each Chinese poem has a recognized elegant translation. We propose a new metric based on GPT-4 to evaluate the extent to which current LLMs can meet these demands.
arXiv Detail & Related papers (2024-08-19T12:34:31Z)
- (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [56.7988577327046]
We introduce TransAgents, a novel multi-agent framework that simulates the roles and collaborative practices of a human translation company. Our findings highlight the potential of multi-agent collaboration in enhancing translation quality, particularly for longer texts.
arXiv Detail & Related papers (2024-05-20T05:55:08Z)
- Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences".
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
- Exploring Human-Like Translation Strategy with Large Language Models [93.49333173279508]
Large language models (LLMs) have demonstrated impressive capabilities in general scenarios.
This work proposes the MAPS framework, which stands for Multi-Aspect Prompting and Selection.
We employ a selection mechanism based on quality estimation to filter out noisy and unhelpful knowledge.
arXiv Detail & Related papers (2023-05-06T19:03:12Z)
- Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature [35.1398797683712]
We show that literary translators prefer reference human translations over machine-translated paragraphs at a rate of 84%.
We train a post-editing model whose output is preferred over normal MT output at a rate of 69% by experts.
arXiv Detail & Related papers (2022-10-25T18:03:34Z)
- On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation [55.02832094101173]
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual similarity.
This paper is concerned with reference-free machine translation (MT) evaluation, where we directly compare source texts to (sometimes low-quality) system translations.
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations.
arXiv Detail & Related papers (2020-05-03T22:10:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.