Evaluating LLMs on Chinese Idiom Translation
- URL: http://arxiv.org/abs/2508.10421v1
- Date: Thu, 14 Aug 2025 07:52:56 GMT
- Title: Evaluating LLMs on Chinese Idiom Translation
- Authors: Cai Yang, Yao Dou, David Heineman, Xiaofeng Wu, Wei Xu
- Abstract summary: Despite recent progress in machine translation, little is known about Chinese idiom translation. We introduce IdiomEval, a framework with a comprehensive error taxonomy for Chinese idiom translation.
- Score: 12.580058582681968
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Idioms, whose figurative meanings usually differ from their literal interpretations, are common in everyday language, especially in Chinese, where they often contain historical references and follow specific structural patterns. Despite recent progress in machine translation with large language models, little is known about Chinese idiom translation. In this work, we introduce IdiomEval, a framework with a comprehensive error taxonomy for Chinese idiom translation. We annotate 900 translation pairs from nine modern systems, including GPT-4o and Google Translate, across four domains: web, news, Wikipedia, and social media. We find these systems fail at idiom translation, producing incorrect, literal, partial, or even missing translations. The best-performing system, GPT-4, makes errors in 28% of cases. We also find that existing evaluation metrics measure idiom translation quality poorly, with Pearson correlation below 0.48 against human ratings. We thus develop improved models that achieve F$_1$ scores of 0.68 for detecting idiom translation errors.
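As a rough sketch of how such metric-quality numbers are computed, the snippet below derives a Pearson correlation between automatic metric scores and human ratings, and an F$_1$ score for binary error detection. All data here is invented, and the scipy/sklearn calls stand in for whatever evaluation code the paper actually uses.

```python
# Hedged sketch: correlating a metric against human ratings, and F1
# for idiom-error detection. All numbers below are hypothetical.
from scipy.stats import pearsonr
from sklearn.metrics import f1_score

human_ratings = [5, 2, 4, 1, 3, 5, 2, 4]  # hypothetical 1-5 ratings
metric_scores = [0.91, 0.55, 0.78, 0.60, 0.62, 0.88, 0.47, 0.81]

r, _ = pearsonr(human_ratings, metric_scores)
print(f"Pearson r = {r:.2f}")  # the paper reports r < 0.48 for existing metrics

gold_has_error = [0, 1, 0, 1, 1, 0, 1, 0]  # 1 = idiom translation error
pred_has_error = [0, 1, 0, 0, 1, 0, 1, 1]
print(f"F1 = {f1_score(gold_has_error, pred_has_error):.2f}")  # paper's detectors reach 0.68
```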
Related papers
- A Rising Tide Lifts All Boats: MTQE Rewards for Idioms Improve General Translation Quality [13.512688251831902]
Non-compositional expressions (e.g., idioms, proverbs, and metaphors) pose significant challenges for neural machine translation systems. We investigate GRPO-style fine-tuning using Machine Translation Quality Estimation (MTQE) models as reward functions to train models to better translate idioms. Using Chinese and Hindi datasets, we find that idiom translation abilities improve by 14 points, general non-idiomatic translation implicitly improves by 8 points, and cross-lingual translation abilities (trained on one language, evaluated on another) improve by 6 points.
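To make the setup concrete, here is a minimal sketch of an MTQE reward function, assuming the open-source `unbabel-comet` package and its reference-free CometKiwi checkpoint; the paper's actual reward model, data, and GRPO hyperparameters are not specified here and may differ.

```python
# Hedged sketch: a reference-free MTQE model as a GRPO-style reward.
# Assumes the `unbabel-comet` package and the CometKiwi QE checkpoint;
# the paper's actual reward model and RL setup may differ.
from comet import download_model, load_from_checkpoint

qe_model = load_from_checkpoint(download_model("Unbabel/wmt22-cometkiwi-da"))

def mtqe_reward(sources, candidates):
    """Score (source, candidate) pairs without reference translations."""
    data = [{"src": s, "mt": c} for s, c in zip(sources, candidates)]
    return qe_model.predict(data, batch_size=8, gpus=0).scores

# GRPO scores a *group* of sampled translations per source and uses the
# group mean as a baseline, so each sample's advantage is relative.
src = "他胸有成竹地答应了。"
samples = [
    "He agreed with full confidence.",
    "He agreed with a bamboo in his chest.",  # literal idiom error
    "He confidently agreed.",
    "He agreed.",
]
rewards = mtqe_reward([src] * len(samples), samples)
baseline = sum(rewards) / len(rewards)
advantages = [r - baseline for r in rewards]
```

The group-mean baseline is the characteristic GRPO ingredient: each sampled translation is pushed up or down according to how it scores relative to its siblings, so a literal rendering of an idiom is penalized whenever the QE model rates it below the group average.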
arXiv Detail & Related papers (2026-01-09T20:55:09Z)
- Chengyu-Bench: Benchmarking Large Language Models for Chinese Idiom Understanding and Use [1.5129424416840094]
Chengyu-Bench comprises 2,937 human-verified examples covering 1,765 common idioms sourced from diverse corpora. We evaluate leading LLMs and find they achieve over 95% accuracy on Evaluative Connotation, but only 85% on Appropriateness and 40% top-1 accuracy on Open Cloze. Chengyu-Bench demonstrates that while LLMs can reliably gauge idiom sentiment, they still struggle to grasp the cultural and contextual nuances essential for proper usage.
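For intuition, top-1 open-cloze accuracy can be computed as below; the item format, idioms, and stub generator are illustrative stand-ins, not Chengyu-Bench's actual schema or prompts.

```python
# Hedged sketch: top-1 accuracy on idiom open cloze. The item format
# and idioms are illustrative, not Chengyu-Bench's actual schema.
def top1_accuracy(items, generate):
    """`generate` maps a cloze sentence to the model's best idiom guess."""
    hits = sum(generate(item["cloze"]) == item["answer"] for item in items)
    return hits / len(items)

items = [
    {"cloze": "他做事一向____，从不拖泥带水。", "answer": "雷厉风行"},
    {"cloze": "这篇文章____，读来毫无趣味。", "answer": "味同嚼蜡"},
]

# A real `generate` would prompt an LLM; this stub always guesses one idiom.
print(top1_accuracy(items, lambda cloze: "雷厉风行"))  # -> 0.5
```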
arXiv Detail & Related papers (2025-06-22T17:26:09Z)
- Large Language Models for Persian $\leftrightarrow$ English Idiom Translation [5.689194193929357]
Large language models (LLMs) have shown superior capabilities in translating figurative language compared to neural machine translation (NMT) systems. This paper introduces two parallel datasets of sentences containing idiomatic expressions for Persian$\rightarrow$English and English$\rightarrow$Persian translations. We evaluate various open- and closed-source LLMs, NMT models, and their combinations. Experiments reveal that Claude-3.5-Sonnet delivers outstanding results in both translation directions.
arXiv Detail & Related papers (2024-12-13T09:29:27Z)
- Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Multilingual neural machine translation systems are hypothesized to map sentences of different languages into a common representation space.
In this work, we test this hypothesis by zero-shot translating from unseen languages.
We demonstrate that this setup enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z)
- Improving LLM Abilities in Idiomatic Translation [2.8692611791027893]
For large language models (LLMs) like NLLB and GPT, translating idioms remains a challenge. Our goal is to enhance translation fidelity by improving LLM processing of idiomatic language. This has a significant social impact, as it preserves cultural nuances and ensures translated texts retain their intent and emotional resonance.
arXiv Detail & Related papers (2024-07-03T21:34:26Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
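As a sketch of the loss-weighting idea, under the assumption that it upweights idiom material in the training loss (the paper's exact recipe may differ), the following PyTorch snippet scales per-token cross-entropy by an idiom mask.

```python
# Hedged sketch: cross-entropy that upweights idiom tokens during NMT
# fine-tuning. The weighting scheme is illustrative, not the paper's.
import torch
import torch.nn.functional as F

def weighted_nmt_loss(logits, targets, idiom_mask, idiom_weight=2.0):
    """
    logits:     (batch, seq_len, vocab) decoder outputs
    targets:    (batch, seq_len) gold token ids
    idiom_mask: (batch, seq_len), 1.0 where the target token is part of an idiom
    """
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none"
    )  # (batch, seq_len)
    weights = 1.0 + (idiom_weight - 1.0) * idiom_mask
    return (per_token * weights).sum() / weights.sum()

# Toy shapes: batch of 2, length 5, vocab of 100.
logits = torch.randn(2, 5, 100)
targets = torch.randint(0, 100, (2, 5))
idiom_mask = torch.tensor([[0., 1., 1., 1., 0.], [0., 0., 0., 0., 0.]])
loss = weighted_nmt_loss(logits, targets, idiom_mask)
```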
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding [53.84948040596055]
We introduce two related methods to mitigate failure cases with a modified decoding objective.
Experiments on the massively multilingual models M2M-100 (418M) and SMaLL-100 show that these methods suppress hallucinations and off-target translations.
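The source-contrastive part of the objective can be sketched as follows, assuming a Hugging Face seq2seq checkpoint; `next_token_logprobs` and `contrastive_step` are helper names invented here, and the authors' implementation (including the language-contrastive term) differs in detail.

```python
# Hedged sketch: one step of source-contrastive scoring,
#   log p(y | x_true) - lam * log p(y | x_random).
# Tokens that stay probable under an unrelated source are the ones
# implicated in hallucination, so they are penalized.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/m2m100_418M")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/m2m100_418M")
tok.src_lang = "zh"

def next_token_logprobs(src_text, decoder_input_ids):
    enc = tok(src_text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, decoder_input_ids=decoder_input_ids)
    return out.logits[0, -1].log_softmax(-1)  # (vocab,)

def contrastive_step(src_true, src_random, decoder_input_ids, lam=0.7):
    return (next_token_logprobs(src_true, decoder_input_ids)
            - lam * next_token_logprobs(src_random, decoder_input_ids))

prefix = torch.tensor([[model.config.decoder_start_token_id]])
scores = contrastive_step("他胸有成竹。", "今天天气很好。", prefix)
```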
arXiv Detail & Related papers (2023-09-13T17:15:27Z)
- Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models [57.60487455727155]
Idioms, with their non-compositional nature, pose particular challenges for Transformer-based systems.
Traditional methods, which replace idioms using existing knowledge bases (KBs), often lack scale and context awareness.
We introduce a multilingual idiom KB (IdiomKB) developed using large LMs to address this.
This KB facilitates better translation by smaller models, such as BLOOMZ (7.1B), Alpaca (7B), and InstructGPT (6.7B).
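In spirit, the KB-assisted prompting can be sketched like this; the entries, lookup, and prompt wording are illustrative inventions, not IdiomKB's actual contents or templates.

```python
# Hedged sketch: injecting a KB-provided figurative meaning into a
# translation prompt, in the spirit of IdiomKB. Entries and prompt
# wording are illustrative, not the paper's.
IDIOM_KB = {
    "胸有成竹": "to have a well-thought-out plan; fully confident",
    "画蛇添足": "to ruin something by adding superfluous detail",
}

def build_prompt(sentence: str) -> str:
    hints = [f'"{idiom}" means: {meaning}'
             for idiom, meaning in IDIOM_KB.items() if idiom in sentence]
    hint_block = ("Idiom notes:\n" + "\n".join(hints) + "\n") if hints else ""
    return f"{hint_block}Translate into English: {sentence}"

print(build_prompt("他胸有成竹地走上了讲台。"))
```

The design point is that the smaller model never has to resolve the idiom itself: the KB supplies the figurative meaning, and the model only has to translate compositionally around it.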
arXiv Detail & Related papers (2023-08-26T21:38:31Z)
- Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z)
- Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation [55.52888815590317]
Unlike literal expressions, idioms' meanings do not directly follow from their parts.
NMT models are often unable to translate idioms accurately and over-generate compositional, literal translations.
We investigate whether the non-compositionality of idioms is reflected in the mechanics of the dominant NMT model, Transformer.
arXiv Detail & Related papers (2022-05-30T17:59:32Z)
- PETCI: A Parallel English Translation Dataset of Chinese Idioms [0.0]
Current machine translation models perform poorly on idiom translation, while idioms are sparse in many translation datasets.
We present a parallel English translation dataset of Chinese idioms, aiming to improve translation by both human and machine.
arXiv Detail & Related papers (2022-02-19T03:16:20Z)