Estimating Machine Translation Difficulty
- URL: http://arxiv.org/abs/2508.10175v2
- Date: Thu, 28 Aug 2025 17:54:36 GMT
- Title: Estimating Machine Translation Difficulty
- Authors: Lorenzo Proietti, Stefano Perrella, Vilém Zouhar, Roberto Navigli, Tom Kocmi
- Abstract summary: We formalize the task of translation difficulty estimation, defining a text's difficulty based on the expected quality of its translations. We demonstrate the practical utility of difficulty estimators by using them to construct more challenging benchmarks for machine translation. We release two improved models for difficulty estimation, Sentinel-src-24 and Sentinel-src-25.
- Score: 48.659971048116766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine translation quality has steadily improved over the years, achieving near-perfect translations in recent benchmarks. These high-quality outputs make it difficult to distinguish between state-of-the-art models and to identify areas for future improvement. In this context, automatically identifying texts where machine translation systems struggle holds promise for developing more discriminative evaluations and guiding future research. In this work, we address this gap by formalizing the task of translation difficulty estimation, defining a text's difficulty based on the expected quality of its translations. We introduce a new metric to evaluate difficulty estimators and use it to assess both baselines and novel approaches. Finally, we demonstrate the practical utility of difficulty estimators by using them to construct more challenging benchmarks for machine translation. Our results show that dedicated models outperform both heuristic-based methods and LLM-as-a-judge approaches, with Sentinel-src achieving the best performance. Thus, we release two improved models for difficulty estimation, Sentinel-src-24 and Sentinel-src-25, which can be used to scan large collections of texts and select those most likely to challenge contemporary machine translation systems.
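The abstract defines a text's difficulty through the expected quality of its translations and uses difficulty estimates to build harder benchmarks. A minimal sketch of that idea follows. It assumes difficulty is the negated mean of per-translation quality scores, which is an illustrative choice rather than the paper's exact formulation; the function names `difficulty` and `select_hardest` and the example scores are hypothetical. The Sentinel-src models predict difficulty from the source text alone, which this sketch does not attempt.

```python
from statistics import mean


def difficulty(quality_scores):
    """Difficulty of one source text: negated mean quality of its translations.

    Assumption: higher quality score means a better translation, so a lower
    expected quality gives a higher difficulty.
    """
    if not quality_scores:
        raise ValueError("need at least one translation quality score")
    return -mean(quality_scores)


def select_hardest(scores_by_text, k):
    """Return the k source texts with the highest difficulty."""
    ranked = sorted(
        scores_by_text,
        key=lambda text: difficulty(scores_by_text[text]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical quality scores (0-100, higher is better) from three MT systems.
scores = {
    "The cat sat on the mat.": [98.0, 97.0, 99.0],
    "He kicked the bucket after spilling the beans.": [62.0, 70.0, 55.0],
    "Please restart the server.": [95.0, 90.0, 93.0],
}
hardest = select_hardest(scores, k=1)
```

Here `hardest` holds the single text whose translations scored lowest on average, which is the kind of item a more challenging benchmark would keep.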
Related papers
- Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets [2.0199251985015434]
We present a fully automated framework designed to enable scalable, high-quality translation of datasets and benchmarks. We apply this approach to translate popular benchmarks and datasets into eight Eastern and Southern European languages.
arXiv Detail & Related papers (2026-02-25T18:58:25Z)
- Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model [4.750257527930005]
We propose a novel approach to distinguish between human and machine-translated sentences. Experimental results show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2025-11-04T19:59:25Z)
- Evaluating Language Translation Models by Playing Telephone [5.02470728447561]
We propose an unsupervised method to generate training data for translation evaluation over different document lengths and application domains. We evaluate evaluation systems trained on texts mechanically generated using both model rotation and language translation approaches.
arXiv Detail & Related papers (2025-09-23T22:01:52Z)
- Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering [68.3400058037817]
We introduce TREQA (Translation Evaluation via Question-Answering), a framework that extrinsically evaluates translation quality. We show that TREQA is competitive with and, in some cases, outperforms state-of-the-art neural and LLM-based metrics in ranking alternative paragraph-level translations.
arXiv Detail & Related papers (2025-04-10T09:24:54Z)
- Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation [55.73341401764367]
We introduce DCSQE, a novel framework for alleviating distribution shift in synthetic QE data. DCSQE uses references, i.e., translation supervision signals, to guide both the generation and annotation processes. Experiments demonstrate that DCSQE outperforms SOTA baselines in both supervised and unsupervised settings.
arXiv Detail & Related papers (2025-02-27T10:11:53Z)
- Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems [16.102196839755823]
We introduce Translation Canvas, an explainable interface designed to pinpoint and analyze translation systems' performance.
It supports fine-grained analysis by highlighting error spans with explanations and selectively displaying systems' predictions.
According to human evaluation, Translation Canvas demonstrates superior performance over COMET and SacreBLEU packages.
arXiv Detail & Related papers (2024-10-07T16:54:18Z)
- Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution [57.42593422091653]
We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
arXiv Detail & Related papers (2024-02-18T09:51:49Z)
- Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences".
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Quality Estimation of Machine Translated Texts based on Direct Evidence from Training Data [0.0]
We show that the parallel corpus used as training data for training the MT system holds direct clues for estimating the quality of translations produced by the MT system.
Our experiments show that this simple and direct method holds promise for quality estimation of translations produced by any purely data driven machine translation system.
arXiv Detail & Related papers (2023-06-27T11:52:28Z)
- Easy Guided Decoding in Providing Suggestions for Interactive Machine Translation [14.615314828955288]
We propose a novel constrained decoding algorithm, namely Prefix Suffix Guided Decoding (PSGD).
PSGD improves translation quality by an average of $10.87$ BLEU and $8.62$ BLEU on the WeTS and the WMT 2022 Translation Suggestion datasets.
arXiv Detail & Related papers (2022-11-14T03:40:02Z)
- Rethinking Round-Trip Translation for Machine Translation Evaluation [44.83568796515321]
We report the surprising finding that round-trip translation can be used for automatic evaluation without the references.
We demonstrate the rectification is overdue as round-trip translation could benefit multiple machine translation evaluation tasks.
arXiv Detail & Related papers (2022-09-15T15:06:20Z)
- Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing [18.192546537421673]
We propose an end-to-end deep learning framework of the quality estimation and automatic post-editing of the machine translation output.
Our goal is to provide error correction suggestions and to further relieve the burden of human translators through an interpretable model.
arXiv Detail & Related papers (2020-09-19T00:29:00Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z)