Reflective Translation: Improving Low-Resource Machine Translation via Structured Self-Reflection
- URL: http://arxiv.org/abs/2601.19871v1
- Date: Tue, 27 Jan 2026 18:37:09 GMT
- Title: Reflective Translation: Improving Low-Resource Machine Translation via Structured Self-Reflection
- Authors: Nicholas Cheng
- Abstract summary: Low-resource languages such as isiZulu and isiXhosa face persistent challenges in machine translation due to limited parallel data and linguistic resources. Recent advances in large language models suggest that self-reflection, prompting a model to critique and revise its own outputs, can improve reasoning quality and factual consistency. This paper introduces Reflective Translation, a prompt-based framework in which a model generates an initial translation, produces a structured self-critique, and then uses this reflection to generate a refined translation.
- Score: 0.15229257192293197
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Low-resource languages such as isiZulu and isiXhosa face persistent challenges in machine translation due to limited parallel data and linguistic resources. Recent advances in large language models suggest that self-reflection, prompting a model to critique and revise its own outputs, can improve reasoning quality and factual consistency. Building on this idea, this paper introduces Reflective Translation, a prompt-based framework in which a model generates an initial translation, produces a structured self-critique, and then uses this reflection to generate a refined translation. The approach is evaluated on English-isiZulu and English-isiXhosa translation using OPUS-100 and NTREX-African, across multiple prompting strategies and confidence thresholds. Results show consistent improvements in both BLEU and COMET scores between first- and second-pass translations, with average gains of up to +0.22 BLEU and +0.18 COMET. Statistical significance testing using paired nonparametric tests confirms that these improvements are robust. The proposed method is model-agnostic, requires no fine-tuning, and introduces a reflection-augmented dataset that can support future supervised or analysis-driven work. These findings demonstrate that structured self-reflection is a practical and effective mechanism for improving translation quality in low-resource settings.
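The framework as described is model-agnostic and purely prompt-based: draft, structured self-critique, refinement. A minimal sketch of that loop is below; `generate` stands in for any chat-style LLM call, and the prompt wording is illustrative rather than the paper's exact templates.

```python
from typing import Callable

def reflective_translate(
    source: str,
    src_lang: str,
    tgt_lang: str,
    generate: Callable[[str], str],
) -> dict:
    """Two-pass translation: draft -> structured self-critique -> refinement.

    `generate` is any text-in/text-out LLM call; prompts are illustrative,
    not the exact templates used in the paper.
    """
    # Pass 1: initial (draft) translation.
    draft = generate(
        f"Translate the following {src_lang} sentence into {tgt_lang}.\n"
        f"Sentence: {source}\nTranslation:"
    )

    # Structured self-critique: the model reviews its own draft along
    # explicit dimensions rather than free-form.
    critique = generate(
        f"You translated the {src_lang} sentence below into {tgt_lang}.\n"
        f"Source: {source}\nDraft translation: {draft}\n"
        "Critique the draft under three headings: "
        "1) meaning errors, 2) grammar/fluency, 3) word choice."
    )

    # Pass 2: refined translation conditioned on the critique.
    refined = generate(
        f"Source ({src_lang}): {source}\n"
        f"Draft ({tgt_lang}): {draft}\nCritique: {critique}\n"
        f"Rewrite the draft as an improved {tgt_lang} translation. "
        "Output only the translation."
    )
    return {"draft": draft, "critique": critique, "refined": refined}

if __name__ == "__main__":
    # Toy stand-in for an LLM so the sketch runs end to end.
    mock = lambda prompt: "<model output for: " + prompt[:40] + "...>"
    print(reflective_translate("Hello, world.", "English", "isiZulu", mock))
```

The confidence thresholds mentioned in the abstract would presumably gate whether the second pass is applied at all; the sketch above applies it unconditionally.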
Related papers
- Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets [2.0199251985015434]
We present a fully automated framework designed to enable scalable, high-quality translation of datasets and benchmarks. We apply this approach to translate popular benchmarks and datasets into eight Eastern and Southern European languages.
arXiv Detail & Related papers (2026-02-25T18:58:25Z)
- Unlocking Reasoning Capability on Machine Translation in Large Language Models [57.60641851466707]
Reasoning-oriented large language models (RLMs) achieve strong gains on tasks such as mathematics and coding by generating explicit intermediate reasoning. We systematically evaluate several open- and closed-weight RLMs on the WMT24++ benchmark. We find that enabling explicit reasoning consistently degrades translation quality across languages and models.
arXiv Detail & Related papers (2026-02-16T14:05:59Z)
- Asm2SrcEval: Evaluating Large Language Models for Assembly-to-Source Code Translation [4.45354703148321]
Assembly-to-source code translation is a critical task in reverse engineering, cybersecurity, and software maintenance. We present the first comprehensive evaluation of five state-of-the-art large language models on assembly-to-source translation.
arXiv Detail & Related papers (2025-11-28T12:40:30Z)
- LANPO: Bootstrapping Language and Numerical Feedback for Reinforcement Learning in LLMs [73.27182315028021]
LANPO is a framework that cleanly separates the roles of feedback: language guides exploration, while numerical rewards drive optimization. Our work provides a robust method for integrating historical experiences into the LLM RL loop, creating more effective and data-efficient learning agents.
arXiv Detail & Related papers (2025-10-18T15:51:19Z)
- Self-Improvement in Language Models: The Sharpening Mechanism [70.9248553790022]
We offer a new perspective on the capabilities of self-improvement through a lens we refer to as sharpening. Motivated by the observation that language models are often better at verifying response quality than they are at generating correct responses, we formalize self-improvement as using the model itself as a verifier during post-training. We analyze two natural families of self-improvement algorithms based on SFT and RLHF.
arXiv Detail & Related papers (2024-12-02T20:24:17Z)
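The sharpening mechanism above rests on the observation that verification is easier than generation. The following is a generic best-of-N illustration of that idea, not the paper's SFT- or RLHF-based algorithms; `generate` and `score` are hypothetical stand-ins for calls into the same underlying model.

```python
from typing import Callable, List

def sharpen_by_verification(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n_samples: int = 8,
) -> str:
    """Best-of-N selection: sample candidates, keep the one the
    model-as-verifier rates highest. A generic sketch of the
    'sharpening' intuition, not the paper's training procedure."""
    candidates: List[str] = [generate(prompt) for _ in range(n_samples)]
    # The same model acts as its own verifier via `score`.
    return max(candidates, key=lambda c: score(prompt, c))
```

In the paper's framing, SFT- and RLHF-based self-improvement amortize this filtering into post-training; the loop above captures only the inference-time intuition.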
- LLM-based Translation Inference with Iterative Bilingual Understanding [52.46978502902928]
We propose a novel Iterative Bilingual Understanding Translation (IBUT) method based on the cross-lingual capabilities of large language models (LLMs). The cross-lingual capability of LLMs enables the generation of contextual understanding for both the source and target languages separately. The proposed IBUT outperforms several strong comparison methods.
arXiv Detail & Related papers (2024-10-16T13:21:46Z)
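For the IBUT entry above, the key move is producing contextual understanding in the source and target languages separately, then iterating. A rough sketch under stated assumptions follows; the prompts and loop structure are guesses for illustration, not the paper's method.

```python
from typing import Callable

def ibut_translate(
    source: str,
    src_lang: str,
    tgt_lang: str,
    generate: Callable[[str], str],
    rounds: int = 2,
) -> str:
    """Sketch of iterative bilingual understanding: build separate
    source- and target-language context, translate, then iterate.
    Prompts and loop structure are illustrative assumptions."""
    # Source-side understanding, produced in the source language.
    src_notes = generate(
        f"In {src_lang}, explain the meaning, ambiguities, and key terms of: {source}"
    )
    translation = generate(
        f"Using these notes:\n{src_notes}\nTranslate into {tgt_lang}: {source}"
    )
    for _ in range(rounds):
        # Target-side understanding of the current translation.
        tgt_notes = generate(
            f"In {tgt_lang}, explain what this sentence says: {translation}"
        )
        translation = generate(
            f"Source ({src_lang}): {source}\nSource notes: {src_notes}\n"
            f"Current translation ({tgt_lang}): {translation}\n"
            f"Target-side reading: {tgt_notes}\n"
            f"Produce an improved {tgt_lang} translation only."
        )
    return translation
```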
- DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms [43.148203559785095]
Large language models (LLMs) enhanced by self-reflection have achieved promising performance on machine translation.
Existing self-reflection methods lack effective feedback information, limiting translation performance.
We introduce the DUAL-REFLECT framework, which leverages the dual learning of translation tasks to provide effective feedback.
arXiv Detail & Related papers (2024-06-11T13:10:39Z)
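DUAL-REFLECT's feedback signal comes from dual learning: the draft is translated back into the source language, and divergences from the original source localize likely errors. A minimal sketch of that loop, again with `generate` as a hypothetical LLM call and illustrative prompts:

```python
from typing import Callable

def dual_feedback_translate(
    source: str,
    src_lang: str,
    tgt_lang: str,
    generate: Callable[[str], str],
) -> str:
    """Sketch of dual-learning feedback: translate, back-translate,
    and use source/back-translation mismatches as the revision signal.
    Prompts are illustrative, not the paper's templates."""
    draft = generate(f"Translate to {tgt_lang}: {source}")
    # Dual task: translate the draft back into the source language.
    back = generate(f"Translate to {src_lang}: {draft}")
    # Mismatches between `source` and `back` localize likely errors.
    feedback = generate(
        f"Original ({src_lang}): {source}\n"
        f"Back-translation ({src_lang}): {back}\n"
        "List the meaning differences between the two sentences."
    )
    return generate(
        f"Source: {source}\nDraft ({tgt_lang}): {draft}\n"
        f"Feedback: {feedback}\nProduce a corrected {tgt_lang} translation."
    )
```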
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experimental results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze the representation space, generated responses, and data scales, revealing how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- Context-Aware Machine Translation with Source Coreference Explanation [26.336947440529713]
We propose a model that explains its translation decisions by predicting coreference features in the input.
We evaluate our method on the WMT English-German document-level translation task, an English-Russian dataset, and the multilingual TED talk dataset.
arXiv Detail & Related papers (2024-04-30T12:41:00Z)
- Non-Autoregressive Neural Machine Translation: A Call for Clarity [3.1447111126465]
We take a step back and revisit several techniques that have been proposed for improving non-autoregressive translation models.
We provide novel insights for establishing strong baselines using length prediction or CTC-based architecture variants.
We contribute standardized BLEU, chrF++, and TER scores using sacreBLEU on four translation tasks.
arXiv Detail & Related papers (2022-05-21T12:15:22Z)
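The standardized scores mentioned in the "Call for Clarity" entry come from sacreBLEU, which exposes BLEU, chrF++, and TER through one corpus-level API; for example (the hypothesis and reference strings here are made up):

```python
# pip install sacrebleu
from sacrebleu.metrics import BLEU, CHRF, TER

hyps = ["The cat sat on the mat."]            # system outputs (made up)
refs = [["The cat is sitting on the mat."]]   # one list per reference set

bleu = BLEU().corpus_score(hyps, refs)
chrf = CHRF(word_order=2).corpus_score(hyps, refs)  # word_order=2 -> chrF++
ter = TER().corpus_score(hyps, refs)
print(bleu, chrf, ter)
```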
- Understanding and Mitigating the Uncertainty in Zero-Shot Translation [92.25357943169601]
We aim to understand and alleviate the off-target issues from the perspective of uncertainty in zero-shot translation.
We propose two lightweight and complementary approaches to denoise the training data.
Our approaches significantly improve the performance of zero-shot translation over strong MNMT baselines.
arXiv Detail & Related papers (2022-05-20T10:29:46Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
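Schematically, the distributionally robust objective in the last entry is a min-max over per-language losses; a generic form is shown below (the paper's exact uncertainty set may differ), with the iterated best response scheme alternating updates between the model parameters and the worst-case weights.

```latex
% Generic DRO training objective over N language pairs:
% \ell_i(\theta) is the translation loss on language pair i, and
% \Delta_N is the uncertainty set of language weights (e.g. the
% probability simplex, or a divergence ball around the empirical
% language distribution).
\min_{\theta} \; \max_{\lambda \in \Delta_N} \; \sum_{i=1}^{N} \lambda_i \, \ell_i(\theta)
```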