PEGRL: Improving Machine Translation by Post-Editing Guided Reinforcement Learning
- URL: http://arxiv.org/abs/2602.03352v1
- Date: Tue, 03 Feb 2026 10:22:55 GMT
- Title: PEGRL: Improving Machine Translation by Post-Editing Guided Reinforcement Learning
- Authors: Yunzhi Shen, Hao Zhou, Xin Huang, Xue Han, Junlan Feng, Shujian Huang,
- Abstract summary: We introduce PEGRL, a two-stage RL framework that uses post-editing as an auxiliary task to stabilize training and guide overall optimization. Experiments on English→Finnish, English→Turkish, and English↔Chinese show consistent gains over RL baselines.
- Score: 54.19784655270799
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has shown strong promise for LLM-based machine translation, with recent methods such as GRPO demonstrating notable gains; nevertheless, translation-oriented RL remains challenged by noisy learning signals arising from Monte Carlo return estimation, as well as a large trajectory space that favors global exploration over fine-grained local optimization. We introduce PEGRL, a two-stage RL framework that uses post-editing as an auxiliary task to stabilize training and guide overall optimization. At each iteration, translation outputs are sampled to construct post-editing inputs, allowing return estimation in the post-editing stage to benefit from conditioning on the current translation behavior, while jointly supporting both global exploration and fine-grained local optimization. A task-specific weighting scheme further balances the contributions of translation and post-editing objectives, yielding a biased yet more sample-efficient estimator. Experiments on English→Finnish, English→Turkish, and English↔Chinese show consistent gains over RL baselines, and for English→Turkish, performance on COMET-KIWI is comparable to advanced LLM-based systems (DeepSeek-V3.2).
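The two-stage loop described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses GRPO-style group-normalized advantages (reward minus group mean, divided by group standard deviation), a hypothetical post-editing prompt template (`post_edit_prompt`), and fixed hypothetical weights `w_mt` and `w_pe` in place of the paper's task-specific weighting scheme, whose exact form is not given here. The `translate`, `post_edit`, and `reward` callables are placeholders for the policy sampler and a quality metric, and the function only returns weighted advantages; the policy-gradient update itself is left out.

```python
import itertools
import statistics
from typing import Callable, List, Tuple


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def post_edit_prompt(source: str, draft: str) -> str:
    # Hypothetical template; the paper's post-editing input format is not given here.
    return f"Source: {source}\nDraft: {draft}\nPost-edited translation:"


def two_stage_advantages(
    source: str,
    translate: Callable[[str], str],      # placeholder: samples one translation from the policy
    post_edit: Callable[[str], str],      # placeholder: samples one post-edit for a prompt
    reward: Callable[[str, str], float],  # placeholder: any scalar quality score (assumed)
    n_drafts: int = 4,
    n_edits: int = 2,
    w_mt: float = 1.0,  # hypothetical fixed weights standing in for the
    w_pe: float = 0.5,  # paper's task-specific weighting scheme
) -> List[Tuple[str, str, float]]:
    """Return (task, text, weighted advantage) triples for one source sentence."""
    samples: List[Tuple[str, str, float]] = []

    # Stage 1: a group of translations sampled from the current policy,
    # normalized against each other (sentence-level, global exploration).
    drafts = [translate(source) for _ in range(n_drafts)]
    mt_rewards = [reward(source, d) for d in drafts]
    for draft, adv in zip(drafts, group_relative_advantages(mt_rewards)):
        samples.append(("translate", draft, w_mt * adv))

    # Stage 2: post-edit each freshly sampled draft. Edits of the same draft form
    # their own group, so their advantages compare local refinements of it.
    for draft in drafts:
        prompt = post_edit_prompt(source, draft)
        edits = [post_edit(prompt) for _ in range(n_edits)]
        pe_rewards = [reward(source, e) for e in edits]
        for edit, adv in zip(edits, group_relative_advantages(pe_rewards)):
            samples.append(("post_edit", edit, w_pe * adv))

    # These values would weight per-sample log-prob terms in a policy-gradient loss;
    # the update itself is omitted.
    return samples


if __name__ == "__main__":
    # Toy smoke test: string length stands in for a real translation metric.
    draft_pool = itertools.cycle(["a", "ab", "abc", "abcd"])
    edit_pool = itertools.cycle(["x", "xyz"])
    results = two_stage_advantages(
        "src",
        translate=lambda s: next(draft_pool),
        post_edit=lambda p: next(edit_pool),
        reward=lambda s, h: float(len(h)),
    )
    for task, text, adv in results:
        print(f"{task:10s} {text:5s} {adv:+.3f}")
```

Grouping post-edits by the draft they start from is one possible reading of "conditioning on the current translation behavior": each edit is compared only with other edits of the same freshly sampled draft, which gives a more local signal than comparing whole translations against each other. The length-as-reward run is only a smoke test, not a stand-in for COMET-KIWI or any metric used in the paper.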
Related papers
- From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization [12.547838537411215]
We focus on how to construct translation LLMs that meet the needs of domain customization. We take visual media subtitle translation as our topic and explore how to train expressive and vivid translation LLMs.
(2026-02-01T07:24:06Z)
- Lost in Literalism: How Supervised Training Shapes Translationese in LLMs [51.04435855143767]
Large language models (LLMs) have achieved remarkable success in machine translation. However, translationese, characterized by overly literal and unnatural translations, remains a persistent challenge. We introduce methods to mitigate these biases, including polishing golden references and filtering unnatural training instances.
(2025-03-06T12:14:45Z)
- Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings [25.851419860597407]
We propose a novel approach that leverages fine-grained, token-level quality assessments along with error severity levels using reinforcement learning. We conduct experiments on small and large translation datasets with standard encoder-decoder and large language model-based machine translation systems. Our results show that training with token-level rewards improves translation quality across language pairs over baselines according to both automatic and human evaluation.
(2024-11-08T21:55:37Z)
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
(2024-06-12T17:21:21Z)
- Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
(2024-03-21T13:47:40Z)
- Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation [64.5862977630713]
This study investigates how large language models (LLMs) leverage source and reference data in the machine translation evaluation task.
We find that reference information significantly enhances the evaluation accuracy, while surprisingly, source information sometimes is counterproductive.
(2024-01-12T13:23:21Z)
- POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation [32.76853731410492]
Low-resource languages (LRLs) face challenges in supervised neural machine translation due to limited parallel data.
We propose Probability-driven Meta-graph Prompter (POMP) to enhance Large Language Models' translation capabilities for LRLs.
Our experiments show significant improvements in the translation quality of three LRLs.
(2024-01-11T00:03:36Z)
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning [50.9692060692705]
This paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers for offline RL. Our framework highlights four crucial components: (1) initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, … In particular, our method demonstrates superior performance in scenarios with limited data samples.
(2023-10-31T16:24:17Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both the representation level and the gradient level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
(2021-09-10T10:52:21Z)