Related papers: SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation

URL: http://arxiv.org/abs/2505.20622v1
Date: Tue, 27 May 2025 01:59:58 GMT
Title: SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation
Authors: Ting Xu, Zhichao Huang, Jiankai Sun, Shanbo Cheng, Wai Lam
Abstract summary: SeqPO-SiMT is a new policy optimization framework for simultaneous machine translation (SiMT). It incorporates a tailored reward to enhance translation quality while reducing latency. We conduct experiments on six datasets from diverse domains for En to Zh and Zh to En SiMT tasks.
Score: 51.79856805974686
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present Sequential Policy Optimization for Simultaneous Machine Translation (SeqPO-SiMT), a new policy optimization framework that defines the simultaneous machine translation (SiMT) task as a sequential decision making problem, incorporating a tailored reward to enhance translation quality while reducing latency. In contrast to popular Reinforcement Learning from Human Feedback (RLHF) methods, such as PPO and DPO, which are typically applied in single-step tasks, SeqPO-SiMT effectively tackles the multi-step SiMT task. This intuitive framework allows the SiMT LLMs to simulate and refine the SiMT process using a tailored reward. We conduct experiments on six datasets from diverse domains for En to Zh and Zh to En SiMT tasks, demonstrating that SeqPO-SiMT consistently achieves significantly higher translation quality with lower latency. In particular, SeqPO-SiMT outperforms the supervised fine-tuning (SFT) model by 1.13 points in COMET, while reducing the Average Lagging by 6.17 in the NEWSTEST2021 En to Zh dataset. While SiMT operates with far less context than offline translation, the SiMT results of SeqPO-SiMT on 7B LLM surprisingly rival the offline translation of high-performing LLMs, including Qwen-2.5-7B-Instruct and LLaMA-3-8B-Instruct.
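
The abstract reports latency as Average Lagging (AL). As a rough illustration of what that metric measures, the sketch below follows the commonly used token-level definition: AL averages, over the target tokens emitted before the source is fully read, how far the number of source tokens read runs ahead of an ideal diagonal schedule. The function name `average_lagging` and the input format (a list of read counts) are assumptions made for this example; the tokenization and exact AL variant used in the paper are not stated on this page.

```python
from typing import Sequence


def average_lagging(read_counts: Sequence[int], src_len: int) -> float:
    """Token-level Average Lagging (hypothetical helper, not the paper's code).

    read_counts[t] is the number of source tokens that had been read when
    target token t (0-based) was written. src_len is the source length.
    """
    if not read_counts or src_len <= 0:
        raise ValueError("need at least one target token and a non-empty source")

    tgt_len = len(read_counts)
    gamma = tgt_len / src_len  # target-to-source length ratio

    total = 0.0
    for t, read in enumerate(read_counts):
        # With 0-based t, t / gamma is the ideal number of source tokens
        # consumed before writing this target token.
        total += read - t / gamma
        if read >= src_len:
            # Stop at the first target token written with the full source read.
            return total / (t + 1)

    # Assumption: if the full source is never read, average over all steps.
    return total / tgt_len


# A wait-3 schedule on a 5-token source with a 5-token target.
print(average_lagging([3, 4, 5, 5, 5], src_len=5))
# An offline schedule that reads everything before writing.
print(average_lagging([5, 5, 5, 5, 5], src_len=5))
```

Under this definition a wait-k schedule on equal-length sentences scores k, and an offline system scores the full source length, so a lower AL means the translation trails the source less.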

Related papers

LLMs Can Achieve High-quality Simultaneous Machine Translation as Efficiently as Offline [16.124385656402744]
Large Language Models (LLMs) perform excellently in offline machine translation even with a simple prompt "Translate the following sentence from [src lang] into [tgt lang]:". We propose a novel paradigm that includes constructing supervised fine-tuning data for simultaneous machine translation (SiMT). Our approach achieves state-of-the-art performance across various SiMT benchmarks, and preserves the original abilities of offline translation.
arXiv Detail & Related papers (2025-04-13T13:45:53Z)
Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST).
arXiv Detail & Related papers (2024-06-24T16:38:17Z)
SiLLM: Large Language Models for Simultaneous Machine Translation [41.303764786790616]
Simultaneous Machine Translation (SiMT) generates translations while reading the source sentence. Existing SiMT methods employ a single model to concurrently determine the policy and generate the translations. We propose SiLLM, which delegates the two sub-tasks to separate agents.
arXiv Detail & Related papers (2024-02-20T14:23:34Z)
TransLLaMa: LLM-based Simultaneous Translation System [18.27477980076409]
We show that a decoder-only large language model (LLM) can control input segmentation directly by generating a special "wait" token. This obviates the need for a separate policy and enables the LLM to perform English-German and English-Russian SiMT tasks. We also evaluated closed-source models such as GPT-4, which displayed encouraging results in performing the SiMT task without prior training.
arXiv Detail & Related papers (2024-02-07T07:39:27Z)
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation [50.00235162432848]
We train ALMA models with only 22K parallel sentences and 12M parameters. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4.
arXiv Detail & Related papers (2024-01-16T15:04:51Z)
Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models [4.873927154453253]
Large language models (LLMs) with billions of parameters, pretrained on massive amounts of data, now achieve performance near or better than the state of the art in a variety of downstream natural language processing tasks. Simul-LLM is the first open-source fine-tuning and evaluation pipeline development framework for LLMs focused on SimulMT.
arXiv Detail & Related papers (2023-12-07T20:42:05Z)
Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding [73.32763904267186]
Large Language Models (LLMs) present the potential for achieving superior translation quality. We propose Cooperative Decoding (CoDec) which treats NMT systems as a pretranslation model and MT-oriented LLMs as a supplemental solution.
arXiv Detail & Related papers (2023-11-06T03:41:57Z)
Improving Simultaneous Machine Translation with Monolingual Data [94.1085601198393]
Simultaneous machine translation (SiMT) is usually done via sequence-level knowledge distillation (Seq-KD) from a full-sentence neural machine translation (NMT) model. We propose to leverage monolingual data to improve SiMT, which trains a SiMT student on the combination of bilingual data and external monolingual data distilled by Seq-KD.
arXiv Detail & Related papers (2022-12-02T14:13:53Z)
Learning to Multi-Task Learn for Better Neural Machine Translation [53.06405021125476]
Multi-task learning is an elegant approach to inject linguistic-related biases into neural machine translation models. We propose a novel framework for learning the training schedule, i.e., learning to multi-task learn, for the biased-MTL setting of interest. Experiments show the resulting automatically learned training schedulers are competitive with the best, and lead to up to +1.1 BLEU score improvements.
arXiv Detail & Related papers (2020-01-10T03:12:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.