Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies
- URL: http://arxiv.org/abs/2601.11002v1
- Date: Fri, 16 Jan 2026 05:26:16 GMT
- Title: Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies
- Authors: Qianen Zhang, Zeyu Yang, Satoshi Nakamura
- Abstract summary: Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints. We extend the action space of SiMT with four adaptive actions: Sentence_Cut, Drop, Partial_Summarization and Pronominalization. We implement these actions in a large language model (LLM) framework and construct training references through action-aware prompting.
- Score: 6.010207559477024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints, which traditional policies with only READ/WRITE actions cannot fully address. We extend the action space of SiMT with four adaptive actions: Sentence_Cut, Drop, Partial_Summarization and Pronominalization, which enable real-time restructuring, omission, and simplification while preserving semantic fidelity. We implement these actions in a large language model (LLM) framework and construct training references through action-aware prompting. To evaluate both quality and word-level monotonicity, we further develop a latency-aware TTS pipeline that maps textual outputs to speech with realistic timing. Experiments on the ACL60/60 English-Chinese, English-German and English-Japanese benchmarks show that our framework consistently improves semantic metrics and achieves lower delay compared to reference translations and salami-based baselines. Notably, combining Drop and Sentence_Cut leads to consistent improvements in the balance between fluency and latency. These results demonstrate that enriching the action space of LLM-based SiMT provides a promising direction for bridging the gap between human and machine interpretation.
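To make the extended action space concrete, here is a minimal Python sketch of a policy loop that dispatches the four adaptive actions alongside READ/WRITE. This is an illustration under assumed interfaces, not the authors' implementation: `policy` and `translate_step` are hypothetical stand-ins for the LLM components, and the control flow is simplified to one action per incoming source token.

```python
from enum import Enum, auto

class Action(Enum):
    READ = auto()                   # keep consuming source tokens
    WRITE = auto()                  # emit the next target chunk
    SENTENCE_CUT = auto()           # close the current target sentence early
    DROP = auto()                   # omit a redundant source span
    PARTIAL_SUMMARIZATION = auto()  # compress the pending span instead of translating it fully
    PRONOMINALIZATION = auto()      # replace a long repeated mention with a pronoun

def interpret(source_stream, policy, translate_step):
    """Toy SiMT loop over the extended action space.

    `policy(buffer, output)` and `translate_step(...)` stand in for the
    LLM-based components described in the abstract; their interfaces here
    are assumptions made for illustration only.
    """
    buffer, output = [], []
    for token in source_stream:
        buffer.append(token)  # each loop step reads one source token
        action = policy(buffer, output)
        if action is Action.WRITE:
            output.append(translate_step(buffer, output))
        elif action is Action.SENTENCE_CUT:
            output.append(translate_step(buffer, output) + ".")
            buffer.clear()  # restart with a fresh, shorter sentence
        elif action is Action.DROP:
            buffer.pop(0)   # discard low-information material to recover latency
        elif action is Action.PARTIAL_SUMMARIZATION:
            output.append(translate_step(buffer, output, summarize=True))
            buffer.clear()
        elif action is Action.PRONOMINALIZATION:
            output.append(translate_step(buffer, output, pronominalize=True))
        # Action.READ: keep accumulating source context before committing output
    return output
```

Under this reading, the abstract's finding that Drop and Sentence_Cut combine well corresponds to alternating between discarding redundant source material and committing shorter target sentences, which keeps the buffer, and hence the delay, small.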
Related papers
- Simultaneous Speech-to-Speech Translation Without Aligned Data [52.467808474293605]
Simultaneous speech translation requires translating source speech into a target language in real time. We propose Hibiki-Zero, which eliminates the need for word-level alignments entirely. Hibiki-Zero achieves state-of-the-art performance in translation accuracy, latency, voice transfer, and naturalness across five X-to-English tasks.
arXiv Detail & Related papers (2026-02-11T17:41:01Z) - Beyond Many-Shot Translation: Scaling In-Context Demonstrations For Low-Resource Machine Translation [49.82863380286994]
In-context learning may offer novel ways to adapt Large Language Models for low-resource machine translation. In this study, we explore scaling low-resource machine translation ICL beyond the few-shot setting to thousands of examples with long-context models. Our experiments on Javanese and Sundanese show that gains from additional context saturate quickly and can degrade near the maximum context window.
arXiv Detail & Related papers (2026-02-04T17:02:22Z) - HOMURA: Taming the Sand-Glass for Time-Constrained LLM Translation via Reinforcement Learning [10.471350835897757]
Large Language Models (LLMs) have made remarkable strides in multilingual translation but are hindered by a systemic cross-lingual verbosity bias. Current prompt-engineering approaches struggle to resolve this conflict between semantic fidelity and rigid temporal feasibility.
arXiv Detail & Related papers (2026-01-15T08:45:54Z) - Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation [57.11989521509119]
We propose a novel agentic translation evaluation framework, centered on a reflective Core Agent that invokes specialized sub-agents. Experimental results indicate the efficacy of RATE, achieving an improvement of at least 3.2 in meta score compared with current metrics.
arXiv Detail & Related papers (2026-01-12T09:03:42Z) - DPO-Tuned Large Language Models for Segmentation in Simultaneous Speech Translation [6.611635315225665]
Simultaneous speech translation requires accurate segmentation to balance translation quality and latency. We propose a segmentation framework based on large language models trained with Direct Preference Optimization (DPO). By leveraging preference alignment, our method enables LLMs to predict natural segmentation points that better meet the demands of real-time translation. A minimal sketch of the DPO objective appears after this list.
arXiv Detail & Related papers (2025-10-14T06:41:36Z) - Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies [4.487634497356904]
Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints. We extend the action space of SiMT with four adaptive actions: SENTENCE_CUT, DROP, PARTIAL_SUMMARIZATION and PRONOMINALIZATION. We implement these actions in a decoder-only large language model (LLM) framework and construct training references through action-aware prompting.
arXiv Detail & Related papers (2025-09-26T02:57:36Z) - Direct Simultaneous Translation Activation for Large Audio-Language Models [58.03785696031301]
Simultaneous speech-to-text translation (Simul-S2TT) aims to translate speech into target text in real time. We introduce Simultaneous Self-Augmentation (SimulSA), a strategy that utilizes LALMs' inherent capabilities to obtain simultaneous data.
arXiv Detail & Related papers (2025-09-19T07:12:18Z) - Lost in Literalism: How Supervised Training Shapes Translationese in LLMs [51.04435855143767]
Large language models (LLMs) have achieved remarkable success in machine translation. However, translationese, characterized by overly literal and unnatural translations, remains a persistent challenge. We introduce methods to mitigate these biases, including polishing golden references and filtering unnatural training instances.
arXiv Detail & Related papers (2025-03-06T12:14:45Z) - High-Fidelity Simultaneous Speech-To-Speech Translation [75.69884829562591]
We introduce Hibiki, a decoder-only model for simultaneous speech translation. Hibiki leverages a multistream language model to synchronously process source and target speech, and jointly produces text and audio tokens to perform speech-to-text and speech-to-speech translation.
arXiv Detail & Related papers (2025-02-05T17:18:55Z) - Simultaneous Machine Translation with Tailored Reference [35.46823126036308]
Simultaneous machine translation (SiMT) generates the translation while the source sentence is still being read.
Existing SiMT models are typically trained with the same reference, disregarding the varying amounts of source information available at different latencies.
We propose a novel method that provides a tailored reference for SiMT models trained at different latencies by rephrasing the ground truth.
arXiv Detail & Related papers (2023-10-20T15:32:26Z) - Data-Driven Adaptive Simultaneous Machine Translation [51.01779863078624]
We propose a novel and efficient training scheme for adaptive SimulMT.
Our method outperforms all strong baselines in terms of translation quality and latency.
arXiv Detail & Related papers (2022-04-27T02:40:21Z) - Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z)
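The DPO-tuned segmentation entry above names a standard technique; the following sketch shows the usual DPO objective as it might apply to preference pairs of segmentation decisions, where the "chosen" candidate is the segmentation judged more natural. All function names and tensor values are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO objective over segmentation preference pairs.

    Each argument is the summed log-probability of a candidate segmentation
    under the trainable policy or the frozen reference model; `_w` marks the
    preferred (chosen) candidate and `_l` the dispreferred (rejected) one.
    """
    chosen_margin = policy_logp_w - ref_logp_w      # log-ratio for the preferred candidate
    rejected_margin = policy_logp_l - ref_logp_l    # log-ratio for the dispreferred candidate
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with made-up log-probabilities for two preference pairs.
loss = dpo_loss(
    policy_logp_w=torch.tensor([-3.1, -2.7]),
    policy_logp_l=torch.tensor([-3.5, -2.9]),
    ref_logp_w=torch.tensor([-3.3, -2.8]),
    ref_logp_l=torch.tensor([-3.4, -2.8]),
)
print(loss)  # scalar DPO loss
```

Minimizing this loss pushes the policy to assign relatively more probability to natural segmentation points than the reference model does, without needing an explicit reward model.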