Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies
- URL: http://arxiv.org/abs/2509.21801v1
- Date: Fri, 26 Sep 2025 02:57:36 GMT
- Title: Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies
- Authors: Qianen Zhang, Satoshi Nakamura
- Abstract summary: Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints. We extend the action space of SiMT with four adaptive actions: SENTENCE_CUT, DROP, PARTIAL_SUMMARIZATION and PRONOMINALIZATION. We implement these actions in a decoder-only large language model (LLM) framework and construct training references through action-aware prompting.
- Score: 4.487634497356904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints, which traditional encoder-decoder policies with only READ/WRITE actions cannot fully address. We extend the action space of SiMT with four adaptive actions: SENTENCE_CUT, DROP, PARTIAL_SUMMARIZATION and PRONOMINALIZATION, which enable real-time restructuring, omission, and simplification while preserving semantic fidelity. We implement these actions in a decoder-only large language model (LLM) framework and construct training references through action-aware prompting. To evaluate both quality and latency, we further develop a latency-aware TTS pipeline that maps textual outputs to speech with realistic timing. Experiments on the ACL60/60 English-Chinese and English-German benchmarks show that our framework consistently improves semantic metrics (e.g., COMET-KIWI) and achieves lower delay (measured by Average Lagging) compared to reference translations and salami-based baselines. Notably, combining DROP and SENTENCE_CUT yields the best overall balance between fluency and latency. These results demonstrate that enriching the action space of LLM-based SiMT provides a promising direction for bridging the gap between human and machine interpretation.
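The abstract reports delay as Average Lagging (AL). As a rough illustration, the sketch below computes the commonly used token-level AL for a single sentence. It is an assumption-laden example, not the paper's pipeline: the paper measures latency through a TTS stage with realistic timing, whereas this sketch only counts tokens. The function name `average_lagging` and the `delays` input format are hypothetical.

```python
from typing import Sequence


def average_lagging(delays: Sequence[int], source_len: int) -> float:
    """Token-level Average Lagging for one sentence (illustrative sketch).

    delays[t] is the number of source tokens read before target token t
    was written; source_len is the total number of source tokens.
    """
    if not delays or source_len <= 0:
        raise ValueError("need at least one target token and a non-empty source")

    gamma = len(delays) / source_len  # target-to-source length ratio
    total = 0.0
    steps = 0
    for t, g in enumerate(delays):  # t is 0-based, so it stands in for (t - 1)
        total += g - t / gamma      # lag relative to an ideal zero-delay system
        steps += 1
        if g >= source_len:         # cut-off: the whole source has been read
            break
    return total / steps


# A wait-3 style schedule on a 6-token source and 6-token target.
print(average_lagging([3, 4, 5, 6, 6, 6], 6))  # 3.0
# Reading the full source before writing anything.
print(average_lagging([6, 6, 6, 6, 6, 6], 6))  # 6.0
```

Lower values mean the output trails the source by fewer tokens, which is the sense in which the abstract's "lower delay" should be read.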
Related papers
- Beyond Many-Shot Translation: Scaling In-Context Demonstrations For Low-Resource Machine Translation [49.82863380286994]
In-context learning may offer novel ways to adapt Large Language Models for low-resource machine translation. In this study, we explore scaling low-resource machine translation ICL beyond the few-shot setting to thousands of examples with long-context models. Our experiments on Javanese and Sundanese show that gains from additional context saturate quickly and can degrade near the maximum context window.
arXiv Detail & Related papers (2026-02-04T17:02:22Z) - Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies [6.010207559477024]
Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints. We extend the action space of SiMT with four adaptive actions: Sentence_Cut, Drop, Partial_Summarization and Pronominalization. We adapt these actions in a large language model (LLM) framework and construct training references through action-aware prompting.
arXiv Detail & Related papers (2026-01-16T05:26:16Z) - Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation [57.11989521509119]
We propose a novel agentic translation evaluation framework, centered on a reflective Core Agent that invokes specialized sub-agents. Experimental results indicate the efficacy of RATE, achieving an improvement of at least 3.2 meta score compared with current metrics.
arXiv Detail & Related papers (2026-01-12T09:03:42Z) - DPO-Tuned Large Language Models for Segmentation in Simultaneous Speech Translation [6.611635315225665]
Simultaneous speech translation requires accurate segmentation to balance translation quality and latency. We propose a segmentation framework based on large language models trained with Direct Preference Optimization (DPO). By leveraging preference alignment, our method enables LLMs to predict natural segmentation points that better meet the demands of real-time translation.
arXiv Detail & Related papers (2025-10-14T06:41:36Z) - Better Late Than Never: Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation [13.949286462892212]
Simultaneous speech-to-text translation (SimulST) systems have to balance translation quality with latency. Existing metrics often produce inconsistent or misleading results. We present the first comprehensive analysis of SimulST latency metrics across language pairs, systems, and both short- and long-form regimes.
arXiv Detail & Related papers (2025-09-22T04:21:19Z) - Overcoming Latency Bottlenecks in On-Device Speech Translation: A Cascaded Approach with Alignment-Based Streaming MT [19.133273093370896]
This paper tackles several challenges when integrating Automatic Speech Recognition (ASR) and Machine Translation (MT) for real-time, on-device streaming speech translation. We propose a simultaneous translation approach that effectively balances translation quality and latency. We apply our approach to on-device bilingual conversational speech translation and demonstrate that our techniques outperform baselines in terms of latency and quality.
arXiv Detail & Related papers (2025-08-18T21:00:11Z) - Efficient and Adaptive Simultaneous Speech Translation with Fully Unidirectional Architecture [14.056534007451763]
Simultaneous speech translation (SimulST) produces translations incrementally while processing partial speech input. Existing LLM-based SimulST approaches incur significant computational overhead due to repeated encoding by a bidirectional speech encoder. We introduce Efficient and Adaptive Simultaneous Speech Translation (EASiST) with a fully unidirectional architecture.
arXiv Detail & Related papers (2025-04-16T06:46:15Z) - LLMs Can Achieve High-quality Simultaneous Machine Translation as Efficiently as Offline [16.124385656402744]
Large Language Models (LLMs) perform excellently in offline machine translation even with a simple prompt "Translate the following sentence from [src lang] into [tgt lang]:". We propose a novel paradigm that includes constructing supervised fine-tuning data for simultaneous machine translation (SiMT). Our approach achieves state-of-the-art performance across various SiMT benchmarks, and preserves the original abilities of offline translation.
arXiv Detail & Related papers (2025-04-13T13:45:53Z) - Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL).
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves comparable performance with SOTA as well as being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z) - Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MUST-C dataset.
The results show that LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z) - Data-Driven Adaptive Simultaneous Machine Translation [51.01779863078624]
We propose a novel and efficient training scheme for adaptive SimulMT.
Our method outperforms all strong baselines in terms of translation quality and latency.
arXiv Detail & Related papers (2022-04-27T02:40:21Z) - Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z) - On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation [55.02832094101173]
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual similarity.
This paper concerns reference-free machine translation (MT) evaluation, where we directly compare source texts to (sometimes low-quality) system translations.
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations.
arXiv Detail & Related papers (2020-05-03T22:10:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.