Language Model is a Branch Predictor for Simultaneous Machine
Translation
- URL: http://arxiv.org/abs/2312.14488v1
- Date: Fri, 22 Dec 2023 07:32:47 GMT
- Title: Language Model is a Branch Predictor for Simultaneous Machine
Translation
- Authors: Aoxiong Yin, Tianyun Zhong, Haoyuan Li, Siliang Tang, Zhou Zhao
- Abstract summary: We propose incorporating branch prediction techniques in SiMT tasks to reduce translation latency.
We utilize a language model as a branch predictor to predict potential branch directions.
When the actual source word deviates from the predicted source word, we use the real source word to decode the output again, replacing the predicted output.
- Score: 73.82754138171587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The primary objective of simultaneous machine translation (SiMT) is to
minimize latency while preserving the quality of the final translation. Drawing
inspiration from CPU branch prediction techniques, we propose incorporating
branch prediction techniques in SiMT tasks to reduce translation latency.
Specifically, we utilize a language model as a branch predictor to predict
potential branch directions, namely, future source words. Subsequently, we
utilize the predicted source words to decode the output in advance. When the
actual source word deviates from the predicted source word, we use the real
source word to decode the output again, replacing the predicted output. To
further reduce computational costs, we share the parameters of the encoder and
the branch predictor, and utilize a pre-trained language model for
initialization. Our proposed method can be seamlessly integrated with any SiMT
model. Extensive experimental results demonstrate that our approach can improve
translation quality and latency at the same time. Our code is available at
https://github.com/YinAoXiong/simt_branch_predictor .
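To make the decoding scheme described in the abstract concrete, here is a rough, hypothetical sketch of the branch-prediction loop. It is not the authors' implementation (see the linked repository for that): the `simt_model.decode_step` and `branch_predictor_lm.predict_next` interfaces, and the one-word-per-read simplification, are assumptions for illustration only.

```python
# Hypothetical sketch of branch-prediction-style SiMT decoding (not the
# authors' code). `simt_model` and `branch_predictor_lm` are assumed
# interfaces; real SiMT systems emit a variable number of target tokens
# per read according to their read/write policy.

def speculative_simt_decode(simt_model, branch_predictor_lm, source_stream):
    """Translate a stream of source words, speculating ahead with an LM."""
    source_prefix = []        # source words read so far
    target_prefix = []        # committed target words
    speculative_target = []   # output decoded from a *predicted* source word
    predicted_word = None

    for real_word in source_stream:
        if predicted_word is not None and real_word == predicted_word:
            # Correct branch prediction: the speculative output is valid,
            # so it can be committed immediately with no extra decoding.
            target_prefix.extend(speculative_target)
        else:
            # First word, or misprediction: discard the speculative output
            # and decode again using the real source word.
            target_prefix.extend(
                simt_model.decode_step(source_prefix + [real_word], target_prefix)
            )
        source_prefix.append(real_word)

        # Speculate on the next source word and decode ahead of time, so the
        # work overlaps with the wait for the next input word to arrive.
        predicted_word = branch_predictor_lm.predict_next(source_prefix)
        speculative_target = simt_model.decode_step(
            source_prefix + [predicted_word], target_prefix
        )

    return target_prefix
```

As the abstract notes, the encoder and the branch predictor share parameters and the predictor is initialized from a pre-trained language model, so the speculative pass adds little extra cost; when the prediction is correct, the next target words are already available by the time the real source word arrives.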
Related papers
- Anticipating Future with Large Language Model for Simultaneous Machine Translation [27.613918824789877]
Simultaneous machine translation (SMT) takes streaming input utterances and incrementally produces target text.
We propose Translation by Anticipating Future (TAF).
Its core idea is to use a large language model (LLM) to predict future source words and opportunistically translate without introducing too much risk.
arXiv Detail & Related papers (2024-10-29T19:42:30Z)
- CBSiMT: Mitigating Hallucination in Simultaneous Machine Translation
with Weighted Prefix-to-Prefix Training [13.462260072313894]
Simultaneous machine translation (SiMT) is a challenging task that requires starting translation before the full source sentence is available.
The prefix-to-prefix framework is often applied to SiMT, in which the model learns to predict target tokens using only a partial source prefix.
We propose a Confidence-Based Simultaneous Machine Translation framework, which uses model confidence to perceive hallucination tokens.
arXiv Detail & Related papers (2023-11-07T02:44:45Z)
- Summarize and Generate to Back-translate: Unsupervised Translation of
Programming Languages [86.08359401867577]
Back-translation is widely known for its effectiveness for neural machine translation when little to no parallel data is available.
We propose performing back-translation via code summarization and generation.
We show that our proposed approach performs competitively with state-of-the-art methods.
arXiv Detail & Related papers (2022-05-23T08:20:41Z)
- Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging
Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource pairs, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
- Code-switching pre-training for neural machine translation [13.35263905025371]
This paper proposes a new pre-training method for Neural Machine Translation (NMT), called Code-Switching Pre-training (CSP).
Unlike traditional pre-training methods, which randomly mask some fragments of the input sentence, CSP randomly replaces some words in the source sentence with their translations in the target language.
Experimental results show that CSP achieves significant improvements over baselines without pre-training or with other pre-training methods.
arXiv Detail & Related papers (2020-09-17T06:10:07Z)
- Neural Simultaneous Speech Translation Using Alignment-Based Chunking [4.224809458327515]
In simultaneous machine translation, the objective is to determine when to produce a partial translation given a continuous stream of source words.
We propose a neural machine translation (NMT) model that makes dynamic decisions when to continue feeding on input or generate output words.
Our results on the IWSLT 2020 English-to-German task outperform a wait-k baseline by 2.6 to 3.7% BLEU absolute.
arXiv Detail & Related papers (2020-05-29T10:20:48Z)
- Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term that pushes the output distributions of the TM to be probable under the LM prior (a rough sketch of one such regularizer appears after this list).
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z)
- Modeling Future Cost for Neural Machine Translation [62.427034890537676]
We propose a simple and effective method to model the future cost of each target word for NMT systems.
The proposed approach achieves significant improvements over a strong Transformer-based NMT baseline.
arXiv Detail & Related papers (2020-02-28T05:37:06Z)
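For the "Language Model Prior for Low-Resource Neural Machine Translation" entry above, here is a minimal sketch of one way such a regularizer can be written in PyTorch. It assumes a KL-style penalty; the paper's exact formulation (direction of the divergence, temperature, weighting schedule) may differ, and `lm_weight` / `pad_id` are illustrative parameters.

```python
# Hypothetical sketch: cross-entropy plus an LM-prior regularizer for NMT.
# The cited paper's exact formulation may differ; this simply pulls the
# translation model's per-token distribution toward a frozen LM's.

import torch.nn.functional as F

def loss_with_lm_prior(tm_logits, lm_logits, targets, lm_weight=0.5, pad_id=0):
    """tm_logits, lm_logits: (batch, seq, vocab); targets: (batch, seq)."""
    vocab = tm_logits.size(-1)

    # Standard translation loss.
    ce = F.cross_entropy(
        tm_logits.view(-1, vocab), targets.view(-1), ignore_index=pad_id
    )

    # Regularizer: KL(P_TM || P_LM), so the TM's output distribution stays
    # probable under the (detached, i.e. not updated) LM prior.
    tm_log_probs = F.log_softmax(tm_logits, dim=-1)
    lm_log_probs = F.log_softmax(lm_logits.detach(), dim=-1)
    kl = F.kl_div(
        lm_log_probs, tm_log_probs, log_target=True, reduction="none"
    ).sum(-1)

    # Average the KL term over non-padding target positions only.
    mask = targets.ne(pad_id).float()
    kl = (kl * mask).sum() / mask.sum()

    return ce + lm_weight * kl
```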