Glancing Future for Simultaneous Machine Translation
- URL: http://arxiv.org/abs/2309.06179v1
- Date: Tue, 12 Sep 2023 12:46:20 GMT
- Title: Glancing Future for Simultaneous Machine Translation
- Authors: Shoutao Guo, Shaolei Zhang, Yang Feng
- Abstract summary: We propose a novel method to bridge the gap between prefix2prefix training and seq2seq training.
We gradually reduce the available source information from the whole sentence to the prefix corresponding to the latency.
Our method is applicable to a wide range of SiMT methods, and experiments demonstrate that it outperforms strong baselines.
- Score: 35.46823126036308
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Simultaneous machine translation (SiMT) outputs translation while reading the
source sentence. Unlike conventional sequence-to-sequence (seq2seq) training,
existing SiMT methods adopt the prefix-to-prefix (prefix2prefix) training,
where the model predicts target tokens based on partial source tokens. However,
the prefix2prefix training diminishes the ability of the model to capture
global information and introduces forced predictions due to the absence of
essential source information. Consequently, it is crucial to bridge the gap
between the prefix2prefix training and seq2seq training to enhance the
translation capability of the SiMT model. In this paper, we propose a novel
method that glances at future source information during curriculum learning to
achieve the transition from seq2seq training to prefix2prefix training.
Specifically, we gradually reduce the available source information from the whole
sentence to the prefix corresponding to the latency. Our method is applicable to a
wide range of SiMT methods, and experiments demonstrate that it outperforms strong
baselines.
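As a rough sketch of the curriculum described above (illustrative names only, not the paper's implementation), the schedule can be viewed as shrinking the visible source context from the whole sentence down to the latency-matched prefix as training progresses, assuming a wait-k style latency and a linear annealing schedule:

```python
def visible_source_length(src_len: int, target_step: int, wait_k: int,
                          progress: float) -> int:
    """Number of source tokens the decoder may attend to at this target step.

    progress = 0.0 -> the whole sentence is visible (seq2seq training);
    progress = 1.0 -> only the wait-k prefix for this step (prefix2prefix training).
    """
    prefix_len = min(src_len, wait_k + target_step)           # prefix2prefix limit
    extra_future = int((src_len - prefix_len) * (1.0 - progress))
    return min(src_len, prefix_len + extra_future)


# Example: 10-token source, wait-3 latency, predicting the 2nd target token.
for progress in (0.0, 0.5, 1.0):
    print(progress, visible_source_length(10, target_step=2, wait_k=3,
                                          progress=progress))
# 0.0 -> 10 (full sentence), 0.5 -> 7, 1.0 -> 5 (the wait-k prefix)
```

In practice this length would parameterize the encoder-side attention mask for each target position during training.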
Related papers
- PsFuture: A Pseudo-Future-based Zero-Shot Adaptive Policy for Simultaneous Machine Translation [8.1299957975257]
Simultaneous Machine Translation (SiMT) requires target tokens to be generated in real time as streaming source tokens are consumed.
We propose PsFuture, the first zero-shot adaptive read/write policy for SiMT.
We introduce a novel training strategy, Prefix-to-Full (P2F), specifically tailored to adjust offline translation models for SiMT applications.
arXiv Detail & Related papers (2024-10-05T08:06:33Z)
- Language Model is a Branch Predictor for Simultaneous Machine Translation [73.82754138171587]
We propose incorporating branch prediction techniques in SiMT tasks to reduce translation latency.
We utilize a language model as a branch predictor to predict potential branch directions.
When the actual source word deviates from the predicted source word, we use the real source word to decode the output again, replacing the predicted output.
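The prediction-and-verification loop described in this entry can be sketched as follows; the helpers `predict_next_source` and `translate_step` are hypothetical stand-ins for a language model and a translation model, not the paper's API.

```python
from typing import Callable, List

def speculative_simt_step(read_source: List[str],
                          predict_next_source: Callable[[List[str]], str],
                          translate_step: Callable[[List[str]], str],
                          actual_next_word: str) -> str:
    """One SiMT step with a branch-predictor-style guess of the next source word.

    In a real system the speculative decode happens before the true word arrives;
    both are shown in one step here for clarity.
    """
    guess = predict_next_source(read_source)                  # LM as branch predictor
    speculative_out = translate_step(read_source + [guess])   # decode ahead of time
    if guess == actual_next_word:                             # prediction hit
        return speculative_out
    # prediction miss: re-decode with the real source word and drop the guess
    return translate_step(read_source + [actual_next_word])


# Toy stubs standing in for real models.
out = speculative_simt_step(["wir", "haben"],
                            predict_next_source=lambda src: "ein",
                            translate_step=lambda src: " ".join(src).upper(),
                            actual_next_word="kein")
print(out)  # "WIR HABEN KEIN" -- the guess was wrong, so the output was re-decoded
```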
arXiv Detail & Related papers (2023-12-22T07:32:47Z)
- CBSiMT: Mitigating Hallucination in Simultaneous Machine Translation with Weighted Prefix-to-Prefix Training [13.462260072313894]
Simultaneous machine translation (SiMT) is a challenging task that requires starting translation before the full source sentence is available.
The prefix-to-prefix framework is often applied to SiMT, which learns to predict target tokens using only a partial source prefix.
We propose a Confidence-Based Simultaneous Machine Translation framework, which uses model confidence to perceive hallucination tokens.
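One way to read the confidence-based weighting is as a per-token weight on the prefix-to-prefix loss, so that low-confidence tokens (the ones most likely to be hallucinated under a partial source prefix) contribute less. The sketch below is an illustration of that idea, not CBSiMT's exact objective.

```python
import math
from typing import List

def confidence_weighted_nll(token_log_probs: List[float]) -> float:
    """Prefix-to-prefix NLL with each target token weighted by the model's own
    confidence p(token), so likely-hallucinated (low-confidence) tokens are
    down-weighted."""
    weights = [math.exp(lp) for lp in token_log_probs]    # confidence = p(token)
    losses = [-lp for lp in token_log_probs]              # per-token NLL
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Example: one confident token and one hallucination-prone token.
print(confidence_weighted_nll([math.log(0.9), math.log(0.2)]))
```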
arXiv Detail & Related papers (2023-11-07T02:44:45Z)
- LEAPT: Learning Adaptive Prefix-to-prefix Translation For Simultaneous Machine Translation [6.411228564798412]
Simultaneous machine translation is useful in many live scenarios but very challenging due to the trade-off between accuracy and latency.
We propose a novel adaptive training policy called LEAPT, which allows our machine translation model to learn how to translate source prefixes and make use of the future context.
arXiv Detail & Related papers (2023-03-21T11:17:37Z)
- Masked Autoencoders As The Unified Learners For Pre-Trained Sentence Representation [77.47617360812023]
We extend the recently proposed MAE style pre-training strategy, RetroMAE, to support a wide variety of sentence representation tasks.
The first stage performs RetroMAE over generic corpora, like Wikipedia, BookCorpus, etc., from which the base model is learned.
The second stage takes place on domain-specific data, e.g., MS MARCO and NLI, where the base model is further trained with RetroMAE and contrastive learning.
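For the second stage, which couples RetroMAE with contrastive learning, a generic in-batch contrastive (InfoNCE) objective of the kind typically used in such fine-tuning looks like the sketch below; this is a standard formulation given for illustration, not the paper's exact loss.

```python
import math
from typing import List

def in_batch_contrastive_loss(query_vecs: List[List[float]],
                              doc_vecs: List[List[float]],
                              temperature: float = 0.05) -> float:
    """In-batch contrastive (InfoNCE) loss: each query should score its own
    paired document higher than every other document in the batch."""
    def dot(a: List[float], b: List[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    total = 0.0
    for i, q in enumerate(query_vecs):
        scores = [dot(q, d) / temperature for d in doc_vecs]
        log_denom = math.log(sum(math.exp(s) for s in scores))
        total += -(scores[i] - log_denom)      # -log softmax at the positive pair
    return total / len(query_vecs)


# Toy example: two (query, positive document) pairs in one batch.
print(in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                [[0.9, 0.1], [0.1, 0.9]]))
```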
arXiv Detail & Related papers (2022-07-30T14:34:55Z)
- Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation [48.50842995206353]
We study the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT.
We propose simple and effective strategies, named in-domain pretraining and input adaptation to remedy the domain and objective discrepancies.
arXiv Detail & Related papers (2022-03-16T07:36:28Z)
- Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation [88.78138830698173]
We focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models.
We train a bilingual E2E-ST model to predict paraphrased transcriptions as an auxiliary task with a single decoder.
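Sequence-level knowledge distillation here means training the student on translations produced by the external text-based NMT teacher instead of the original references; a minimal sketch of that data-construction step, with hypothetical helper names, is shown below.

```python
from typing import Callable, List, Tuple

def build_seqkd_targets(examples: List[Tuple[str, str]],
                        teacher_translate: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Sequence-level knowledge distillation (SeqKD) for speech translation:
    each example is (speech_id, transcription); the text NMT teacher translates
    the transcription, and the end-to-end ST student is trained to map the
    speech to the teacher's translation instead of the original reference."""
    return [(speech_id, teacher_translate(transcription))
            for speech_id, transcription in examples]

# Usage sketch (hypothetical names):
# distilled = build_seqkd_targets(train_examples, teacher_translate=teacher_nmt.decode)
```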
arXiv Detail & Related papers (2021-04-13T19:00:51Z)
- Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation [59.191079800436114]
Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence.
We propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence.
arXiv Detail & Related papers (2020-03-30T03:38:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.