Future-Guided Incremental Transformer for Simultaneous Translation
- URL: http://arxiv.org/abs/2012.12465v1
- Date: Wed, 23 Dec 2020 03:04:49 GMT
- Title: Future-Guided Incremental Transformer for Simultaneous Translation
- Authors: Shaolei Zhang, Yang Feng, Liangyou Li
- Abstract summary: Simultaneous translation (ST) starts translating synchronously while reading source sentences, and is used in many online scenarios.
The wait-k policy faces two weaknesses: low training speed caused by the recalculation of hidden states, and a lack of future source information to guide training.
We propose an incremental Transformer with an average embedding layer (AEL) to speed up the calculation of hidden states.
- Score: 6.8452940299620435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous translation (ST) starts translating synchronously while
reading the source sentence, and is used in many online scenarios. The previous
wait-k policy is concise and achieved good results in ST. However, the wait-k
policy faces two weaknesses: low training speed caused by the recalculation of
hidden states, and a lack of future source information to guide training. To
address the low training speed, we propose an incremental Transformer with an
average embedding layer (AEL) to speed up the calculation of hidden states
during training. For future-guided training, we use a conventional Transformer
as the teacher of the incremental Transformer and implicitly embed some future
information in the model through knowledge distillation. We conducted
experiments on Chinese-English and German-English simultaneous translation
tasks and compared against the wait-k policy to evaluate the proposed method.
Our method increases the training speed by about 28 times on average across
different k and implicitly embeds some predictive ability in the model,
achieving better translation quality than the wait-k baseline.
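Below is a minimal, illustrative sketch of the two ideas the abstract combines: the wait-k read/write schedule that the incremental model follows at inference, and a knowledge-distillation term in which a full-sentence teacher guides the prefix-only student. The function names, the KL-based loss form, and the temperature are assumptions made for this sketch; the paper's exact objective and the average embedding layer itself are not shown.

```python
# Illustrative sketch only, not the authors' code:
# 1) the wait-k read/write schedule followed by the incremental model, and
# 2) a knowledge-distillation loss in which a full-sentence ("conventional") teacher
#    guides the incremental student, implicitly injecting future information.
import torch
import torch.nn.functional as F

def wait_k_schedule(k: int, src_len: int, tgt_len: int):
    """Yield (action, step) pairs for the wait-k policy: read the first k source
    tokens, then alternate WRITE/READ until the source is exhausted, after which
    only WRITE actions remain."""
    read, written = 0, 0
    while written < tgt_len:
        # g(t) = min(k + t, |x|): source tokens visible when writing target token t+1
        visible = min(k + written, src_len)
        while read < visible:
            read += 1
            yield ("READ", read)
        written += 1
        yield ("WRITE", written)

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between the teacher's and student's output distributions;
    the teacher has seen the full source sentence, so matching it embeds some
    future information into the prefix-only student."""
    t = temperature
    student_log_p = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_p, teacher_p, reduction="batchmean") * (t * t)

if __name__ == "__main__":
    print(list(wait_k_schedule(k=3, src_len=6, tgt_len=5)))
    vocab, batch, steps = 100, 2, 5
    student = torch.randn(batch * steps, vocab)   # incremental model, prefix context only
    teacher = torch.randn(batch * steps, vocab)   # conventional model, full-sentence context
    print(distillation_loss(student, teacher, temperature=2.0).item())
```

Because the teacher is only consulted during training, whatever future information is distilled adds no latency at test time, which is consistent with the abstract's claim of implicitly embedding predictive ability in the model.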
Related papers
- Language Model is a Branch Predictor for Simultaneous Machine Translation [73.82754138171587]
We propose incorporating branch prediction techniques in SiMT tasks to reduce translation latency.
We utilize a language model as a branch predictor to predict potential branch directions.
When the actual source word deviates from the predicted source word, we use the real source word to decode the output again, replacing the predicted output (a toy sketch of this speculate-and-re-decode loop follows this entry).
arXiv Detail & Related papers (2023-12-22T07:32:47Z)
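As a rough illustration of that speculate-and-re-decode loop, here is a toy sketch; `predict_next_source`, `translate_prefix`, and the greedy stand-ins are hypothetical placeholders rather than the paper's API, and a real system would emit partial outputs incrementally instead of re-translating whole prefixes.

```python
# Toy sketch of branch-predicted simultaneous translation (illustrative names only).
from typing import Callable, List

def speculative_simt(source_stream: List[str],
                     predict_next_source: Callable[[List[str]], str],
                     translate_prefix: Callable[[List[str]], List[str]]) -> List[str]:
    """Before each real source token arrives, speculate on it with a language model
    so the translation of the extended prefix is ready early. If the real token
    differs from the prediction, discard the speculative output and re-decode."""
    prefix: List[str] = []
    output: List[str] = []
    for real_token in source_stream:            # real_token "arrives" one step later
        guess = predict_next_source(prefix)     # branch prediction with the LM
        speculative = translate_prefix(prefix + [guess])
        if guess == real_token:
            output = speculative                # prediction was right: keep the early output
        else:
            output = translate_prefix(prefix + [real_token])  # misprediction: re-decode
        prefix.append(real_token)
    return output

if __name__ == "__main__":
    # Toy stand-ins: the "LM" always guesses "die", the "translator" upper-cases the prefix.
    lm = lambda prefix: "die"
    mt = lambda prefix: [tok.upper() for tok in prefix]
    print(speculative_simt(["wir", "die", "katze", "sehen"], lm, mt))
```

The latency benefit comes from correctly predicted branches, where the speculative translation is ready before the real word has been read; a misprediction costs one extra decode of the corrected prefix.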
- Context Consistency between Training and Testing in Simultaneous Machine Translation [46.38890241793453]
Simultaneous Machine Translation (SiMT) aims to yield a real-time partial translation with a monotonically growing source-side context.
There is a counterintuitive phenomenon in how context is used during training versus testing.
Accordingly, we propose an effective training approach called context consistency training.
arXiv Detail & Related papers (2023-11-13T04:11:32Z)
- LEAPT: Learning Adaptive Prefix-to-prefix Translation For Simultaneous Machine Translation [6.411228564798412]
Simultaneous machine translation is useful in many live scenarios but very challenging due to the trade-off between accuracy and latency.
We propose a novel adaptive training policy called LEAPT, which allows our machine translation model to learn how to translate source prefixes and make use of the future context.
arXiv Detail & Related papers (2023-03-21T11:17:37Z)
- Data-Driven Adaptive Simultaneous Machine Translation [51.01779863078624]
We propose a novel and efficient training scheme for adaptive SimulMT.
Our method outperforms all strong baselines in terms of translation quality and latency.
arXiv Detail & Related papers (2022-04-27T02:40:21Z)
- Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z)
- Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
arXiv Detail & Related papers (2021-07-23T17:11:27Z)
- Progressive Multi-Granularity Training for Non-Autoregressive Translation [98.11249019844281]
Non-autoregressive translation (NAT) significantly accelerates the inference process via predicting the entire target sequence.
Recent studies show that NAT is weak at learning high-mode knowledge such as one-to-many translations.
We argue that modes can be divided into various granularities which can be learned from easy to hard.
arXiv Detail & Related papers (2021-06-10T07:16:07Z)
- Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model reports better accuracy scores than a reference system trained on an average of 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z)
- Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping [24.547833264405355]
The proposed method achieves a 24% time reduction on average per sample and allows the pre-training to be 2.5 times faster than the baseline.
While being faster, our pre-trained models are equipped with strong knowledge transferability, achieving comparable and sometimes higher GLUE score than the baseline.
arXiv Detail & Related papers (2020-10-26T06:50:07Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks (a generic sketch of such a sequence-matching OT loss follows this entry).
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
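As a generic illustration of matching training-mode (teacher-forced) and testing-mode (student-forced, free-running) sequences with optimal transport, here is a small Sinkhorn-based sketch. The entropic solver, cosine cost, uniform marginals, and all names are assumptions for illustration and do not reproduce the paper's exact OT formulation or its structural/contextual extension.

```python
# Generic OT-based sequence-matching loss (illustrative sketch, not the paper's method).
import torch
import torch.nn.functional as F

def sinkhorn(cost, n_iters=30, eps=0.1):
    """Entropy-regularized OT plan for one cost matrix via Sinkhorn iterations."""
    K = torch.exp(-cost / eps)                      # Gibbs kernel
    m, n = cost.shape
    a = torch.full((m,), 1.0 / m)                   # uniform marginal over positions
    b = torch.full((n,), 1.0 / n)
    u = torch.full((m,), 1.0 / m)
    v = torch.full((n,), 1.0 / n)
    for _ in range(n_iters):
        u = a / (K @ v + 1e-9)
        v = b / (K.t() @ u + 1e-9)
    return torch.diag(u) @ K @ torch.diag(v)        # transport plan

def ot_sequence_loss(teacher_forced_emb, free_running_emb):
    """OT distance between the embedding sequence produced with ground-truth inputs
    (teacher forcing) and the one produced from the model's own predictions
    (student forcing), encouraging the two decoding modes to stay consistent."""
    a = F.normalize(teacher_forced_emb, dim=-1)
    b = F.normalize(free_running_emb, dim=-1)
    cost = 1.0 - a @ b.t()                          # pairwise cosine cost
    plan = sinkhorn(cost)
    return (plan * cost).sum()

if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy sequences of 7 and 5 positions with 16-dimensional embeddings.
    loss = ot_sequence_loss(torch.randn(7, 16), torch.randn(5, 16))
    print(loss.item())
```

In practice such a term would be added to the usual cross-entropy loss, pulling the free-running states toward the teacher-forced ones so that test-time generation stays closer to what was seen during training.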