Efficient Wait-k Models for Simultaneous Machine Translation
- URL: http://arxiv.org/abs/2005.08595v2
- Date: Tue, 4 Aug 2020 01:10:35 GMT
- Title: Efficient Wait-k Models for Simultaneous Machine Translation
- Authors: Maha Elbayad, Laurent Besacier, Jakob Verbeek
- Abstract summary: Simultaneous machine translation consists in starting output generation before the entire input sequence is available.
Wait-k decoders offer a simple but efficient approach for this problem.
We investigate the behavior of wait-k decoding in low resource settings for spoken corpora using IWSLT datasets.
- Score: 46.01342928010307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous machine translation consists in starting output generation
before the entire input sequence is available. Wait-k decoders offer a simple
but efficient approach for this problem. They first read k source tokens, after
which they alternate between producing a target token and reading another
source token. We investigate the behavior of wait-k decoding in low resource
settings for spoken corpora using IWSLT datasets. We improve training of these
models using unidirectional encoders, and training across multiple values of k.
Experiments with Transformer and 2D-convolutional architectures show that our
wait-k models generalize well across a wide range of latency levels. We also
show that the 2D-convolution architecture is competitive with Transformers for
simultaneous translation of spoken language.
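To make the wait-k schedule described in the abstract concrete, here is a minimal sketch in Python. It enumerates only the read/write policy, not the translation model; the function and action names are illustrative, not taken from the authors' released code.

```python
# Minimal sketch of the wait-k read/write schedule (illustrative names,
# not the authors' released code). The policy first reads k source
# tokens, then alternates between writing one target token and reading
# one more source token; once the source is exhausted it writes freely.
# In practice target_len is not known in advance (decoding stops at
# EOS), so a fixed length is used here purely for illustration.

def wait_k_schedule(source_len: int, target_len: int, k: int):
    """Yield ("READ", src_idx) and ("WRITE", tgt_idx) actions."""
    read = 0
    written = 0
    # 1) Initial wait: read the first k source tokens (or all, if fewer).
    while read < min(k, source_len):
        yield ("READ", read)
        read += 1
    # 2) Steady state: one WRITE, then one READ, keeping a lag of k.
    while written < target_len and read < source_len:
        yield ("WRITE", written)
        written += 1
        yield ("READ", read)
        read += 1
    # 3) Tail: source exhausted; finish the translation unconstrained.
    while written < target_len:
        yield ("WRITE", written)
        written += 1

if __name__ == "__main__":
    for action in wait_k_schedule(source_len=6, target_len=6, k=3):
        print(action)
```

With k=3 this prints three READs, then alternating WRITE/READ pairs, then the remaining WRITEs, which is exactly the constant lag of k source tokens that the wait-k decoder maintains.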
Related papers
- AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration [0.3626013617212667]
We introduce AMUSD (Asynchronous Multi-device Speculative Decoding), a system that accelerates generation by decoupling the draft and verify phases.
Unlike conventional speculative decoding, where only one model (draft or verify) performs token generation at a time, AMUSD enables both models to perform predictions independently on separate devices.
We evaluate our approach over multiple datasets and show that AMUSD achieves an average 29% improvement over speculative decoding and up to a 1.96× speedup over conventional autoregressive decoding (a minimal sketch of the underlying draft-verify loop appears after this list).
arXiv Detail & Related papers (2024-10-22T19:15:35Z)
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- All in One: Exploring Unified Video-Language Pre-training [44.22059872694995]
We introduce an end-to-end video-language model, namely the all-in-one Transformer, that embeds raw video and textual signals into joint representations.
The code and pretrained model have been released at https://github.com/showlab/all-in-one.
arXiv Detail & Related papers (2022-03-14T17:06:30Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
- Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation [71.54816893482457]
We introduce the dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST).
Our models are based on the original Transformer architecture but consist of two decoders, each responsible for one task (ASR or ST).
arXiv Detail & Related papers (2020-11-02T04:59:50Z)
- Neural Simultaneous Speech Translation Using Alignment-Based Chunking [4.224809458327515]
In simultaneous machine translation, the objective is to determine when to produce a partial translation given a continuous stream of source words.
We propose a neural machine translation (NMT) model that dynamically decides whether to continue reading input or to generate output words.
Our results on the IWSLT 2020 English-to-German task outperform a wait-k baseline by 2.6 to 3.7% BLEU absolute.
arXiv Detail & Related papers (2020-05-29T10:20:48Z)
- Non-Autoregressive Machine Translation with Disentangled Context Transformer [70.95181466892795]
State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens.
We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts.
Our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average.
arXiv Detail & Related papers (2020-01-15T05:32:18Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
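For orientation on the speculative decoding referenced in the AMUSD entry above, here is a minimal sequential draft-verify sketch. It is an illustration under stated assumptions, not AMUSD itself: AMUSD's contribution is running the draft and verify phases asynchronously on separate devices, whereas this toy runs them in one loop, and `draft_next`/`verify_next` are hypothetical stand-ins for the greedy next-token functions of a small draft model and a large verify model.

```python
# Minimal sequential draft-verify loop, sketched for orientation only.
# AMUSD runs the two phases asynchronously on separate devices; this
# toy keeps them in one loop. draft_next and verify_next are
# hypothetical stand-ins for a small and a large model's greedy
# next-token functions over integer token IDs.
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]

def speculative_decode(draft_next: NextFn, verify_next: NextFn,
                       prompt: List[Token], gamma: int = 4,
                       max_new: int = 12) -> List[Token]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Draft phase: cheaply propose gamma candidate tokens.
        ctx = list(out)
        proposal = []
        for _ in range(gamma):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Verify phase: accept the longest agreeing prefix; at the first
        # disagreement, keep the verify model's token and drop the rest.
        for t in proposal:
            expected = verify_next(out)
            if expected == t:
                out.append(t)
            else:
                out.append(expected)
                break
    return out[: len(prompt) + max_new]

if __name__ == "__main__":
    # Toy models: the verify model counts upward; the draft model
    # agrees except after multiples of 5, where it skips a number.
    verify = lambda ctx: ctx[-1] + 1
    draft = lambda ctx: ctx[-1] + (2 if ctx[-1] % 5 == 0 else 1)
    print(speculative_decode(draft, verify, prompt=[0]))
```

Each round costs roughly one large-model pass per accepted token but lets the cheap draft model set the pace, which is where the speedup over plain autoregressive decoding comes from.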