Glancing Transformer for Non-Autoregressive Neural Machine Translation
- URL: http://arxiv.org/abs/2008.07905v3
- Date: Thu, 13 May 2021 14:41:40 GMT
- Title: Glancing Transformer for Non-Autoregressive Neural Machine Translation
- Authors: Lihua Qian, Hao Zhou, Yu Bao, Mingxuan Wang, Lin Qiu, Weinan Zhang,
Yong Yu, Lei Li
- Abstract summary: We propose a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT generates high-quality translations with an 8-15 times speedup.
- Score: 58.87258329683682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work on non-autoregressive neural machine translation (NAT) aims to
improve efficiency through parallel decoding without sacrificing quality.
However, existing NAT methods are either inferior to the Transformer or require
multiple decoding passes, which reduces the speedup. We propose the Glancing
Language Model (GLM), a method for learning word interdependency in single-pass
parallel generation models. With GLM, we develop the Glancing Transformer (GLAT)
for machine translation. With only single-pass parallel decoding, GLAT is able
to generate high-quality translations with an 8-15 times speedup. Experiments on
multiple WMT language directions show that GLAT outperforms all previous
single-pass non-autoregressive methods and is nearly comparable to the Transformer,
reducing the gap to 0.25-0.9 BLEU points.
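The abstract describes the Glancing Language Model only at a high level. Below is a minimal, hypothetical Python sketch of a glancing-sampling training step as the abstract suggests it: reveal some reference tokens in the decoder input, proportional to how far the model's single-pass prediction is from the reference, and train the model to predict the remaining positions. The `glancing_sample` name, the Hamming-distance measure, the uniform position selection, and the `<mask>` placeholder are illustrative assumptions rather than the authors' exact procedure.

```python
import random


def glancing_sample(reference, first_pass_pred, ratio=0.5, mask="<mask>"):
    """Illustrative glancing-sampling step (a sketch, not the official GLAT code).

    Reveal a number of reference tokens proportional to how far the model's
    single-pass prediction is from the reference, and leave the rest masked;
    the model is then trained to predict the masked positions given the
    revealed ones.
    """
    # The abstract does not fix a distance measure; plain Hamming distance is
    # used here as a simple stand-in.
    distance = sum(p != r for p, r in zip(first_pass_pred, reference))
    num_reveal = int(distance * ratio)

    reveal = set(random.sample(range(len(reference)), num_reveal))
    decoder_input = [tok if i in reveal else mask
                     for i, tok in enumerate(reference)]
    target_positions = [i for i in range(len(reference)) if i not in reveal]
    return decoder_input, target_positions


if __name__ == "__main__":
    ref = ["the", "cat", "sat", "on", "the", "mat"]
    pred = ["the", "dog", "sat", "on", "a", "rug"]  # hypothetical first-pass output
    inputs, targets = glancing_sample(ref, pred, ratio=0.5)
    print(inputs)   # some reference tokens revealed, the rest masked
    print(targets)  # positions the model must still predict
```

The more accurate the first-pass prediction, the fewer reference tokens are revealed, so the training signal gradually shifts toward fully parallel generation.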
Related papers
- Quick Back-Translation for Unsupervised Machine Translation [9.51657235413336]
We propose a two-for-one improvement to Transformer back-translation: Quick Back-Translation (QBT).
QBT re-purposes the encoder as a generative model and uses encoder-generated sequences to train the decoder.
Experiments on various WMT benchmarks demonstrate that QBT dramatically outperforms the standard back-translation-only method in terms of training efficiency.
arXiv Detail & Related papers (2023-12-01T20:27:42Z) - TranSFormer: Slow-Fast Transformer for Machine Translation [52.12212173775029]
We present a Slow-Fast two-stream learning model, referred to as TranSFormer.
Our TranSFormer shows consistent BLEU improvements (larger than 1 BLEU point) on several machine translation benchmarks.
arXiv Detail & Related papers (2023-05-26T14:37:38Z) - Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z) - Accelerating Transformer Inference for Translation via Parallel Decoding [2.89306442817912]
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT).
We present three parallel decoding algorithms and test them on different languages and models.
arXiv Detail & Related papers (2023-05-17T17:57:34Z) - Directed Acyclic Transformer for Non-Autoregressive Machine Translation [93.31114105366461]
The Directed Acyclic Transformer (DA-Transformer) represents hidden states in a Directed Acyclic Graph (DAG).
DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average.
arXiv Detail & Related papers (2022-05-16T06:02:29Z) - Instantaneous Grammatical Error Correction with Shallow Aggressive
Decoding [57.08875260900373]
We propose Shallow Aggressive Decoding (SAD) to improve the online inference efficiency of the Transformer for instantaneous Grammatical Error Correction (GEC).
Instead of decoding only one token per step, SAD aggressively decodes as many tokens as possible in parallel to improve computational parallelism (a sketch of this decoding loop is given after this list).
Experiments on both English and Chinese GEC benchmarks show that aggressive decoding yields the same predictions but with a significant speedup for online inference.
arXiv Detail & Related papers (2021-06-09T10:30:59Z) - Incorporating a Local Translation Mechanism into Non-autoregressive
Translation [28.678752678905244]
We introduce a novel local autoregressive translation mechanism into non-autoregressive translation (NAT) models.
For each target decoding position, instead of only one token, we predict a short sequence of tokens in an autoregressive way.
We design an efficient merging algorithm to align and merge the output pieces into one final output sequence.
arXiv Detail & Related papers (2020-11-12T00:32:51Z) - Cascaded Text Generation with Markov Transformers [122.76100449018061]
Two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.
This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output.
This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
arXiv Detail & Related papers (2020-06-01T17:52:15Z)
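Several of the entries above (Shallow Aggressive Decoding, and the parallel decoding algorithms for translation) rely on verifying many drafted tokens per forward pass instead of emitting one token per step. Below is a minimal Python sketch of such an aggressive-decoding loop for near-copy tasks like GEC; the `greedy_step` callback, the copy-the-source drafting strategy, and the toy model in the usage example are assumptions made for illustration, not the papers' implementations.

```python
from typing import Callable, List


def aggressive_decode(src: List[str],
                      greedy_step: Callable[[List[str], List[str]], List[str]],
                      eos: str = "<eos>",
                      max_len: int = 64) -> List[str]:
    """Sketch of an aggressive decoding loop for near-copy tasks such as GEC.

    Draft the not-yet-generated suffix by copying the source, verify all
    drafted positions in one parallel pass, keep the longest agreeing prefix,
    and resume from the first disagreement with the model's own token.
    """
    output: List[str] = []
    while len(output) < max_len and (not output or output[-1] != eos):
        # GEC output is usually close to its input, so copy the source as a
        # draft for the remaining positions; pad with <eos> at the end.
        draft = output + src[len(output):] + [eos]
        # One parallel forward pass: preds[i] is the greedy prediction for
        # position i given the prefix draft[:i] (and the source).
        preds = greedy_step(src, draft)
        k = len(output)
        while k < len(draft) and preds[k] == draft[k]:
            k += 1
        # Keep the verified prefix; at the first disagreement, trust the model.
        output = draft[:k] + ([preds[k]] if k < len(draft) else [])
    return output


if __name__ == "__main__":
    # Toy stand-in for a trained GEC model: its greedy prediction at every
    # position happens to be the gold corrected token.
    gold = ["the", "cat", "sat", ".", "<eos>"]

    def greedy_step(src: List[str], draft: List[str]) -> List[str]:
        return [gold[i] if i < len(gold) else "<eos>" for i in range(len(draft))]

    print(aggressive_decode(["teh", "cat", "sat", "."], greedy_step))
    # -> ['the', 'cat', 'sat', '.', '<eos>']
```

In this toy run the corrected sentence is produced in two parallel verification passes instead of five sequential decoding steps, which is where the reported online-inference speedups come from.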
This list is automatically generated from the titles and abstracts of the papers in this site.