Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
- URL: http://arxiv.org/abs/2006.10369v4
- Date: Thu, 24 Jun 2021 21:46:03 GMT
- Title: Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
- Authors: Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith
- Abstract summary: We show that a single-layer autoregressive decoder can substantially outperform strong non-autoregressive models with comparable inference speed.
Our results establish a new protocol for future research toward fast, accurate machine translation.
- Score: 78.51887060865273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much recent effort has been invested in non-autoregressive neural machine
translation, which appears to be an efficient alternative to state-of-the-art
autoregressive machine translation on modern GPUs. In contrast to the latter,
where generation is sequential, the former allows generation to be parallelized
across target token positions. Some of the latest non-autoregressive models
have achieved impressive translation quality-speed tradeoffs compared to
autoregressive baselines. In this work, we reexamine this tradeoff and argue
that autoregressive baselines can be substantially sped up without loss in
accuracy. Specifically, we study autoregressive models with encoders and
decoders of varied depths. Our extensive experiments show that given a
sufficiently deep encoder, a single-layer autoregressive decoder can
substantially outperform strong non-autoregressive models with comparable
inference speed. We show that the speed disadvantage for autoregressive
baselines compared to non-autoregressive methods has been overestimated in
three aspects: suboptimal layer allocation, insufficient speed measurement, and
lack of knowledge distillation. Our results establish a new protocol for future
research toward fast, accurate machine translation. Our code is available at
https://github.com/jungokasai/deep-shallow.
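The abstract's central recipe, shifting depth from the decoder to the encoder, can be illustrated with a plain PyTorch Transformer. The sketch below is a minimal illustration, not the authors' fairseq-based implementation linked above: it allocates 12 layers to the encoder and a single layer to the decoder, the kind of split the paper argues recovers autoregressive inference speed, since the encoder runs once per sentence while the decoder is re-run for every generated token.

```python
# A minimal sketch (not the authors' code): a standard PyTorch Transformer
# with the layer budget shifted from the decoder to the encoder.
import torch
import torch.nn as nn

deep_shallow = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=12,  # deep encoder: run once, in parallel over source tokens
    num_decoder_layers=1,   # shallow decoder: the only part repeated per generated token
    dim_feedforward=2048,
    batch_first=True,
)

src = torch.rand(2, 20, 512)  # toy (batch, source_len, d_model) embeddings
tgt = torch.rand(2, 15, 512)  # toy (batch, target_len, d_model) embeddings
out = deep_shallow(src, tgt)  # -> (2, 15, 512)
print(out.shape)
```

The snippet only shows where the layer budget goes; in the paper the same idea is applied to a full encoder-decoder translation model, together with careful speed measurement and knowledge distillation.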
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding (see the sketch after this list), achieving a 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Lossless Acceleration for Seq2seq Generation with Aggressive Decoding [74.12096349944497]
Aggressive Decoding is a novel decoding algorithm for seq2seq generation.
Our approach aims to yield output identical to (or better than) that of autoregressive decoding.
We test Aggressive Decoding on the popular 6-layer Transformer model on GPU across multiple seq2seq tasks.
arXiv Detail & Related papers (2022-05-20T17:59:00Z)
- Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision [33.04082398101807]
Existing neural machine translation models, such as Transformer, achieve high performance, but they decode words one by one, which is inefficient.
Recent non-autoregressive translation models speed up inference, but their translation quality is still inferior.
We propose DSLP, a highly efficient and high-performance model for machine translation.
arXiv Detail & Related papers (2021-10-14T16:36:12Z)
- PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition [16.976881696357275]
We propose a Parallel, Iterative and Mimicking Network (PIMNet) to balance accuracy and efficiency.
PIMNet adopts a parallel attention mechanism to predict the text faster and an iterative generation mechanism to make the predictions more accurate.
arXiv Detail & Related papers (2021-09-09T10:11:07Z)
- Non-Autoregressive Translation by Learning Target Categorical Codes [59.840510037250944]
We propose CNAT, which implicitly learns categorical codes as latent variables for non-autoregressive decoding.
Experiment results show that our model achieves comparable or better performance in machine translation tasks.
arXiv Detail & Related papers (2021-03-21T14:12:34Z)
- Enriching Non-Autoregressive Transformer with Syntactic and Semantic Structures for Neural Machine Translation [54.864148836486166]
We propose to incorporate the explicit syntactic and semantic structures of languages into a non-autoregressive Transformer.
Our model achieves significantly faster speed while preserving translation quality, compared with several state-of-the-art non-autoregressive models.
arXiv Detail & Related papers (2021-01-22T04:12:17Z)
- Cascaded Text Generation with Markov Transformers [122.76100449018061]
Two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.
This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output.
This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
arXiv Detail & Related papers (2020-06-01T17:52:15Z)
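Several of the related papers above describe iteratively refined parallel decoding, where every target position is predicted at once and the least confident positions are re-masked and re-predicted, as referenced in the GMLM entry. The following is a generic mask-predict style sketch under assumed interfaces; `model`, `mask_id`, and `num_iters` are placeholders rather than any listed paper's actual API.

```python
# A generic sketch of iterative parallel ("mask-predict" style) decoding.
# `model` is assumed to score all target positions in one forward pass and
# return logits of shape (batch, tgt_len, vocab); it is a placeholder here.
import torch

def iterative_parallel_decode(model, src, tgt_len, mask_id, num_iters=4):
    batch_size = src.size(0)
    # Start from an all-<mask> target and predict every position in parallel.
    tgt = torch.full((batch_size, tgt_len), mask_id, dtype=torch.long)
    for it in range(num_iters):
        logits = model(src, tgt)                   # (batch, tgt_len, vocab)
        probs, preds = logits.softmax(-1).max(-1)  # per-position confidence and argmax
        tgt = preds
        if it + 1 == num_iters:
            break
        # Re-mask the least confident positions for the next refinement pass;
        # the re-masked fraction shrinks linearly over iterations.
        num_remask = int(tgt_len * (1.0 - (it + 1) / num_iters))
        if num_remask > 0:
            lowest = probs.argsort(dim=-1)[:, :num_remask]
            tgt = tgt.scatter(1, lowest, mask_id)
    return tgt
```

Each pass is parallel across target positions, and the number of refinement passes is the knob that trades quality against speed in this family of methods.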
This list is automatically generated from the titles and abstracts of the papers on this site.