Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
- URL: http://arxiv.org/abs/2205.10350v1
- Date: Fri, 20 May 2022 17:59:00 GMT
- Title: Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
- Authors: Tao Ge, Heming Xia, Xin Sun, Si-Qing Chen, Furu Wei
- Abstract summary: Aggressive Decoding is a novel decoding algorithm for seq2seq generation.
Our approach aims to yield generation identical to (or better than) autoregressive decoding.
We test Aggressive Decoding on the popular 6-layer Transformer model on GPU in multiple seq2seq tasks.
- Score: 74.12096349944497
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study lossless acceleration for seq2seq generation with a novel decoding
algorithm -- Aggressive Decoding. Unlike previous efforts (e.g.,
non-autoregressive decoding) that speed up seq2seq generation at the cost of
quality loss, our approach aims to yield generation identical to (or better
than) autoregressive decoding with a significant speedup, achieved by the
innovative cooperation of aggressive decoding and verification, both of which
are efficient thanks to parallel computing.
We propose two Aggressive Decoding paradigms for two kinds of seq2seq tasks: 1)
For seq2seq tasks whose inputs and outputs are highly similar (e.g.,
Grammatical Error Correction), we propose Input-guided Aggressive Decoding
(IAD), which aggressively copies the input sentence as drafted decoded tokens
and verifies them in parallel; 2) For other general seq2seq tasks (e.g.,
Machine Translation), we propose Generalized Aggressive Decoding (GAD), which
first employs an additional non-autoregressive model to decode aggressively
and then verifies the drafted tokens in parallel in an autoregressive manner.
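To make the draft-then-verify step shared by both paradigms concrete, below is a minimal sketch, not the authors' implementation. Here `decoder_logits` is a hypothetical stand-in for a single parallel forward pass of an autoregressive Transformer decoder that returns, for each position i, the next-token distribution conditioned on the source and the drafted tokens before position i.

```python
# Minimal sketch of one Aggressive Decoding verification step (hypothetical
# names, not from the paper's code). The draft is the copied input (IAD) or
# a non-autoregressive model's output (GAD).
from typing import Callable, List

def verify_draft(
    src: List[int],
    draft: List[int],
    decoder_logits: Callable[[List[int], List[int]], List[List[float]]],
) -> List[int]:
    """Accept the longest draft prefix that the greedy autoregressive decoder
    agrees with, plus the decoder's own token at the first disagreement."""
    logits = decoder_logits(src, draft)  # one pass verifies all positions
    greedy = [max(range(len(row)), key=row.__getitem__) for row in logits]
    accepted: List[int] = []
    for drafted, predicted in zip(draft, greedy):
        accepted.append(predicted)       # equals `drafted` whenever they agree
        if predicted != drafted:
            break                        # mismatch: discard the rest, re-draft
    return accepted
```

Because every accepted token matches what greedy decoding would have produced step by step, the output is identical to standard greedy decoding, which is what makes the acceleration lossless.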
We test Aggressive Decoding on the popular 6-layer Transformer model on
GPU in multiple seq2seq tasks: 1) For IAD, we show that it can introduce a
7x-9x speedup for the Transformer in Grammatical Error Correction and Text
Simplification with results identical to greedy decoding; 2) For GAD, we
observe a 3x-5x speedup with identical or even better quality in two
important seq2seq tasks: Machine Translation and Abstractive Summarization.
Moreover, Aggressive Decoding can benefit even more from stronger computing
devices that are better at parallel computing. Given its lossless quality and
its significant, promising speedup, we believe Aggressive Decoding may evolve
into a de facto standard for efficient and lossless seq2seq generation in the
near future.
Related papers
- Cerberus: Efficient Inference with Adaptive Parallel Decoding and Sequential Knowledge Enhancement [12.40683763019276]
Large language models (LLMs) often face a bottleneck in inference speed due to their reliance on auto-regressive decoding.
We have identified two key issues with existing parallel decoding frameworks.
We propose Cerberus, an adaptive parallel decoding framework.
arXiv Detail & Related papers (2024-10-17T08:55:18Z)
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z)
- Accelerating Transformer Inference for Translation via Parallel Decoding [2.89306442817912]
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT).
We present three parallel decoding algorithms and test them on different languages and models.
arXiv Detail & Related papers (2023-05-17T17:57:34Z)
- NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering [52.10214317661547]
Current numerical reasoning methods autoregressively decode program sequences.
The accuracy of program generation drops sharply as the decoding steps unfold due to error propagation.
In this paper, we propose a non-autoregressive program generation framework.
arXiv Detail & Related papers (2022-11-07T11:25:21Z)
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations [22.40667024030858]
Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient.
Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance.
Trans-Encoder combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders.
arXiv Detail & Related papers (2021-09-27T14:06:47Z)
- Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding [57.08875260900373]
We propose Shallow Aggressive Decoding (SAD) to improve the online inference efficiency of the Transformer for instantaneous Grammatical Error Correction (GEC).
SAD aggressively decodes as many tokens as possible in parallel, instead of always decoding only one token per step, to improve computational parallelism.
Experiments on both English and Chinese GEC benchmarks show that aggressive decoding yields the same predictions with a significant speedup for online inference.
arXiv Detail & Related papers (2021-06-09T10:30:59Z)
- Fast Interleaved Bidirectional Sequence Generation [90.58793284654692]
We introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously.
We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder.
Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer.
arXiv Detail & Related papers (2020-10-27T17:38:51Z)
- Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation [78.51887060865273]
We show that a single-layer autoregressive decoder can substantially outperform strong non-autoregressive models with comparable inference speed.
Our results establish a new protocol for future research toward fast, accurate machine translation.
arXiv Detail & Related papers (2020-06-18T09:06:49Z)