Instantaneous Grammatical Error Correction with Shallow Aggressive
Decoding
- URL: http://arxiv.org/abs/2106.04970v1
- Date: Wed, 9 Jun 2021 10:30:59 GMT
- Title: Instantaneous Grammatical Error Correction with Shallow Aggressive
Decoding
- Authors: Xin Sun, Tao Ge, Furu Wei, Houfeng Wang
- Abstract summary: We propose Shallow Aggressive Decoding (SAD) to improve the online inference efficiency of the Transformer for instantaneous Grammatical Error Correction (GEC).
SAD aggressively decodes as many tokens as possible in parallel instead of always decoding only one token in each step to improve computational parallelism.
Experiments on both English and Chinese GEC benchmarks show that aggressive decoding yields the same predictions as greedy decoding, but with a significant speedup for online inference.
- Score: 57.08875260900373
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose Shallow Aggressive Decoding (SAD) to improve the
online inference efficiency of the Transformer for instantaneous Grammatical
Error Correction (GEC). SAD optimizes the online inference efficiency for GEC
by two innovations: 1) it aggressively decodes as many tokens as possible in
parallel instead of always decoding only one token in each step to improve
computational parallelism; 2) it uses a shallow decoder instead of the
conventional Transformer architecture with balanced encoder-decoder depth to
reduce the computational cost during inference. Experiments on both English and
Chinese GEC benchmarks show that aggressive decoding could yield the same
predictions as greedy decoding but with a significant speedup for online
inference. Its combination with the shallow decoder could offer an even higher
online inference speedup over the powerful Transformer baseline without quality
loss. Not only does our approach allow a single model to achieve
state-of-the-art results on English GEC benchmarks: 66.4 F0.5 on the CoNLL-14
and 72.9 F0.5 on the BEA-19 test set with an almost 10x online inference
speedup over the Transformer-big model, but it is also easily adapted to other
languages. Our code is available at
https://github.com/AutoTemp/Shallow-Aggressive-Decoding.
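For illustration, a minimal Python sketch of the aggressive-decoding loop the abstract describes follows; it is not the released implementation (see the repository above for that). The names aggressive_decode, parallel_greedy, BOS, EOS, and max_len are hypothetical: the sketch assumes a seq2seq model that returns its greedy prediction for every target position in one teacher-forced forward pass, and it simplifies the paper's rule for re-aligning with the input after a correction.

from typing import Callable, List

BOS, EOS = 0, 1  # hypothetical special-token ids


def aggressive_decode(
    src: List[int],
    parallel_greedy: Callable[[List[int], List[int]], List[int]],
    max_len: int = 256,
) -> List[int]:
    """Greedy GEC decoding that verifies many draft tokens per model call.

    Because a corrected sentence mostly copies its input, the remaining
    source tokens are used as a draft continuation. One forward pass scores
    every draft position; the longest prefix on which the model agrees with
    the draft is accepted at once, together with the model's first correction.
    """
    output: List[int] = []  # accepted target tokens so far
    while len(output) < max_len:
        # Draft continuation: copy the not-yet-covered source tokens.
        # (This simple re-alignment stands in for the paper's re-decoding
        # rule, which falls back to one-by-one decoding until the prediction
        # matches the input again.)
        draft = src[len(output):] + [EOS]
        dec_in = [BOS] + output + draft
        # Assumed single parallel pass: preds[i] is the model's greedy choice
        # for the token following dec_in[: i + 1]; the last element is unused.
        preds = parallel_greedy(src, dec_in)
        draft_preds = preds[len(output): len(output) + len(draft)]
        # Accept the longest prefix of the draft the model agrees with ...
        k = 0
        while k < len(draft) and draft_preds[k] == draft[k]:
            k += 1
        if k == len(draft):  # the model accepts the whole draft
            output += draft
            break
        # ... plus the first token where it disagrees (the correction).
        output += draft[:k] + [draft_preds[k]]
        if draft_preds[k] == EOS:
            break
    return [t for t in output if t != EOS]

The paper's second innovation, pairing a deep encoder with a shallow decoder, concerns the model architecture rather than this loop; it simply makes each parallel_greedy call cheaper.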
Related papers
- Cerberus: Efficient Inference with Adaptive Parallel Decoding and Sequential Knowledge Enhancement [12.40683763019276]
Large language models (LLMs) often face a bottleneck in inference speed due to their reliance on auto-regressive decoding.
We have identified two key issues with existing parallel decoding frameworks.
We propose Cerberus, an adaptive parallel decoding framework.
arXiv Detail & Related papers (2024-10-17T08:55:18Z)
- Accelerating Transformer Inference for Translation via Parallel Decoding [2.89306442817912]
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT).
We present three parallel decoding algorithms and test them on different languages and models.
arXiv Detail & Related papers (2023-05-17T17:57:34Z)
- Fast and parallel decoding for transducer [25.510837666148024]
We introduce a constrained version of transducer loss to learn strictly monotonic alignments between the sequences.
We also improve the standard greedy search and beam search algorithms by limiting the number of symbols that can be emitted per time step.
arXiv Detail & Related papers (2022-10-31T07:46:10Z)
- Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z)
- Lossless Acceleration for Seq2seq Generation with Aggressive Decoding [74.12096349944497]
Aggressive Decoding is a novel decoding algorithm for seq2seq generation.
Our approach aims to yield identical (or better) generation compared with autoregressive decoding.
We test Aggressive Decoding on the most popular 6-layer Transformer model on GPU in multiple seq2seq tasks.
arXiv Detail & Related papers (2022-05-20T17:59:00Z)
- Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates [59.678108707409606]
We propose Fast-MD, a fast MD model that generates HI by non-autoregressive decoding based on connectionist temporal classification (CTC) outputs followed by an ASR decoder.
Fast-MD achieved about 2x and 4x faster decoding than the naïve MD model on GPU and CPU, respectively, with comparable translation quality.
arXiv Detail & Related papers (2021-09-27T05:21:30Z)
- Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input [54.82369261350497]
We propose a CTC-enhanced NAR transformer, which generates the target sequence by refining the predictions of the CTC module.
Experimental results show that our method outperforms all previous NAR counterparts and achieves 50x faster decoding than a strong AR baseline, with only 0.0-0.3 absolute CER degradation on the Aishell-1 and Aishell-2 datasets.
arXiv Detail & Related papers (2020-10-28T15:00:09Z)
- Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT is able to generate high-quality translations with an 8-15x speedup.
arXiv Detail & Related papers (2020-08-18T13:04:03Z)