A Study of Non-autoregressive Model for Sequence Generation
- URL: http://arxiv.org/abs/2004.10454v2
- Date: Mon, 11 May 2020 00:17:11 GMT
- Title: A Study of Non-autoregressive Model for Sequence Generation
- Authors: Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao, Tie-Yan Liu
- Abstract summary: Non-autoregressive (NAR) models generate all the tokens of a sequence in parallel.
Knowledge distillation and source-target alignment have been proposed to bridge the gap between AR and NAR models.
This work proposes an analysis model, CoMMA, to quantify target-token dependency and characterize the difficulty of NAR sequence generation.
- Score: 147.89525760170923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive (NAR) models generate all the tokens of a sequence in
parallel, resulting in faster generation speed compared to their autoregressive
(AR) counterparts but at the cost of lower accuracy. Different techniques
including knowledge distillation and source-target alignment have been proposed
to bridge the gap between AR and NAR models in various tasks such as neural
machine translation (NMT), automatic speech recognition (ASR), and text to
speech (TTS). With the help of those techniques, NAR models can catch up with
the accuracy of AR models in some tasks but not in some others. In this work,
we conduct a study to understand the difficulty of NAR sequence generation and
try to answer two questions: (1) Why can NAR models catch up with AR models in
some tasks but not in others? (2) Why do techniques like knowledge distillation
and source-target alignment help NAR models? Since the main difference between AR and NAR
models is that NAR models do not use dependency among target tokens while AR
models do, intuitively the difficulty of NAR sequence generation heavily
depends on the strength of the dependency among target tokens. To quantify such
dependency, we propose an analysis model called CoMMA to characterize the
difficulty of different NAR sequence generation tasks. We have several
interesting findings: 1) Among the NMT, ASR and TTS tasks, ASR has the most
target-token dependency while TTS has the least. 2) Knowledge distillation
reduces the target-token dependency in the target sequence and thus improves the
accuracy of NAR models. 3) The source-target alignment constraint encourages
each target token to depend on source tokens and thus eases the training of
NAR models.
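To make the dependency argument concrete: an AR model factorizes P(y|x) = ∏_t P(y_t | y_<t, x), while a NAR model assumes P(y|x) = ∏_t P(y_t | x), so target tokens are conditionally independent given the source. The following minimal sketch is illustrative only (it is not the paper's CoMMA model, and the toy vocabulary and scoring function are assumptions); it contrasts the two decoding styles to show where the dependency enters.

```python
# Minimal sketch (illustrative only): AR vs. NAR decoding with a toy scorer.
# The scoring function is a stand-in for a trained decoder; only the shape of
# the target-side conditioning differs between the two decoding loops.
import random

VOCAB = ["a", "b", "c", "d"]

def toy_token_scores(source, position, target_context):
    """Return pseudo-random scores over VOCAB for one target position.

    `target_context` carries the target-side dependency: the AR decoder
    passes the tokens generated so far, while the NAR decoder passes an
    empty tuple, reflecting its conditional-independence assumption.
    """
    random.seed(hash((tuple(source), position, tuple(target_context))) % (2 ** 32))
    return [random.random() for _ in VOCAB]

def ar_decode(source, length):
    """Sequential decoding: position t is conditioned on positions < t."""
    output = []
    for pos in range(length):
        scores = toy_token_scores(source, pos, output)   # uses target history
        output.append(VOCAB[scores.index(max(scores))])
    return output

def nar_decode(source, length):
    """Parallel-style decoding: no position sees any other target token,
    so in a real model all positions could be computed simultaneously."""
    outputs = []
    for pos in range(length):
        scores = toy_token_scores(source, pos, ())       # no target history
        outputs.append(VOCAB[scores.index(max(scores))])
    return outputs

if __name__ == "__main__":
    src = ["x1", "x2", "x3"]
    print("AR :", ar_decode(src, 4))   # `length` sequential steps
    print("NAR:", nar_decode(src, 4))  # one parallelizable step
```

Under this framing, the paper's findings follow naturally: the stronger the dependency among target tokens that the AR loop exploits, the more the NAR loop gives up by ignoring it, which is what CoMMA is designed to quantify.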
Related papers
- Leveraging Diverse Modeling Contexts with Collaborating Learning for Neural Machine Translation [26.823126615724888]
Autoregressive (AR) and Non-autoregressive (NAR) models are two types of generative models for Neural Machine Translation (NMT).
We propose a novel generic collaborative learning method, DCMCL, where AR and NAR models are treated as collaborators instead of teachers and students.
arXiv Detail & Related papers (2024-02-28T15:55:02Z) - Semi-Autoregressive Streaming ASR With Label Context [70.76222767090638]
We propose a streaming "semi-autoregressive" ASR model that incorporates the labels emitted in previous blocks as additional context.
Experiments show that our method outperforms the existing streaming NAR model by 19% relative on Tedlium2, 16%/8% on Librispeech-100 clean/other test sets, and 19%/8% on the Switchboard (SWB)/Callhome (CH) test sets.
arXiv Detail & Related papers (2023-09-19T20:55:58Z) - Helping the Weak Makes You Strong: Simple Multi-Task Learning Improves
Non-Autoregressive Translators [35.939982651768666]
The probability framework of NAR models requires a conditional independence assumption on target sequences.
We propose a simple and model-agnostic multi-task learning framework to provide more informative learning signals.
Our approach can consistently improve accuracy of multiple NAR baselines without adding any additional decoding overhead.
arXiv Detail & Related papers (2022-11-11T09:10:14Z) - Paraformer: Fast and Accurate Parallel Transformer for
Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z) - Non-Autoregressive Machine Translation: It's Not as Fast as it Seems [84.47091735503979]
We point out flaws in the evaluation methodology present in the literature on NAR models.
We compare NAR models with other widely used methods for improving efficiency.
We call for more realistic and extensive evaluation of NAR models in future work.
arXiv Detail & Related papers (2022-05-04T09:30:17Z) - Diformer: Directional Transformer for Neural Machine Translation [13.867255817435705]
Autoregressive (AR) and Non-autoregressive (NAR) models each have their own advantages in performance and latency.
We propose the Directional Transformer (Diformer), which jointly models AR and NAR generation across three generation directions.
Experiments on 4 WMT benchmarks demonstrate that Diformer outperforms current united-modelling works with more than 1.5 BLEU points for both AR and NAR decoding.
arXiv Detail & Related papers (2021-12-22T02:35:29Z) - A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text
Generation [59.64193903397301]
Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines.
We conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR).
The results on various tasks provide interesting findings for developing an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness against long-form utterances.
arXiv Detail & Related papers (2021-10-11T13:05:06Z) - TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech
Recognition [69.68154370877615]
Non-autoregressive (NAR) models remove the temporal dependency between output tokens and can predict all output tokens in as few as one step.
To address these problems, we propose a new model named the two-step non-autoregressive transformer (TSNAT).
The results show that the TSNAT can achieve a competitive performance with the AR model and outperform many complicated NAR models.
arXiv Detail & Related papers (2021-04-04T02:34:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.