A Study of Non-autoregressive Model for Sequence Generation
- URL: http://arxiv.org/abs/2004.10454v2
- Date: Mon, 11 May 2020 00:17:11 GMT
- Title: A Study of Non-autoregressive Model for Sequence Generation
- Authors: Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao, Tie-Yan Liu
- Abstract summary: Non-autoregressive (NAR) models generate all the tokens of a sequence in parallel.
Knowledge distillation and source-target alignment have been proposed to bridge the gap between AR and NAR models.
This work proposes an analysis model, CoMMA, to quantify target-token dependency and characterize the difficulty of NAR sequence generation.
- Score: 147.89525760170923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive (NAR) models generate all the tokens of a sequence in
parallel, resulting in faster generation speed compared to their autoregressive
(AR) counterparts but at the cost of lower accuracy. Different techniques
including knowledge distillation and source-target alignment have been proposed
to bridge the gap between AR and NAR models in various tasks such as neural
machine translation (NMT), automatic speech recognition (ASR), and text to
speech (TTS). With the help of those techniques, NAR models can catch up with
the accuracy of AR models in some tasks but not in some others. In this work,
we conduct a study to understand the difficulty of NAR sequence generation and
try to answer two questions: (1) Why can NAR models catch up with AR models in
some tasks but not in others? (2) Why do techniques like knowledge distillation
and source-target alignment help NAR models? Since the main difference between AR and NAR
models is that NAR models do not use dependency among target tokens while AR
models do, intuitively the difficulty of NAR sequence generation heavily
depends on the strength of the dependency among target tokens. To quantify such
dependency, we propose an analysis model called CoMMA to characterize the
difficulty of different NAR sequence generation tasks. We have several
interesting findings: 1) Among the NMT, ASR and TTS tasks, ASR has the most
target-token dependency while TTS has the least. 2) Knowledge distillation
reduces the target-token dependency in the target sequence and thus improves the
accuracy of NAR models. 3) The source-target alignment constraint encourages
each target token to depend on source tokens and thus eases the training of
NAR models.
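To make the dependency argument concrete: an AR model factorizes P(y|x) = ∏_t P(y_t | y_<t, x), while a NAR model assumes P(y|x) = ∏_t P(y_t | x), so target tokens are conditionally independent given the source. The following minimal sketch is illustrative only (it is not the paper's CoMMA model, and the toy vocabulary and scoring function are assumptions); it contrasts the two decoding styles to show where the dependency enters.

```python
# Minimal sketch (illustrative only): AR vs. NAR decoding with a toy scorer.
# The scoring function is a stand-in for a trained decoder; only the shape of
# the target-side conditioning differs between the two decoding loops.
import random

VOCAB = ["a", "b", "c", "d"]

def toy_token_scores(source, position, target_context):
    """Return pseudo-random scores over VOCAB for one target position.

    `target_context` carries the target-side dependency: the AR decoder
    passes the tokens generated so far, while the NAR decoder passes an
    empty tuple, reflecting its conditional-independence assumption.
    """
    random.seed(hash((tuple(source), position, tuple(target_context))) % (2 ** 32))
    return [random.random() for _ in VOCAB]

def ar_decode(source, length):
    """Sequential decoding: position t is conditioned on positions < t."""
    output = []
    for pos in range(length):
        scores = toy_token_scores(source, pos, output)   # uses target history
        output.append(VOCAB[scores.index(max(scores))])
    return output

def nar_decode(source, length):
    """Parallel-style decoding: no position sees any other target token,
    so in a real model all positions could be computed simultaneously."""
    outputs = []
    for pos in range(length):
        scores = toy_token_scores(source, pos, ())       # no target history
        outputs.append(VOCAB[scores.index(max(scores))])
    return outputs

if __name__ == "__main__":
    src = ["x1", "x2", "x3"]
    print("AR :", ar_decode(src, 4))   # `length` sequential steps
    print("NAR:", nar_decode(src, 4))  # one parallelizable step
```

Under this framing, the paper's findings follow naturally: the stronger the dependency among target tokens that the AR loop exploits, the more the NAR loop gives up by ignoring it, which is what CoMMA is designed to quantify.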
Related papers
- Leveraging Diverse Modeling Contexts with Collaborating Learning for Neural Machine Translation [26.823126615724888]
Autoregressive (AR) and Non-autoregressive (NAR) models are two types of generative models for Neural Machine Translation (NMT).
We propose a novel generic collaborative learning method, DCMCL, where AR and NAR models are treated as collaborators instead of teachers and students.
arXiv Detail & Related papers (2024-02-28T15:55:02Z) - Semi-Autoregressive Streaming ASR With Label Context [70.76222767090638]
We propose a streaming "semi-autoregressive" ASR model that incorporates the labels emitted in previous blocks as additional context.
Experiments show that our method outperforms the existing streaming NAR model by 19% relative on Tedlium2, 16%/8% on Librispeech-100 clean/other test sets, and 19%/8% on the Switchboard (SWB)/Callhome (CH) test sets.
arXiv Detail & Related papers (2023-09-19T20:55:58Z) - Helping the Weak Makes You Strong: Simple Multi-Task Learning Improves
Non-Autoregressive Translators [35.939982651768666]
The probability framework of NAR models requires a conditional independence assumption on target sequences.
We propose a simple and model-agnostic multi-task learning framework to provide more informative learning signals.
Our approach can consistently improve accuracy of multiple NAR baselines without adding any additional decoding overhead.
arXiv Detail & Related papers (2022-11-11T09:10:14Z) - Paraformer: Fast and Accurate Parallel Transformer for
Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z) - Non-Autoregressive Machine Translation: It's Not as Fast as it Seems [84.47091735503979]
We point out flaws in the evaluation methodology present in the literature on NAR models.
We compare NAR models with other widely used methods for improving efficiency.
We call for more realistic and extensive evaluation of NAR models in future work.
arXiv Detail & Related papers (2022-05-04T09:30:17Z) - Diformer: Directional Transformer for Neural Machine Translation [13.867255817435705]
Autoregressive (AR) and Non-autoregressive (NAR) models each have their own advantages in performance and latency.
We propose the Directional Transformer (Diformer), which jointly models AR and NAR generation across three generation directions.
Experiments on 4 WMT benchmarks demonstrate that Diformer outperforms current united-modelling works with more than 1.5 BLEU points for both AR and NAR decoding.
arXiv Detail & Related papers (2021-12-22T02:35:29Z) - A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text
Generation [59.64193903397301]
Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines.
We conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR).
The results on various tasks provide interesting findings for developing an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness against long-form utterances.
arXiv Detail & Related papers (2021-10-11T13:05:06Z) - TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech
Recognition [69.68154370877615]
Non-autoregressive (NAR) models remove the temporal dependency between output tokens and can predict all output tokens in as few as one step.
To address these problems, we propose a new model named the two-step non-autoregressive transformer (TSNAT).
The results show that the TSNAT can achieve a competitive performance with the AR model and outperform many complicated NAR models.
arXiv Detail & Related papers (2021-04-04T02:34:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.