TSNAT: Two-Step Non-Autoregressive Transformer Models for Speech
Recognition
- URL: http://arxiv.org/abs/2104.01522v1
- Date: Sun, 4 Apr 2021 02:34:55 GMT
- Title: TSNAT: Two-Step Non-Autoregressive Transformer Models for Speech
Recognition
- Authors: Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang, Zhengqi
Wen, Xuefei Liu
- Abstract summary: Non-autoregressive (NAR) models remove the temporal dependency between output tokens and predict the entire output sequence in parallel, in as little as a single step.
To address the performance gap and the training difficulties of NAR models, we propose a new model named the two-step non-autoregressive transformer (TSNAT).
The results show that TSNAT achieves performance competitive with the AR model and outperforms many more complicated NAR models.
- Score: 69.68154370877615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoregressive (AR) models, such as attention-based encoder-decoder
models and the RNN-Transducer, have achieved great success in speech
recognition. They predict the output sequence token by token, conditioned on
the previously generated tokens and the encoded acoustic states, which is
inefficient on GPUs. Non-autoregressive (NAR) models remove this temporal
dependency between the output tokens and predict the entire output sequence in
parallel, in as little as a single step. However, NAR models still face two
major problems. On the one hand, there is still a large performance gap between
NAR models and advanced AR models. On the other hand, most NAR models are
difficult to train and slow to converge. To address these two problems, we
propose a new model named the two-step non-autoregressive transformer (TSNAT),
which improves the performance and accelerates the convergence of the NAR model
by learning prior knowledge from a parameter-sharing AR model. Furthermore, we
introduce a two-stage method into the inference process, which greatly improves
the model performance. All the experiments are conducted on the public Chinese
Mandarin dataset AISHELL-1. The results show that TSNAT achieves performance
competitive with the AR model and outperforms many more complicated NAR models.
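The abstract contrasts AR decoding (one token at a time, conditioned on the previous tokens) with NAR decoding (all tokens predicted in parallel) and mentions a two-stage inference process. The toy Python/NumPy sketch below only illustrates that contrast under stated assumptions; it is not the TSNAT implementation. `dummy_logits`, `VOCAB_SIZE`, `MAX_LEN`, and `two_pass_decode` are hypothetical stand-ins, and the second pass simply conditions another parallel prediction on the first-pass draft, loosely mirroring the two-stage idea.

```python
# Minimal sketch of AR vs. NAR decoding and a hypothetical two-pass NAR
# refinement. All names and shapes here are illustrative assumptions,
# not the TSNAT model described in the paper.
import numpy as np

VOCAB_SIZE = 10   # toy vocabulary size (hypothetical)
MAX_LEN = 6       # toy output length (hypothetical)


def dummy_logits(acoustic_state, prev_tokens=None):
    """Stand-in for a decoder: returns logits for every output position.

    A real decoder would attend over `acoustic_state` (and, in the AR case,
    over `prev_tokens`); here we derive deterministic pseudo-random scores
    so the example runs anywhere.
    """
    seed = 0 if prev_tokens is None else 1 + len(prev_tokens)
    rng = np.random.default_rng(seed)
    return rng.standard_normal((MAX_LEN, VOCAB_SIZE))


def ar_decode(acoustic_state):
    """AR decoding: one token per step, conditioned on all previous tokens."""
    tokens = []
    for t in range(MAX_LEN):
        logits = dummy_logits(acoustic_state, prev_tokens=tokens)
        tokens.append(int(logits[t].argmax()))   # sequential dependency
    return tokens


def nar_decode(acoustic_state):
    """NAR decoding: every position predicted in parallel, in one step."""
    logits = dummy_logits(acoustic_state)
    return logits.argmax(axis=-1).tolist()       # no temporal dependency


def two_pass_decode(acoustic_state):
    """Hypothetical two-pass NAR inference: draft first, then refine it."""
    draft = nar_decode(acoustic_state)
    logits = dummy_logits(acoustic_state, prev_tokens=draft)  # condition on draft
    return logits.argmax(axis=-1).tolist()


if __name__ == "__main__":
    acoustic_state = np.zeros((20, 8))  # fake acoustic encoder output
    print("AR        :", ar_decode(acoustic_state))
    print("NAR       :", nar_decode(acoustic_state))
    print("2-pass NAR:", two_pass_decode(acoustic_state))
```

The point of the sketch is the call pattern: ar_decode needs MAX_LEN sequential decoder calls, while nar_decode and each pass of two_pass_decode need only one, which is where the NAR speedup on GPUs comes from.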
Related papers
- Leveraging Diverse Modeling Contexts with Collaborating Learning for
Neural Machine Translation [26.823126615724888]
Autoregressive (AR) and Non-autoregressive (NAR) models are two types of generative models for Neural Machine Translation (NMT).
We propose a novel generic collaborative learning method, DCMCL, where AR and NAR models are treated as collaborators instead of teachers and students.
arXiv Detail & Related papers (2024-02-28T15:55:02Z)
- Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive
End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z)
- Non-Autoregressive Machine Translation: It's Not as Fast as it Seems [84.47091735503979]
We point out flaws in the evaluation methodology present in the literature on NAR models.
We compare NAR models with other widely used methods for improving efficiency.
We call for more realistic and extensive evaluation of NAR models in future work.
arXiv Detail & Related papers (2022-05-04T09:30:17Z)
- Diformer: Directional Transformer for Neural Machine Translation [13.867255817435705]
Autoregressive (AR) and Non-autoregressive (NAR) models each have their own advantages in performance and latency.
We propose the Directional Transformer (Diformer) by jointly modelling AR and NAR into three generation directions.
Experiments on 4 WMT benchmarks demonstrate that Diformer outperforms current unified-modelling works by more than 1.5 BLEU points for both AR and NAR decoding.
arXiv Detail & Related papers (2021-12-22T02:35:29Z)
- An EM Approach to Non-autoregressive Conditional Sequence Generation [49.11858479436565]
Autoregressive (AR) models have been the dominant approach to conditional sequence generation.
Non-autoregressive (NAR) models have been recently proposed to reduce the latency by generating all output tokens in parallel.
This paper proposes a new approach that jointly optimizes both AR and NAR models in a unified Expectation-Maximization framework.
arXiv Detail & Related papers (2020-06-29T20:58:57Z)
- A Study of Non-autoregressive Model for Sequence Generation [147.89525760170923]
Non-autoregressive (NAR) models generate all the tokens of a sequence in parallel.
We propose knowledge distillation and source-target alignment to bridge the gap between AR and NAR models.
arXiv Detail & Related papers (2020-04-22T09:16:09Z)
- LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding
and Vocabulary Attention [54.18121922040521]
Non-autoregressive translation (NAT) models generate multiple tokens in one forward pass.
These NAT models often suffer from the multimodality problem, generating duplicated tokens or missing tokens.
We propose two novel methods to address this issue, the Look-Around (LA) strategy and the Vocabulary Attention (VA) mechanism.
arXiv Detail & Related papers (2020-02-08T04:11:03Z)