Non-autoregressive Transformer with Unified Bidirectional Decoder for
Automatic Speech Recognition
- URL: http://arxiv.org/abs/2109.06684v1
- Date: Tue, 14 Sep 2021 13:39:39 GMT
- Title: Non-autoregressive Transformer with Unified Bidirectional Decoder for
Automatic Speech Recognition
- Authors: Chuan-Fei Zhang, Yan Liu, Tian-Hao Zhang, Song-Lu Chen, Feng Chen,
Xu-Cheng Yin
- Abstract summary: We propose a new non-autoregressive transformer with a unified bidirectional decoder (NAT-UBD).
NAT-UBD can achieve character error rates (CERs) of 5.0%/5.5% on the Aishell1 dev/test sets, outperforming all previous NAR transformer models.
- Score: 20.93536420298548
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Non-autoregressive (NAR) transformer models have been studied intensively in
automatic speech recognition (ASR), and a substantial number of NAR transformer
models use the causal mask to limit token dependencies. However, the
causal mask is designed for the left-to-right decoding process of the
non-parallel autoregressive (AR) transformer, which is inappropriate for the
parallel NAR transformer since it ignores right-to-left contexts. Some
models utilize right-to-left contexts with an extra decoder, but these
methods increase the model complexity. To tackle the above problems,
we propose a new non-autoregressive transformer with a unified bidirectional
decoder (NAT-UBD), which can simultaneously utilize left-to-right and
right-to-left contexts. However, directly using bidirectional contexts
causes information leakage: the decoder output at each position can be
affected by the character fed to the input at that same position. To avoid
information leakage, we propose a novel attention mask and modify the vanilla
query, key, and value matrices for NAT-UBD. Experimental results verify
that NAT-UBD can achieve character error rates (CERs) of 5.0%/5.5% on the
Aishell1 dev/test sets, outperforming all previous NAR transformer models.
Moreover, NAT-UBD can run 49.8x faster than the AR transformer baseline when
decoding in a single step.
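The leakage problem can be made concrete with attention masks. Below is a minimal PyTorch sketch contrasting the standard causal mask with a bidirectional mask whose diagonal is blocked, so that position i cannot attend to the character fed in at position i. This is illustrative only (the function names are ours, not the paper's): blocking one layer's diagonal is not sufficient on its own once information flows through intermediate positions in deeper layers, which is why the paper also modifies the query, key, and value matrices.

```python
import torch

def causal_mask(n: int) -> torch.Tensor:
    # Standard AR mask: position i attends only to positions <= i,
    # so right-to-left context is never visible.
    return torch.tril(torch.ones(n, n, dtype=torch.bool))

def diagonal_blocked_mask(n: int) -> torch.Tensor:
    # Bidirectional mask with the diagonal blocked: position i attends
    # to every position except i itself, so its output cannot directly
    # copy the character fed to the input at the same position.
    return ~torch.eye(n, dtype=torch.bool)

# Applying a mask to raw attention logits:
n = 5
scores = torch.randn(n, n)
masked = scores.masked_fill(~diagonal_blocked_mask(n), float("-inf"))
weights = masked.softmax(dim=-1)  # row i puts zero weight on column i
```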
Related papers
- Spike-driven Transformer [31.931401322707995]
Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option due to their unique spike-based event-driven (i.e., spike-driven) paradigm.
In this paper, we incorporate the spike-driven paradigm into the Transformer via the proposed Spike-driven Transformer, which has four unique properties.
It is shown that the Spike-driven Transformer can achieve 77.1% top-1 accuracy on ImageNet-1K, which is the state-of-the-art result in the SNN field.
arXiv Detail & Related papers (2023-07-04T13:00:18Z)
- Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation [105.22961467028234]
Skip connections and normalisation layers are ubiquitous for the training of Deep Neural Networks (DNNs).
Recent approaches such as Deep Kernel Shaping have made progress towards reducing our reliance on them.
But these approaches are incompatible with the self-attention layers present in transformers.
arXiv Detail & Related papers (2023-02-20T21:26:25Z)
- Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z)
- Directed Acyclic Transformer for Non-Autoregressive Machine Translation [93.31114105366461]
Directed Acyclic Transformer (DA-Transformer) represents hidden states in a Directed Acyclic Graph (DAG).
DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average.
arXiv Detail & Related papers (2022-05-16T06:02:29Z)
- Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input [54.82369261350497]
We propose a CTC-enhanced NAR transformer, which generates the target sequence by refining the predictions of the CTC module (the standard greedy CTC collapse is sketched after this list).
Experimental results show that our method outperforms all previous NAR counterparts and achieves 50x faster decoding than a strong AR baseline, with only 0.0 to 0.3 absolute CER degradation on the Aishell-1 and Aishell-2 datasets.
arXiv Detail & Related papers (2020-10-28T15:00:09Z)
- Fast Interleaved Bidirectional Sequence Generation [90.58793284654692]
We introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously.
We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder.
Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer (see the interleaving sketch after this list).
arXiv Detail & Related papers (2020-10-27T17:38:51Z)
- Transformer with Bidirectional Decoder for Speech Recognition [32.56014992915183]
We introduce a bidirectional speech transformer to utilize the different directional contexts simultaneously.
Specifically, the outputs of our proposed transformer include a left-to-right target and a right-to-left target.
At the inference stage, we use the introduced bidirectional beam search method, which generates both left-to-right and right-to-left candidates.
arXiv Detail & Related papers (2020-08-11T02:12:42Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take the Transformer as the testbed and introduce a layer of gates between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty (a minimal hard-concrete gate sketch follows this list).
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
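For the CTC-enhanced NAR entry above, the CTC module's greedy output that the decoder refines is obtained with the standard CTC collapse rule: merge consecutive repeated ids, then drop blanks. A minimal sketch, assuming per-frame argmax token ids as input; `blank_id` is our own parameter name:

```python
def ctc_greedy_collapse(frame_ids, blank_id=0):
    # Standard CTC collapse: merge consecutive repeated ids,
    # then remove blanks, yielding the token sequence.
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# Example: [7, 7, 0, 7, 3, 3, 0] -> [7, 7, 3]  (0 is the blank)
```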
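For the IBDecoder entry, the interleaving idea can be sketched as reordering the target so that tokens from the left-to-right and right-to-left readings alternate. This is an illustrative reordering under our own naming, not the paper's exact training recipe:

```python
def interleave_bidirectional(tokens):
    # Alternate tokens from the left end and the right end of the
    # target, so one decoder emits both directions step by step.
    left, right, out = 0, len(tokens) - 1, []
    while left <= right:
        out.append(tokens[left]); left += 1
        if left <= right:
            out.append(tokens[right]); right -= 1
    return out

# Example: ["a", "b", "c", "d", "e"] -> ["a", "e", "b", "d", "c"]
```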
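For the sparsified-encoder entry, an expected-L0 penalty is typically realized with hard-concrete gates (Louizos et al., 2018), whose expected number of open gates stays differentiable. A minimal sketch, assuming the paper follows this standard relaxation; class and parameter names are our own:

```python
import math
import torch
import torch.nn as nn

class HardConcreteGate(nn.Module):
    # Hard-concrete gate: a stochastic relaxation of a 0/1 gate whose
    # expected L0 norm (number of open gates) is differentiable.
    def __init__(self, dim, beta=2 / 3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(dim))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self, x):
        u = torch.rand_like(self.log_alpha).clamp_(1e-6, 1 - 1e-6)
        s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        z = (s * (self.zeta - self.gamma) + self.gamma).clamp(0, 1)
        return x * z  # gated encoder outputs

    def expected_l0(self):
        # Differentiable E[#open gates]; added to the loss as the penalty.
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()
```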
This list is automatically generated from the titles and abstracts of the papers in this site.