Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech
Recognition
- URL: http://arxiv.org/abs/2005.07903v1
- Date: Sat, 16 May 2020 08:27:20 GMT
- Title: Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech
Recognition
- Authors: Zhengkun Tian and Jiangyan Yi and Jianhua Tao and Ye Bai and Shuai
Zhang and Zhengqi Wen
- Abstract summary: We propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition.
The proposed model can accurately predict the length of the target sequence and achieve competitive performance.
The model even reaches a real-time factor of 0.0056, faster than all mainstream speech recognition models.
- Score: 66.47000813920617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive transformer models have achieved extremely fast inference
speed and comparable performance with autoregressive sequence-to-sequence
models in neural machine translation. Most of the non-autoregressive
transformers decode the target sequence from a predefined-length mask sequence.
If the predefined length is too long, it will cause many redundant
calculations. If the predefined length is shorter than the length of the target
sequence, it will hurt the performance of the model. To address this problem
and improve the inference speed, we propose a spike-triggered
non-autoregressive transformer model for end-to-end speech recognition, which
introduces a CTC module to predict the length of the target sequence and
accelerate the convergence. All experiments are conducted on the public Mandarin
Chinese dataset AISHELL-1. The results show that the proposed model can
accurately predict the length of the target sequence and achieve performance
competitive with advanced transformer models. Moreover, the model reaches a
real-time factor of 0.0056, faster than all mainstream speech recognition
models.
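To make the length-prediction idea concrete, here is a minimal numpy sketch (not the authors' code) of spike-triggered length estimation: frames whose strongest non-blank CTC posterior exceeds a threshold are counted as spikes, and the spike count sets the number of decoder positions that are then filled in parallel. The threshold, shapes, and function name are illustrative assumptions.

```python
import numpy as np

def count_ctc_spikes(ctc_posteriors, blank_id=0, threshold=0.5):
    """Count frames whose strongest non-blank CTC posterior exceeds
    `threshold`; each spike is taken as evidence of one output token."""
    non_blank = np.delete(ctc_posteriors, blank_id, axis=1)  # drop the blank column
    return int((non_blank.max(axis=1) > threshold).sum())

# Toy example: 6 encoder frames, 4 labels (index 0 is the blank).
rng = np.random.default_rng(0)
posteriors = rng.dirichlet(alpha=[0.3, 1.0, 1.0, 1.0], size=6)
target_len = count_ctc_spikes(posteriors)
# A non-autoregressive decoder would be given `target_len` positions
# (e.g. mask embeddings) and fill them all in a single parallel pass.
print("predicted target length:", target_len)
```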
Related papers
- CARD: Channel Aligned Robust Blend Transformer for Time Series
Forecasting [50.23240107430597]
We design a special Transformer, i.e., the Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dependencies across channels.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- Toeplitz Neural Network for Sequence Modeling [46.04964190407727]
We show that a Toeplitz matrix-vector product trick can reduce the space-time complexity of sequence modeling to log-linear.
A lightweight sub-network called relative position encoder is proposed to generate relative position coefficients with a fixed budget of parameters.
Despite being trained on 512-token sequences, our model can extrapolate to input sequences of up to 14K tokens at inference time with consistent performance.
arXiv Detail & Related papers (2023-05-08T14:49:01Z)
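As an aside on the trick mentioned in the Toeplitz entry above, the log-linear complexity comes from embedding a Toeplitz matrix in a circulant one and multiplying via the FFT. Below is a self-contained numpy sketch of that standard product, verified against the dense matrix; it illustrates the complexity argument only and is not the paper's implementation.

```python
import numpy as np

def toeplitz_matvec(first_col, first_row, x):
    """Multiply a Toeplitz matrix (given by its first column and first row)
    with a vector in O(n log n) via circulant embedding and the FFT."""
    n = len(x)
    # First column of the 2n x 2n circulant embedding (the extra entry is arbitrary).
    c = np.concatenate([first_col, [0.0], first_row[:0:-1]])
    x_pad = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_pad))
    return y[:n].real

# Check against the dense O(n^2) product.
n = 8
rng = np.random.default_rng(1)
col, row, x = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
row[0] = col[0]  # a Toeplitz matrix shares its (0, 0) entry
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)]
              for i in range(n)])
assert np.allclose(T @ x, toeplitz_matvec(col, row, x))
```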
- Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z)
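Paraformer's token-number predictor is based on a continuous integrate-and-fire (CIF) style accumulation of per-frame weights. The sketch below is a rough, illustrative rendering of that counting step; the weights, threshold, and function name are made up for the example and do not come from the paper.

```python
import numpy as np

def cif_token_count(alphas, threshold=1.0):
    """Continuous integrate-and-fire style counting: accumulate per-frame
    weights and 'fire' one token each time the accumulator crosses the
    threshold."""
    acc, fired = 0.0, 0
    for a in alphas:
        acc += a
        if acc >= threshold:
            fired += 1
            acc -= threshold
    return fired

# 10 encoder frames whose predicted weights sum to roughly three tokens.
alphas = np.array([0.2, 0.5, 0.4, 0.1, 0.6, 0.3, 0.2, 0.4, 0.2, 0.3])
print(cif_token_count(alphas))  # -> 3
```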
- Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformers [24.36592204215444]
We propose to leverage Transformer architectures for non-autoregressive human motion prediction.
Our approach decodes elements in parallel from a query sequence, instead of conditioning on previous predictions.
We show that despite its simplicity, our approach achieves competitive results on two public datasets.
arXiv Detail & Related papers (2021-09-15T18:55:15Z)
- Sequence Length is a Domain: Length-based Overfitting in Transformer Models [0.0]
In machine translation, neural systems perform worse on very long sequences than the earlier phrase-based translation approaches.
We show that the observed drop in performance is tied to whether the hypothesis length matches the lengths seen by the model during training, rather than to the length of the input sequence itself.
arXiv Detail & Related papers (2021-09-15T13:25:19Z)
- Long-Short Transformer: Efficient Transformers for Language and Vision [97.2850205384295]
Long-Short Transformer (Transformer-LS) is an efficient self-attention mechanism for modeling long sequences with linear complexity for both language and vision tasks.
It aggregates a novel long-range attention with dynamic projection to model distant correlations and a short-term attention to capture fine-grained local correlations.
Our method outperforms the state-of-the-art models on multiple tasks in language and vision domains, including the Long Range Arena benchmark, autoregressive language modeling, and ImageNet classification.
arXiv Detail & Related papers (2021-07-05T18:00:14Z)
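The Transformer-LS entry above combines a sliding-window (short-term) attention with a low-rank, data-dependent projection of the full sequence (long-range attention). The sketch below is a rough approximation of that combination, with a random projection matrix standing in for the learned one and none of the paper's normalization details; treat it as an assumption-laden illustration rather than the actual Transformer-LS layer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def long_short_attention(q, k, v, window=4, rank=8):
    """Each query attends to (a) keys in a local window (short-term) and
    (b) a rank-`rank` summary of all keys built by a data-dependent
    projection (long-range)."""
    T, d = q.shape
    # Low-rank projection: mix the T positions down to `rank` slots using
    # weights computed from the keys (random weight matrix for illustration).
    W_p = np.random.default_rng(0).normal(size=(d, rank))
    P = softmax(k @ W_p, axis=0)                 # (T, rank), columns sum to 1
    k_long, v_long = P.T @ k, P.T @ v            # (rank, d) summaries
    out = np.empty_like(q)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        keys = np.concatenate([k[lo:hi], k_long])   # local + projected keys
        vals = np.concatenate([v[lo:hi], v_long])
        w = softmax(q[t] @ keys.T / np.sqrt(d))
        out[t] = w @ vals
    return out

x = np.random.default_rng(1).normal(size=(16, 32))
print(long_short_attention(x, x, x).shape)  # -> (16, 32)
```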
- Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting [25.417560221400347]
Long sequence time-series forecasting (LSTF) demands a high prediction capacity.
Recent studies have shown the potential of Transformer to increase the prediction capacity.
We design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics.
arXiv Detail & Related papers (2020-12-14T11:43:09Z)
- Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input [54.82369261350497]
We propose a CTC-enhanced NAR transformer, which generates the target sequence by refining the predictions of the CTC module.
Experimental results show that our method outperforms all previous NAR counterparts and achieves 50x faster decoding than a strong AR baseline, with only a 0.0 to 0.3 absolute CER degradation on the Aishell-1 and Aishell-2 datasets.
arXiv Detail & Related papers (2020-10-28T15:00:09Z)
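For intuition about the decoder input such CTC-refinement models start from, here is a generic greedy CTC collapse (per-frame argmax, merge repeats, drop blanks) in numpy. It is a standard routine assumed for illustration, not code from the paper; the refinement pass itself is not shown.

```python
import numpy as np

def ctc_greedy_collapse(frame_scores, blank_id=0):
    """Greedy CTC decoding: per-frame argmax, merge consecutive repeats,
    then drop blanks. The collapsed sequence fixes the output length and
    gives the NAR decoder an initial hypothesis to refine in one pass."""
    best = frame_scores.argmax(axis=1)
    out, prev = [], None
    for label in best:
        if label != prev and label != blank_id:
            out.append(int(label))
        prev = label
    return out

# Toy scores over 7 frames and 4 labels (0 = blank).
scores = np.array([[5, 0, 0, 0],   # blank
                   [0, 5, 0, 0],   # token 1
                   [0, 5, 0, 0],   # token 1 repeated -> merged
                   [5, 0, 0, 0],   # blank
                   [0, 0, 5, 0],   # token 2
                   [0, 0, 0, 5],   # token 3
                   [5, 0, 0, 0]])  # blank
print(ctc_greedy_collapse(scores))  # -> [1, 2, 3]
```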
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing [112.2208052057002]
We propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one.
With comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks.
arXiv Detail & Related papers (2020-06-05T05:16:23Z)
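For intuition about the Funnel-Transformer compression step, the sketch below halves a hidden-state sequence by mean-pooling adjacent positions. The real model applies pooling inside its attention blocks and can attach a decoder to recover token-level outputs, so this is only a simplified stand-in under those assumptions.

```python
import numpy as np

def pool_hidden_states(h, stride=2):
    """Compress a (T, d) sequence of hidden states to ceil(T / stride)
    positions by mean-pooling adjacent timesteps (zero-padding the tail)."""
    T, d = h.shape
    pad = (-T) % stride
    if pad:
        h = np.concatenate([h, np.zeros((pad, d))], axis=0)
    return h.reshape(-1, stride, d).mean(axis=1)

hidden = np.random.default_rng(2).normal(size=(10, 16))  # 10 positions, width 16
print(pool_hidden_states(hidden).shape)  # -> (5, 16): half the sequence length
```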
This list is automatically generated from the titles and abstracts of the papers in this site.