High-Accuracy and Low-Latency Speech Recognition with Two-Head
Contextual Layer Trajectory LSTM Model
- URL: http://arxiv.org/abs/2003.07482v1
- Date: Tue, 17 Mar 2020 00:52:11 GMT
- Title: High-Accuracy and Low-Latency Speech Recognition with Two-Head
Contextual Layer Trajectory LSTM Model
- Authors: Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng,
and Yifan Gong
- Abstract summary: We improve conventional hybrid LSTM acoustic models for high-accuracy and low-latency automatic speech recognition.
To achieve high accuracy, we use a contextual layer trajectory LSTM (cltLSTM), which decouples the temporal modeling and target classification tasks.
We further improve the training strategy with sequence-level teacher-student learning.
- Score: 46.34788932277904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the community keeps promoting end-to-end models over conventional
hybrid models, which usually are long short-term memory (LSTM) models trained
with a cross entropy criterion followed by a sequence discriminative training
criterion, we argue that such conventional hybrid models can still be
significantly improved. In this paper, we detail our recent efforts to improve
conventional hybrid LSTM acoustic models for high-accuracy and low-latency
automatic speech recognition. To achieve high accuracy, we use a contextual
layer trajectory LSTM (cltLSTM), which decouples the temporal modeling and
target classification tasks, and incorporates future context frames to get more
information for accurate acoustic modeling. We further improve the training
strategy with sequence-level teacher-student learning. To obtain low latency,
we design a two-head cltLSTM, in which one head has zero latency and the other
head has a small latency, compared to an LSTM. When trained with Microsoft's 65
thousand hours of anonymized training data and evaluated with test sets with
1.8 million words, the proposed two-head cltLSTM model with the proposed
training strategy yields a 28.2% relative WER reduction over the conventional
LSTM acoustic model, with a similar perceived latency.
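To make the two-head idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: it assumes a shared unidirectional LSTM encoder whose output feeds two classification heads, one using only past frames (zero lookahead) and one mixing in a few future encoder frames (small lookahead). The actual cltLSTM additionally decouples temporal modeling from target classification via layer-trajectory processing, which this sketch omits; all layer names, sizes, and the lookahead mechanism are illustrative assumptions.

```python
# Minimal sketch of a two-head streaming acoustic model (PyTorch assumed).
# Shared LSTM encoder; one head classifies each frame from past context only
# (zero lookahead), the other mixes in a few future encoder frames (small
# lookahead). Names, layer sizes and the lookahead mechanism are illustrative
# assumptions, not the paper's cltLSTM.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadAcousticModel(nn.Module):
    def __init__(self, feat_dim=80, hidden_dim=512, num_senones=8000, lookahead=4):
        super().__init__()
        self.lookahead = lookahead
        # Unidirectional LSTM stack: streaming-friendly temporal modeling.
        self.encoder = nn.LSTM(feat_dim, hidden_dim, num_layers=4, batch_first=True)
        # Head 1: zero latency, past context only.
        self.head_zero = nn.Linear(hidden_dim, num_senones)
        # Head 2: small latency, sees `lookahead` future encoder frames via a 1-D conv.
        self.future_mix = nn.Conv1d(hidden_dim, hidden_dim, kernel_size=lookahead + 1)
        self.head_low = nn.Linear(hidden_dim, num_senones)

    def forward(self, feats):                      # feats: (B, T, feat_dim)
        enc, _ = self.encoder(feats)               # (B, T, H), causal
        logits_zero = self.head_zero(enc)          # per-frame senone logits, no lookahead
        # Pad the time axis on the right so frame t can see frames t .. t+lookahead.
        x = F.pad(enc.transpose(1, 2), (0, self.lookahead))  # (B, H, T+lookahead)
        ctx = self.future_mix(x).transpose(1, 2)   # back to (B, T, H)
        logits_low = self.head_low(ctx)            # per-frame logits with small lookahead
        return logits_zero, logits_low
```

In a sketch like this, both heads are trained jointly on the same senone targets and the deployed system picks whichever head matches its latency budget, mirroring the zero-latency/small-latency trade-off described in the abstract. For the teacher-student step, the paper uses sequence-level teacher-student learning; the snippet below shows only the common frame-level simplification (KL divergence between teacher and student senone posteriors) as a compact, hedged stand-in.

```python
# Frame-level teacher-student (distillation) loss as a stand-in for the
# sequence-level teacher-student learning used in the paper.
import torch.nn.functional as F

def frame_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # logits: (B, T, num_senones); the teacher is typically a larger offline model.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over the batch.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")
```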
Related papers
- MAST: Multiscale Audio Spectrogram Transformers [53.06337011259031]
We present the Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings the concept of multiscale feature hierarchies to the Audio Spectrogram Transformer (AST).
In practice, MAST significantly outperforms AST, by an average of 3.4% in accuracy across 8 speech and non-speech tasks from the LAPE Benchmark.
arXiv Detail & Related papers (2022-11-02T23:34:12Z)
- Extreme-Long-short Term Memory for Time-series Prediction [0.0]
Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN).
In this paper, we propose an advanced LSTM algorithm, the Extreme Long Short-Term Memory (E-LSTM).
The new E-LSTM needs only 2 training epochs to match the results that a traditional LSTM reaches after 7 epochs.
arXiv Detail & Related papers (2022-10-15T09:45:48Z)
- A Mixture of Expert Based Deep Neural Network for Improved ASR [4.993304210475779]
MixNet is a novel deep learning architecture for acoustic modeling in the context of Automatic Speech Recognition (ASR).
In natural speech, overlap in distribution across different acoustic classes is inevitable, which leads to inter-class misclassification.
Experiments conducted on a large-vocabulary ASR task show that the proposed architecture provides 13.6% and 10.0% relative reductions in word error rate.
arXiv Detail & Related papers (2021-12-02T07:26:34Z)
- Conformer-based Hybrid ASR System for Switchboard Dataset [99.88988282353206]
We present and evaluate a competitive conformer-based hybrid model training recipe.
We study different training aspects and methods to improve the word error rate as well as to increase training speed.
We conduct experiments on the Switchboard 300h dataset, and our conformer-based hybrid model achieves competitive results.
arXiv Detail & Related papers (2021-11-05T12:03:18Z)
- Score-based Generative Modeling in Latent Space [93.8985523558869]
Score-based generative models (SGMs) have recently demonstrated impressive results in terms of both sample quality and distribution coverage.
Here, we propose the Latent Score-based Generative Model (LSGM), a novel approach that trains SGMs in a latent space.
Moving from data to latent space allows us to train more expressive generative models, apply SGMs to non-continuous data, and learn smoother SGMs in a smaller space.
arXiv Detail & Related papers (2021-06-10T17:26:35Z)
- Quantum Long Short-Term Memory [3.675884635364471]
Long short-term memory (LSTM) is a recurrent neural network (RNN) for sequence and temporal dependency data modeling.
We propose a hybrid quantum-classical model of LSTM, which we dub QLSTM.
Our work paves the way toward implementing machine learning algorithms for sequence modeling on noisy intermediate-scale quantum (NISQ) devices.
arXiv Detail & Related papers (2020-09-03T16:41:09Z)
- Sentiment Analysis Using Simplified Long Short-term Memory Recurrent Neural Networks [1.5146765382501612]
We perform sentiment analysis on a GOP Debate Twitter dataset.
To speed up training and reduce computational cost and time, six different parameter-reduced, slim versions of the LSTM model are proposed.
arXiv Detail & Related papers (2020-05-08T12:50:10Z)
- Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on the RNN-Transducer together with improved beam search, reaches a quality only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)
- A Hybrid Residual Dilated LSTM and Exponential Smoothing Model for Mid-Term Electric Load Forecasting [1.1602089225841632]
The model combines exponential smoothing (ETS), advanced Long Short-Term Memory (LSTM) and ensembling.
A simulation study performed on the monthly electricity demand time series for 35 European countries confirmed the high performance of the proposed model.
arXiv Detail & Related papers (2020-03-29T10:53:50Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)