Exploring Pre-training with Alignments for RNN Transducer based
End-to-End Speech Recognition
- URL: http://arxiv.org/abs/2005.00572v1
- Date: Fri, 1 May 2020 19:00:57 GMT
- Title: Exploring Pre-training with Alignments for RNN Transducer based
End-to-End Speech Recognition
- Authors: Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong
- Abstract summary: The recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research.
In this work, we leverage external alignments to seed the RNN-T model.
Two different pre-training solutions are explored, referred to as encoder pre-training, and whole-network pre-training respectively.
- Score: 39.497407288772386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the recurrent neural network transducer (RNN-T) architecture has
become an emerging trend in end-to-end automatic speech recognition research
due to its capability for online streaming speech recognition.
However, RNN-T training is made difficult by the huge memory requirements, and
complicated neural structure. A common solution to ease the RNN-T training is
to employ connectionist temporal classification (CTC) model along with RNN
language model (RNNLM) to initialize the RNN-T parameters. In this work, we
conversely leverage external alignments to seed the RNN-T model. Two different
pre-training solutions are explored, referred to as encoder pre-training, and
whole-network pre-training respectively. Evaluated on Microsoft 65,000 hours
anonymized production data with personally identifiable information removed,
our proposed methods can obtain significant improvement. In particular, the
encoder pre-training solution achieved a 10% and an 8% relative word error rate
reduction when compared with random initialization and the widely used
CTC+RNNLM initialization strategy, respectively. Our solutions also
significantly reduce the RNN-T model latency from the baseline.
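The relative word error rate (WER) reductions quoted in the abstract are ratios against a baseline WER, not absolute percentage-point differences. A minimal sketch of that arithmetic follows; the function name and the WER values are hypothetical illustrations, since the abstract reports only the relative figures and not the underlying WERs.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer against baseline_wer.

    Both arguments are word error rates in the same unit (for example,
    percent). The result is a fraction: 0.10 means a 10% relative reduction.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical numbers, NOT taken from the paper: a baseline at 10.0% WER
    # and a pre-trained model at 9.0% WER.
    reduction = relative_wer_reduction(10.0, 9.0)
    print(f"relative WER reduction: {reduction:.1%}")
```

With these made-up values, a one-point absolute drop from a 10.0% baseline corresponds to a 10% relative reduction, which is the sense in which the abstract's figures should be read.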
Related papers
- Accurate Mapping of RNNs on Neuromorphic Hardware with Adaptive Spiking Neurons [2.9410174624086025]
We present a $\Sigma\Delta$-low-pass RNN (lpRNN) for mapping rate-based RNNs to spiking neural networks (SNNs).
An adaptive spiking neuron model encodes signals using $\Sigma\Delta$-modulation and enables precise mapping.
We demonstrate the implementation of the lpRNN on Intel's neuromorphic research chip Loihi.
arXiv Detail & Related papers (2024-07-18T14:06:07Z)
- Return of the RNN: Residual Recurrent Networks for Invertible Sentence Embeddings [0.0]
This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task.
Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors.
The model achieves high accuracy and fast training with the ADAM optimizer, a significant finding given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods.
arXiv Detail & Related papers (2023-03-23T15:59:06Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Low-bit Quantization of Recurrent Neural Network Language Models Using
Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM)
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z) - Spike-inspired Rank Coding for Fast and Accurate Recurrent Neural
Networks [5.986408771459261]
Biological spiking neural networks (SNNs) can temporally encode information in their outputs, whereas artificial neural networks (ANNs) conventionally do not.
Here we show that temporal coding such as rank coding (RC) inspired by SNNs can also be applied to conventional ANNs such as LSTMs.
RC-training also significantly reduces time-to-insight during inference, with a minimal decrease in accuracy.
We demonstrate these in two toy problems of sequence classification, and in a temporally-encoded MNIST dataset where our RC model achieves 99.19% accuracy after the first input time-step.
arXiv Detail & Related papers (2021-10-06T15:51:38Z)
- On Addressing Practical Challenges for RNN-Transducer [72.72132048437751]
We adapt a well-trained RNN-T model to a new domain without collecting the audio data.
We obtain word-level confidence scores by utilizing several types of features calculated during decoding.
The proposed time stamping method can get less than 50ms word timing difference on average.
arXiv Detail & Related papers (2021-04-27T23:31:43Z)
- Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z)
- Alignment Restricted Streaming Recurrent Neural Network Transducer [29.218353627837214]
We propose a modification to the RNN-T loss function and develop Alignment Restricted RNN-T models.
The Ar-RNN-T loss provides refined control over the trade-off between token emission delays and the Word Error Rate (WER).
The Ar-RNN-T models also improve downstream applications such as the ASR End-pointing by guaranteeing token emissions within any given range of latency.
arXiv Detail & Related papers (2020-11-05T19:38:54Z)
- Skip-Connected Self-Recurrent Spiking Neural Networks with Joint Intrinsic Parameter and Synaptic Weight Training [14.992756670960008]
We propose a new type of RSNN called Skip-Connected Self-Recurrent SNNs (ScSr-SNNs).
ScSr-SNNs can boost performance by up to 2.55% compared with other types of RSNNs trained by state-of-the-art BP methods.
arXiv Detail & Related papers (2020-10-23T22:27:13Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)