Echo State Speech Recognition
- URL: http://arxiv.org/abs/2102.09114v1
- Date: Thu, 18 Feb 2021 02:04:14 GMT
- Title: Echo State Speech Recognition
- Authors: Harsh Shrivastava, Ankush Garg, Yuan Cao, Yu Zhang, Tara Sainath
- Abstract summary: We propose automatic speech recognition models inspired by echo state network (ESN).
We show that model quality does not drop even when the decoder is fully randomized.
Such models can be trained more efficiently as the decoders do not need to be updated.
- Score: 10.084532635965513
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose automatic speech recognition (ASR) models inspired by echo state
network (ESN), in which a subset of recurrent neural networks (RNN) layers in
the models are randomly initialized and untrained. Our study focuses on RNN-T
and Conformer models, and we show that model quality does not drop even when
the decoder is fully randomized. Furthermore, such models can be trained more
efficiently as the decoders do not need to be updated. By contrast,
randomizing encoders hurts model quality, indicating that optimizing encoders
and learning proper representations for acoustic inputs are more vital for speech
recognition. Overall, we challenge the common practice of training ASR models
for all components, and demonstrate that ESN-based models can perform equally
well but enable more efficient training and storage than fully-trainable
counterparts.
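The core ESN idea described in the abstract (randomly initialized, untrained recurrent weights combined with a small trained part) can be illustrated with a toy sketch. This is not the paper's RNN-T or Conformer setup: the function names, the reservoir size, the toy "recall the previous input" task, and the per-sample gradient update are all assumptions chosen for illustration. The sketch also assumes that frozen random weights can be rebuilt from a seed instead of being stored, as one possible reading of the storage benefit.

```python
import math
import random


def make_reservoir(n_units, seed, scale=0.5):
    """Build fixed random input and recurrent weights from a seed (hypothetical helper)."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    w_rec = [[rng.uniform(-scale, scale) / math.sqrt(n_units) for _ in range(n_units)]
             for _ in range(n_units)]
    return w_in, w_rec


def step(state, u, w_in, w_rec):
    """One recurrent update: x_t = tanh(w_in * u_t + W_rec x_{t-1})."""
    n = len(state)
    return [math.tanh(w_in[i] * u + sum(w_rec[i][j] * state[j] for j in range(n)))
            for i in range(n)]


def train_readout(inputs, targets, w_in, w_rec, lr=0.05, epochs=20):
    """Train only the linear readout; the recurrent weights are never modified."""
    n = len(w_in)
    w_out = [0.0] * n
    for _ in range(epochs):
        state = [0.0] * n
        for u, y in zip(inputs, targets):
            state = step(state, u, w_in, w_rec)
            err = sum(w * x for w, x in zip(w_out, state)) - y
            # Gradient step on the squared error with respect to w_out only.
            w_out = [w - lr * err * x for w, x in zip(w_out, state)]
    return w_out


SEED = 0      # assumed: the seed is all that is needed to rebuild the frozen weights
N_UNITS = 8   # assumed toy reservoir size

data_rng = random.Random(1)
inputs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
targets = [0.0] + inputs[:-1]  # toy task: recall the previous input

w_in, w_rec = make_reservoir(N_UNITS, SEED)
frozen_copy = (list(w_in), [row[:] for row in w_rec])
w_out = train_readout(inputs, targets, w_in, w_rec)

n_frozen = N_UNITS + N_UNITS * N_UNITS
n_trained = len(w_out)
print(f"frozen: {n_frozen}, trained: {n_trained}")
```

In this sketch only the readout vector receives updates, and calling make_reservoir again with the same seed reproduces the frozen weights exactly, so they would not need to be saved alongside the trained parameters.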
Related papers
- Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations [16.577870835480585]
We present a comprehensive analysis on building ASR systems with discrete codes.
We investigate different methods for training such as quantization schemes and time-domain vs spectral feature encodings.
We introduce a pipeline that outperforms Encodec at similar bit-rate.
arXiv Detail & Related papers (2024-07-03T20:51:41Z)
- Neural Network Diffusion [45.851945143942885]
A diffusion model is trained to synthesize latent representations from random noise.
This model then generates new representations, which are passed through the autoencoder's decoder to produce new subsets of high-performing network parameters.
arXiv Detail & Related papers (2024-02-20T16:59:03Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Visualising and Explaining Deep Learning Models for Speech Quality Prediction [0.0]
The non-intrusive speech quality prediction model NISQA is analyzed in this paper.
It is composed of a convolutional neural network (CNN) and a recurrent neural network (RNN).
arXiv Detail & Related papers (2021-12-12T12:50:03Z)
- Distributionally Robust Recurrent Decoders with Random Network Distillation [93.10261573696788]
We propose a method based on OOD detection with Random Network Distillation to allow an autoregressive language model to disregard OOD context during inference.
We apply our method to a GRU architecture, demonstrating improvements on multiple language modeling (LM) datasets.
arXiv Detail & Related papers (2021-10-25T19:26:29Z)
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
- Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition [9.732767611907068]
In this work, we fuse a pre-trained acoustic encoder (wav2vec2.0) and a pre-trained linguistic encoder (BERT) into an end-to-end ASR model.
Our model achieves better recognition performance on CALLHOME corpus (15 hours) than other end-to-end models.
arXiv Detail & Related papers (2021-01-17T16:12:44Z)
- Pretraining Techniques for Sequence-to-Sequence Voice Conversion [57.65753150356411]
Sequence-to-sequence (seq2seq) voice conversion (VC) models are attractive owing to their ability to convert prosody.
We propose to transfer knowledge from other speech processing tasks where large-scale corpora are easily available, typically text-to-speech (TTS) and automatic speech recognition (ASR).
We argue that VC models with such pretrained ASR or TTS model parameters can generate effective hidden representations for high-fidelity, highly intelligible converted speech.
arXiv Detail & Related papers (2020-08-07T11:02:07Z)
- AutoSpeech: Neural Architecture Search for Speaker Recognition [108.69505815793028]
We propose the first neural architecture search approach for speaker recognition tasks, named AutoSpeech.
Our algorithm first identifies the optimal operation combination in a neural cell and then derives a CNN model by stacking the neural cell multiple times.
Results demonstrate that the derived CNN architectures significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 back-bones, while enjoying lower model complexity.
arXiv Detail & Related papers (2020-05-07T02:53:47Z)
- Streaming automatic speech recognition with the transformer model [59.58318952000571]
We propose a transformer-based end-to-end ASR system for streaming ASR.
We apply time-restricted self-attention for the encoder and triggered attention for the encoder-decoder attention mechanism.
Our proposed streaming transformer architecture achieves 2.8% and 7.2% WER for the "clean" and "other" test data of LibriSpeech.
arXiv Detail & Related papers (2020-01-08T18:58:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.