Improved Neural Language Model Fusion for Streaming Recurrent Neural
Network Transducer
- URL: http://arxiv.org/abs/2010.13878v1
- Date: Mon, 26 Oct 2020 20:10:12 GMT
- Title: Improved Neural Language Model Fusion for Streaming Recurrent Neural
Network Transducer
- Authors: Suyoun Kim, Yuan Shangguan, Jay Mahadeokar, Antoine Bruguier,
Christian Fuegen, Michael L. Seltzer, Duc Le
- Abstract summary: Recurrent Neural Network Transducer (RNN-T) has an implicit neural network language model (NNLM) and cannot easily leverage unpaired text data during training.
Previous work has proposed various fusion methods to incorporate external NNLMs into end-to-end ASR to address this weakness.
We propose extensions to these techniques that allow RNN-T to exploit external NNLMs during both training and inference time.
- Score: 28.697119605752643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech
recognition model architectures, has an implicit neural network language model
(NNLM) and cannot easily leverage unpaired text data during training. Previous
work has proposed various fusion methods to incorporate external NNLMs into
end-to-end ASR to address this weakness. In this paper, we propose extensions
to these techniques that allow RNN-T to exploit external NNLMs during both
training and inference time, resulting in 13-18% relative Word Error Rate
improvement on Librispeech compared to strong baselines. Furthermore, our
methods do not incur extra algorithmic latency and allow for flexible
plug-and-play of different NNLMs without re-training. We also share in-depth
analysis to better understand the benefits of the different NNLM fusion
methods. Our work provides a reliable technique for leveraging unpaired text
data to significantly improve RNN-T while keeping the system streamable,
flexible, and lightweight.
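For context on how score-level fusion combines the two models at decoding time, here is a minimal Python sketch of shallow fusion, the simplest of the techniques this line of work builds on. The interpolation weight, the toy vocabulary, and the blank-at-index-0 convention are illustrative assumptions, not details from the paper.

    import numpy as np

    def shallow_fusion_step(rnnt_log_probs, nnlm_log_probs, lm_weight=0.3):
        """Combine RNN-T and external NNLM scores for one decoding step.

        Shallow fusion interpolates the two models log-linearly:
            score(y) = log P_rnnt(y) + lm_weight * log P_nnlm(y)
        The blank symbol (index 0 here, by convention) is left unscaled,
        since the external NNLM has no notion of blank.
        """
        fused = rnnt_log_probs.copy()
        fused[1:] += lm_weight * nnlm_log_probs[1:]
        return fused

    # Toy example: 5-symbol vocabulary, index 0 = blank.
    rnnt = np.log(np.array([0.5, 0.2, 0.1, 0.1, 0.1]))
    nnlm = np.log(np.array([0.25, 0.4, 0.2, 0.1, 0.05]))
    best = shallow_fusion_step(rnnt, nnlm).argmax()

Because this fusion happens purely at the score level, the external NNLM can be swapped without retraining the RNN-T, which is the plug-and-play property the abstract highlights; the paper's contribution is extending such fusion to training time as well.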
Related papers
- The Robustness of Spiking Neural Networks in Communication and its Application towards Network Efficiency in Federated Learning [6.9569682335746235]
Spiking Neural Networks (SNNs) have recently gained significant interest for on-chip learning in embedded devices.
In this paper, we explore the inherent robustness of SNNs under noisy communication in Federated Learning.
We propose a novel Federated Learning with TopK Sparsification algorithm to reduce the bandwidth usage for FL training (a minimal sketch of the top-k idea follows below).
arXiv Detail & Related papers (2024-09-19T13:37:18Z)
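As a rough illustration of the top-k sparsification idea from the entry above, the sketch below keeps only the k largest-magnitude gradient entries per round; the flattened-gradient layout and the choice of k are assumptions for illustration, not the paper's algorithm.

    import numpy as np

    def topk_sparsify(grad, k):
        """Keep the k largest-magnitude entries of a gradient tensor.
        Clients transmit only (indices, values) instead of the dense
        gradient, cutting uplink bandwidth in federated training."""
        flat = grad.ravel()
        idx = np.argpartition(np.abs(flat), -k)[-k:]
        return idx, flat[idx]

    def densify(idx, vals, shape):
        """Server-side reconstruction of the sparse update."""
        out = np.zeros(int(np.prod(shape)))
        out[idx] = vals
        return out.reshape(shape)

    grad = np.random.default_rng(0).normal(size=(4, 8))
    idx, vals = topk_sparsify(grad, k=5)
    recovered = densify(idx, vals, grad.shape)  # 5 nonzeros, rest zero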
- Return of the RNN: Residual Recurrent Networks for Invertible Sentence Embeddings [0.0]
This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task.
Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors (a minimal sketch of this output layer follows below).
The model achieves high accuracy and fast training with the Adam optimizer, a significant finding given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods.
arXiv Detail & Related papers (2023-03-23T15:59:06Z)
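To make the regression-based output layer from the entry above concrete, here is a minimal sketch: decoder states are projected into word-vector space and scored with mean squared error rather than softmax cross-entropy. All names and shapes are hypothetical.

    import numpy as np

    def regression_decoder_loss(decoder_states, target_word_vecs, W, b):
        """Project RNN states to word-vector space and score the
        reconstruction with MSE, in place of a softmax output layer."""
        pred = decoder_states @ W + b          # (T, d_emb) predicted vectors
        return np.mean((pred - target_word_vecs) ** 2)

    rng = np.random.default_rng(0)
    T, d_model, d_emb = 7, 16, 300
    states = rng.normal(size=(T, d_model))
    targets = rng.normal(size=(T, d_emb))      # word vectors to reconstruct
    W, b = 0.1 * rng.normal(size=(d_model, d_emb)), np.zeros(d_emb)
    loss = regression_decoder_loss(states, targets, W, b)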
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs (a generic interval-propagation step is sketched below).
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
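For background, the basic step of interval bound propagation referenced in the entry above can be sketched as follows; this is the generic affine-layer bound, not the paper's reachability analysis for INNs.

    import numpy as np

    def interval_affine(l, u, W, b):
        """Propagate elementwise bounds [l, u] through x -> W @ x + b.
        Splitting W into positive and negative parts yields sound,
        elementwise-tight bounds for the affine map."""
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        lower = W_pos @ l + W_neg @ u + b
        upper = W_pos @ u + W_neg @ l + b
        return lower, upper

    # Monotone activations such as ReLU pass intervals through directly.
    relu_interval = lambda l, u: (np.maximum(l, 0.0), np.maximum(u, 0.0))

    W, b = np.array([[1.0, -2.0], [0.5, 3.0]]), np.zeros(2)
    lo, hi = interval_affine(np.array([-0.1, -0.1]), np.array([0.1, 0.1]), W, b)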
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM); a schematic sketch of the ADMM loop follows below.
Experiments on two tasks suggest that the proposed ADMM quantization achieves a model size compression factor of up to 31 times over the full-precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
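The sketch below shows the generic shape of an ADMM quantization round as described in the entry above; the 2-bit grid, the penalty rho, and the learning rate are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def quantize(w, scale):
        """Project weights onto a 2-bit grid {-2, -1, 0, 1} * scale."""
        return scale * np.clip(np.round(w / scale), -2, 1)

    def admm_round(W, Q, U, grad, scale, rho=1e-3, lr=1e-2):
        """One ADMM round: minimize the task loss subject to W == quantize(W).
        W: full-precision weights, Q: quantized copy, U: scaled dual."""
        W = W - lr * (grad + rho * (W - Q + U))  # primal (weight) update
        Q = quantize(W + U, scale)               # projection update
        U = U + W - Q                            # dual update
        return W, Q, U

    rng = np.random.default_rng(0)
    W = rng.normal(size=8); Q = quantize(W, 0.5); U = np.zeros_like(W)
    W, Q, U = admm_round(W, Q, U, grad=np.zeros_like(W), scale=0.5)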
- Reinforcement Learning with External Knowledge by using Logical Neural Networks [67.46162586940905]
A recent neuro-symbolic framework called Logical Neural Networks (LNNs) can simultaneously provide key properties of both neural networks and symbolic logic.
We propose an integrated method that enables model-free reinforcement learning from external knowledge sources.
arXiv Detail & Related papers (2021-03-03T12:34:59Z)
- Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement.
To make full use of the training data, we propose a full data learning method.
arXiv Detail & Related papers (2020-11-11T06:32:37Z)
- Operational vs Convolutional Neural Networks for Image Denoising [25.838282412957675]
Convolutional Neural Networks (CNNs) have recently become a favored technique for image denoising due to their adaptive learning ability.
We propose a heterogeneous network model which allows greater flexibility for embedding additional non-linearity at the core of the data transformation.
An extensive set of comparative evaluations of ONNs and CNNs over two severe image denoising problems yields conclusive evidence that ONNs enriched by non-linear operators can achieve superior denoising performance compared to CNNs with both equivalent and well-known deep configurations.
arXiv Detail & Related papers (2020-09-01T12:15:28Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) in terms of low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech [0.0]
We have significantly improved the online performance of a conversational speech transcription system by transferring knowledge from an RNNLM to the single-pass BNLM with text-generation-based data augmentation.
We show that by using the RNN-BNLM in the first pass followed by a neural second pass, offline ASR results can be improved even further.
arXiv Detail & Related papers (2020-06-09T09:01:04Z)
- Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition [39.497407288772386]
The recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research.
In this work, we leverage external alignments to seed the RNN-T model.
Two different pre-training solutions are explored, referred to as encoder pre-training and whole-network pre-training, respectively (a minimal sketch of the encoder pre-training idea follows below).
arXiv Detail & Related papers (2020-05-01T19:00:57Z)
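As a rough illustration of the encoder pre-training idea from the entry above, the sketch below treats external alignments as per-frame targets for an auxiliary softmax classifier; the shapes and the cross-entropy objective are assumptions for illustration, not the paper's recipe.

    import numpy as np

    def encoder_pretrain_loss(encoder_out, frame_labels, W, b):
        """Frame-level cross-entropy on alignment labels, used to seed the
        encoder before RNN-T training; the softmax head is discarded later."""
        logits = encoder_out @ W + b                         # (T, n_units)
        logits -= logits.max(axis=1, keepdims=True)          # for stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(frame_labels)), frame_labels].mean()

    rng = np.random.default_rng(0)
    T, d_model, n_units = 50, 32, 40
    enc = rng.normal(size=(T, d_model))
    labels = rng.integers(0, n_units, size=T)    # external alignment targets
    W, b = 0.1 * rng.normal(size=(d_model, n_units)), np.zeros(n_units)
    loss = encoder_pretrain_loss(enc, labels, W, b)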
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.