Reducing Exposure Bias in Training Recurrent Neural Network Transducers
- URL: http://arxiv.org/abs/2108.10803v1
- Date: Tue, 24 Aug 2021 15:43:42 GMT
- Title: Reducing Exposure Bias in Training Recurrent Neural Network Transducers
- Authors: Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltan Tuske
- Abstract summary: We investigate approaches to reducing exposure bias in training to improve the generalization of RNNT models for automatic speech recognition.
We show that we can further improve the accuracy of a high-performance RNNT ASR model and obtain state-of-the-art results on the 300-hour Switchboard dataset.
- Score: 37.53697357406185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When recurrent neural network transducers (RNNTs) are trained using the
typical maximum likelihood criterion, the prediction network is trained only on
ground truth label sequences. This leads to a mismatch during inference, known
as exposure bias, when the model must deal with label sequences containing
errors. In this paper we investigate approaches to reducing exposure bias in
training to improve the generalization of RNNT models for automatic speech
recognition (ASR). A label-preserving input perturbation to the prediction
network is introduced. The input token sequences are perturbed using SwitchOut
and scheduled sampling based on an additional token language model. Experiments
conducted on the 300-hour Switchboard dataset demonstrate their effectiveness.
By reducing the exposure bias, we show that we can further improve the accuracy
of a high-performance RNNT ASR model and obtain state-of-the-art results on the
300-hour Switchboard dataset.
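As a concrete illustration of the two perturbations described in the abstract, here is a minimal sketch (not the authors' implementation) of corrupting the token sequence fed to the prediction network while leaving the RNNT training targets untouched. The `token_lm` interface and all hyperparameter values are illustrative assumptions.
```python
# Minimal sketch of label-preserving input perturbation for the RNNT
# prediction network. The targets used by the RNNT loss stay untouched;
# only the token history fed to the prediction network is corrupted.
import torch

def switchout(tokens: torch.Tensor, vocab_size: int, tau: float = 1.0) -> torch.Tensor:
    """SwitchOut-style corruption: sample how many positions to replace
    (temperature tau controls the expected corruption rate), then overwrite
    those positions with uniformly sampled tokens."""
    n = tokens.numel()
    # P(k replacements) decays exponentially in k, softened by tau.
    probs = torch.softmax(-torch.arange(n + 1, dtype=torch.float) / tau, dim=0)
    k = int(torch.multinomial(probs, 1))
    out = tokens.clone()
    idx = torch.randperm(n)[:k]
    out[idx] = torch.randint(0, vocab_size, (k,))
    return out

def scheduled_sampling(tokens: torch.Tensor, token_lm, p: float = 0.1) -> torch.Tensor:
    """Scheduled-sampling-style corruption: with probability p, replace a
    token with a sample from an external token language model conditioned on
    the (possibly already corrupted) history. `token_lm(history)` is assumed
    to return a next-token probability distribution."""
    out = tokens.clone()
    for t in range(1, out.numel()):
        if torch.rand(()) < p:
            out[t] = torch.multinomial(token_lm(out[:t]), 1)
    return out
```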
Related papers
- Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff [2.4578723416255754]
We present a benchmark study on four insurance data sets with frequency and severity targets in the presence of multiple types of input features.
We compare in detail the performance of a generalized linear model on binned input data, a gradient-boosted tree model, a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN).
arXiv Detail & Related papers (2023-10-19T12:00:33Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom holds that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Input Perturbation Reduces Exposure Bias in Diffusion Models [41.483581603727444]
We show that a long sampling chain leads to an error accumulation phenomenon, similar to the exposure bias problem in autoregressive text generation.
We propose a very simple but effective training regularization that consists of perturbing the ground truth samples to simulate inference-time prediction errors.
We empirically show that, without affecting the recall and precision, the proposed input perturbation leads to a significant improvement in the sample quality.
arXiv Detail & Related papers (2023-01-27T13:34:54Z)
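A minimal sketch of the training-time perturbation described in the paper above, in standard DDPM notation; the scale `gamma` and the exact way the extra noise enters are assumptions, not the paper's precise formulation.
```python
# Sketch: add extra noise to the ground-truth noised sample x_t during
# training so the denoiser sees inputs resembling its own imperfect
# predictions at sampling time. gamma is an illustrative scale.
import torch

def perturbed_training_input(x0, t, alphas_cumprod, gamma=0.1):
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)        # cumulative noise schedule
    eps = torch.randn_like(x0)                         # forward-process noise
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    x_t = x_t + gamma * (1.0 - a_bar).sqrt() * torch.randn_like(x0)
    return x_t, eps                                    # network still regresses eps
```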
- NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder built on hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z)
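As a rough illustration of representing attenuation as a continuous function of 3D coordinates, the sketch below substitutes a fixed Fourier-feature encoding for the paper's learned hash encoding; all layer sizes are illustrative.
```python
# Sketch: a coordinate MLP mapping a 3D point to an attenuation coefficient.
# The paper uses a learned hash encoding; a fixed Fourier-feature encoding
# stands in here for simplicity.
import torch
import torch.nn as nn

class AttenuationField(nn.Module):
    def __init__(self, n_freqs: int = 8, hidden: int = 128):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_freqs, dtype=torch.float))
        in_dim = 3 * 2 * n_freqs                       # sin & cos per axis per frequency
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),       # attenuation must be non-negative
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        """xyz: (N, 3) points -> (N, 1) attenuation coefficients."""
        ang = xyz.unsqueeze(-1) * self.freqs           # (N, 3, n_freqs)
        enc = torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(1)
        return self.mlp(enc)
```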
- Neural Clamping: Joint Input Perturbation and Temperature Scaling for Neural Network Calibration [62.4971588282174]
We propose a new post-processing calibration method called Neural Clamping.
Our empirical results show that Neural Clamping significantly outperforms state-of-the-art post-processing calibration methods.
arXiv Detail & Related papers (2022-09-23T14:18:39Z)
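A minimal sketch of the joint input-perturbation plus temperature-scaling idea from the paper above: a learnable additive offset on the input and a temperature on the logits, both fit on a held-out calibration set. All names and shapes are illustrative, not the paper's API.
```python
# Sketch: calibrate a trained classifier by learning an input offset `delta`
# and a logit temperature on calibration data. Only delta and log_temp are
# in the optimizer, so the model's weights are never updated.
import torch
import torch.nn.functional as F

def clamping_style_calibration(model, calib_loader, input_shape, steps=5, lr=1e-3):
    delta = torch.zeros(1, *input_shape, requires_grad=True)
    log_temp = torch.zeros((), requires_grad=True)     # temperature = exp(log_temp) > 0
    opt = torch.optim.Adam([delta, log_temp], lr=lr)
    model.eval()
    for _ in range(steps):
        for x, y in calib_loader:
            logits = model(x + delta) / log_temp.exp()
            loss = F.cross_entropy(logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return delta.detach(), float(log_temp.exp())
```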
- Improving self-supervised pretraining models for epileptic seizure detection from EEG data [0.23624125155742057]
This paper presents various self-supervision strategies to enhance the performance of a time-series-based diffusion convolutional recurrent neural network (DCRNN) model.
The learned weights in the self-supervision pretraining phase can be transferred to the supervised training phase to boost the model's prediction capability.
arXiv Detail & Related papers (2022-06-28T17:15:49Z)
- Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing [49.82147684491619]
We introduce two techniques to improve the generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR).
Length perturbation is a data augmentation algorithm that randomly drops and inserts frames of an utterance to alter the length of the speech feature sequence.
N-best based label smoothing randomly injects noise into the ground truth labels during training to avoid overfitting; the noisy labels are generated from n-best hypotheses.
arXiv Detail & Related papers (2022-03-29T01:40:22Z)
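A minimal sketch of length perturbation on a sequence of acoustic frames; the drop/insert rates and the frame-repetition insertion are illustrative choices, not necessarily the paper's exact scheme.
```python
# Sketch: randomly drop frames and duplicate others to alter the length of
# the feature sequence while the transcript is left unchanged.
import numpy as np

def length_perturb(feats: np.ndarray, drop_rate=0.05, insert_rate=0.05) -> np.ndarray:
    """feats: (T, D) acoustic feature frames."""
    out = []
    for frame in feats:
        if np.random.rand() < drop_rate:
            continue                       # drop this frame
        out.append(frame)
        if np.random.rand() < insert_rate:
            out.append(frame)              # insert by repeating the frame
    return np.stack(out) if out else feats[:1].copy()
```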
- ZORB: A Derivative-Free Backpropagation Algorithm for Neural Networks [3.6562366216810447]
We present a simple yet fast training algorithm called Zeroth-Order Relaxed Backpropagation (ZORB).
Instead of calculating gradients, ZORB uses the pseudoinverse of targets to backpropagate information.
Experiments on standard classification and regression benchmarks demonstrate ZORB's advantage over traditional backpropagation with Gradient Descent.
arXiv Detail & Related papers (2020-11-17T19:29:47Z)
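A minimal sketch of the pseudoinverse idea named above: for a single linear layer, weights are solved in closed form and targets are mapped backwards through the layer, with no gradients anywhere. This is a drastic simplification of the full algorithm.
```python
# Sketch: gradient-free fitting of one linear layer via the Moore-Penrose
# pseudoinverse, plus the backward mapping of targets to the previous layer.
import numpy as np

def fit_linear_layer(acts: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Least-squares weights W with acts @ W ~= targets.
    acts: (n_samples, d_in), targets: (n_samples, d_out)."""
    return np.linalg.pinv(acts) @ targets

def targets_for_previous_layer(W: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Map output targets back through W to obtain targets for the layer below."""
    return targets @ np.linalg.pinv(W)
```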
- Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
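A minimal sketch of the failure-based pairing from the paper above: a "biased" network latches onto easy, spuriously correlated examples, and the debiased network upweights the examples the biased one handles poorly. The loss-based weighting here is a simplified stand-in for the paper's scheme.
```python
# Sketch: one training step for a pair of networks, where the debiased model
# weights each example by the biased model's per-example loss (a simplified
# relative-difficulty score).
import torch
import torch.nn.functional as F

def paired_debiasing_step(biased, debiased, opt_b, opt_d, x, y):
    # The biased model fits whatever is easiest, i.e. the spurious patterns.
    loss_b = F.cross_entropy(biased(x), y, reduction="none")
    loss_b.mean().backward()
    opt_b.step(); opt_b.zero_grad()

    # Examples the biased model fails on get larger weights for the debiased model.
    weights = loss_b.detach()
    weights = weights / (weights.sum() + 1e-8)
    loss_d = (weights * F.cross_entropy(debiased(x), y, reduction="none")).sum()
    loss_d.backward()
    opt_d.step(); opt_d.zero_grad()
```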