Efficient minimum word error rate training of RNN-Transducer for
end-to-end speech recognition
- URL: http://arxiv.org/abs/2007.13802v1
- Date: Mon, 27 Jul 2020 18:33:35 GMT
- Title: Efficient minimum word error rate training of RNN-Transducer for
end-to-end speech recognition
- Authors: Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei
Huang, Andreas Stolcke, Roland Maas
- Abstract summary: We propose a novel and efficient minimum word error rate (MWER) training method for RNN-Transducer (RNN-T)
In our proposed method, we re-calculate and sum scores of all the possible alignments for each hypothesis in N-best lists.
The hypothesis probability scores and back-propagated gradients are calculated efficiently using the forward-backward algorithm.
- Score: 21.65651608697333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a novel and efficient minimum word error rate (MWER)
training method for RNN-Transducer (RNN-T). Unlike previous work on this topic,
which performs on-the-fly limited-size beam-search decoding and generates
alignment scores for expected edit-distance computation, in our proposed
method, we re-calculate and sum scores of all the possible alignments for each
hypothesis in N-best lists. The hypothesis probability scores and
back-propagated gradients are calculated efficiently using the forward-backward
algorithm. Moreover, the proposed method allows us to decouple the decoding and
training processes, and thus we can perform offline parallel-decoding and MWER
training for each subset iteratively. Experimental results show that the
proposed semi-on-the-fly method is 6 times faster than the on-the-fly method
while yielding a similar WER improvement (3.6%) over a baseline RNN-T model. The
proposed MWER training also effectively reduces the high deletion errors (9.2%
WER reduction) introduced by RNN-T models when an EOS token is added for the endpointer.
Further improvement can be achieved by using the proposed RNN-T rescoring method
to re-rank hypotheses and an external RNN-LM for additional rescoring.
The best system achieves a 5% relative improvement on an English test set of
real far-field recordings and an 11.6% WER reduction on music-domain utterances.
Related papers
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z) - Grad-Instructor: Universal Backpropagation with Explainable Evaluation Neural Networks for Meta-learning and AutoML [0.0]
An Evaluation Neural Network (ENN) is trained via deep reinforcement learning to predict the performance of the target network.
The ENN then works as an additional evaluation function during backpropagation.
arXiv Detail & Related papers (2024-06-15T08:37:51Z) - Return of the RNN: Residual Recurrent Networks for Invertible Sentence
Embeddings [0.0]
This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task.
Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors.
The model achieves high accuracy and fast training with the Adam optimizer, a significant finding given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods.
arXiv Detail & Related papers (2023-03-23T15:59:06Z) - Large-scale Optimization of Partial AUC in a Range of False Positive
Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximated gradient descent method based on a recent practical envelope-smoothing technique.
Our proposed algorithm can also be used to minimize the sum of ranked-range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z) - Sequence Transduction with Graph-based Supervision [96.04967815520193]
We present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels.
We demonstrate that transducer-based ASR with a CTC-like lattice achieves better results than standard RNN-T.
arXiv Detail & Related papers (2021-11-01T21:51:42Z) - Boost Neural Networks by Checkpoints [9.411567653599358]
We propose a novel method to ensemble the checkpoints of deep neural networks (DNNs).
With the same training budget, our method achieves 4.16% lower error on CIFAR-100 and 6.96% lower error on Tiny-ImageNet with the ResNet-110 architecture.
arXiv Detail & Related papers (2021-10-03T09:14:15Z) - Adaptive Nearest Neighbor Machine Translation [60.97183408140499]
kNN-MT combines pre-trained neural machine translation with token-level k-nearest-neighbor retrieval.
The traditional kNN algorithm simply retrieves the same number of nearest neighbors for each target token.
We propose Adaptive kNN-MT to dynamically determine the number of neighbors k for each target token.
arXiv Detail & Related papers (2021-05-27T09:27:42Z) - On Addressing Practical Challenges for RNN-Transducer [72.72132048437751]
We adapt a well-trained RNN-T model to a new domain without collecting the audio data.
We obtain word-level confidence scores by utilizing several types of features calculated during decoding.
The proposed time stamping method achieves less than 50 ms word timing difference on average.
arXiv Detail & Related papers (2021-04-27T23:31:43Z) - RNN Training along Locally Optimal Trajectories via Frank-Wolfe
Algorithm [50.76576946099215]
We propose a novel and efficient training method for RNNs that iteratively seeks a local minimum on the loss surface within a small region.
Surprisingly, even with this additional per-iteration cost, the overall training cost is empirically observed to be lower than that of standard back-propagation.
arXiv Detail & Related papers (2020-10-12T01:59:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.