Less Is More: Improved RNN-T Decoding Using Limited Label Context and
Path Merging
- URL: http://arxiv.org/abs/2012.06749v1
- Date: Sat, 12 Dec 2020 07:39:21 GMT
- Title: Less Is More: Improved RNN-T Decoding Using Limited Label Context and
Path Merging
- Authors: Rohit Prabhavalkar, Yanzhang He, David Rybach, Sean Campbell, Arun
Narayanan, Trevor Strohman, Tara N. Sainath
- Abstract summary: We study the influence of the amount of label context on the model's accuracy, and its impact on the efficiency of the decoding process.
We find that we can limit the context of the recurrent neural network transducer (RNN-T) during training to just four previous word-piece labels, without degrading word error rate (WER) relative to the full-context baseline.
- Score: 43.388004364072174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end models that condition the output label sequence on all previously
predicted labels have emerged as popular alternatives to conventional systems
for automatic speech recognition (ASR). Since unique label histories correspond
to distinct model states, such models are decoded using an approximate
beam-search process which produces a tree of hypotheses.
In this work, we study the influence of the amount of label context on the
model's accuracy, and its impact on the efficiency of the decoding process. We
find that we can limit the context of the recurrent neural network transducer
(RNN-T) during training to just four previous word-piece labels, without
degrading word error rate (WER) relative to the full-context baseline. Limiting
context also provides opportunities to improve the efficiency of the
beam-search process during decoding by removing redundant paths from the active
beam, and instead retaining them in the final lattice. This path-merging scheme
can also be applied when decoding the baseline full-context model through an
approximation. Overall, we find that the proposed path-merging scheme is
extremely effective, allowing us to improve oracle WERs by up to 36% over the
baseline, while simultaneously reducing the number of model evaluations by up
to 5.3% without any degradation in WER.
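To make the path-merging idea concrete, below is a minimal sketch of one pruning step in a frame-synchronous beam search. This is an illustration under stated assumptions, not the authors' implementation; the Hypothesis class, CONTEXT_SIZE constant, and prune_and_merge function are hypothetical names.
```python
from dataclasses import dataclass, field
from typing import List, Tuple

CONTEXT_SIZE = 4  # word-piece labels of history the prediction network sees


@dataclass
class Hypothesis:
    labels: Tuple[int, ...]  # full word-piece label history of this path
    score: float             # path log-probability
    merged: List["Hypothesis"] = field(default_factory=list)


def prune_and_merge(beam: List[Hypothesis]) -> List[Hypothesis]:
    """Fold together paths the limited-context model cannot distinguish.

    With only CONTEXT_SIZE labels of context, two paths whose last
    CONTEXT_SIZE labels agree reach the same prediction-network state,
    so only the best-scoring one needs to stay on the active beam; the
    losers are retained on the winner for the final lattice instead of
    being discarded.
    """
    best = {}
    for hyp in beam:
        key = hyp.labels[-CONTEXT_SIZE:]  # the state is just the last 4 labels
        incumbent = best.get(key)
        if incumbent is None:
            best[key] = hyp
        elif hyp.score > incumbent.score:
            hyp.merged.append(incumbent)  # keep the losing path for the lattice
            best[key] = hyp
        else:
            incumbent.merged.append(hyp)
    return list(best.values())
```
Merging at equal contexts is also consistent with the reported reduction in model evaluations: each merged path would otherwise have needed its own prediction-network call, while its retention in the lattice is what improves the oracle WER.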
Related papers
- An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation [97.3797716862478]
Word-level AutoCompletion (WLAC) is a rewarding yet challenging task in Computer-aided Translation.
Existing work addresses this task through a classification model based on a neural network that maps the hidden vector of the input context into its corresponding label.
This work proposes an energy-based model for WLAC, which enables the context hidden vector to capture crucial information from the source sentence.
arXiv Detail & Related papers (2024-07-29T15:07:19Z)
- Self-consistent context aware conformer transducer for speech recognition [0.06008132390640294]
We introduce a novel neural network module that adeptly handles recursive data flow in neural network architectures.
Our method notably improves the accuracy of recognizing rare words without adversely affecting the word error rate for common vocabulary.
Our findings reveal that the combination of both approaches can improve the accuracy of detecting rare words by as much as 4.5 times.
arXiv Detail & Related papers (2024-02-09T18:12:11Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Interpolation-based Contrastive Learning for Few-Label Semi-Supervised Learning [43.51182049644767]
Semi-supervised learning (SSL) has long been proven to be an effective technique for building powerful models with limited labels.
Regularization-based methods, which force perturbed samples to have predictions similar to those of the original samples, have attracted much attention.
We propose a novel contrastive loss to guide the embedding of the learned network to change linearly between samples.
arXiv Detail & Related papers (2022-02-24T06:00:05Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- Uncertainty-Aware Label Refinement for Sequence Labeling [47.67853514765981]
We introduce a novel two-stage label decoding framework to model long-term label dependencies.
A base model first predicts draft labels, and then a novel two-stream self-attention model makes refinements on these draft predictions.
arXiv Detail & Related papers (2020-12-19T06:56:59Z)
- Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition [21.65651608697333]
We propose a novel and efficient minimum word error rate (MWER) training method for RNN-Transducer (RNN-T).
In our proposed method, we re-calculate and sum scores of all the possible alignments for each hypothesis in N-best lists.
The hypothesis probability scores and back-propagated gradients are calculated efficiently using the forward-backward algorithm.
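As a schematic rendering of the standard MWER objective over an N-best list (illustrative only, not this paper's code), the loss can be written as below; the per-hypothesis alignment-score summation via the forward-backward algorithm is assumed to happen upstream, and mwer_loss is a hypothetical function name.
```python
import torch


def mwer_loss(log_probs: torch.Tensor, word_errors: torch.Tensor) -> torch.Tensor:
    """Schematic MWER objective over an N-best list.

    log_probs:   (N,) total log-probability of each hypothesis, assumed
                 to already sum over all of its alignments.
    word_errors: (N,) word errors of each hypothesis vs. the reference.
    """
    # Renormalize over the N-best list so the scores form a distribution.
    probs = torch.softmax(log_probs, dim=0)
    # Subtracting the mean error is the usual variance-reduction baseline.
    relative_errors = word_errors - word_errors.mean()
    # Expected (relative) number of word errors, minimized during training.
    return torch.sum(probs * relative_errors)
```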
arXiv Detail & Related papers (2020-07-27T18:33:35Z)
- Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.