Improving accuracy of rare words for RNN-Transducer through unigram
shallow fusion
- URL: http://arxiv.org/abs/2012.00133v1
- Date: Mon, 30 Nov 2020 22:06:02 GMT
- Title: Improving accuracy of rare words for RNN-Transducer through unigram
shallow fusion
- Authors: Vijay Ravi, Yile Gu, Ankur Gandhe, Ariya Rastrow, Linda Liu, Denis
Filimonov, Scott Novotney, Ivan Bulyko
- Abstract summary: We propose unigram shallow fusion (USF) to improve recognition of rare words for RNN-T.
We show that this simple method can improve performance on rare words by 3.7% WER relative without degradation on a general test set.
- Score: 9.071295269523068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end automatic speech recognition (ASR) systems, such as recurrent
neural network transducer (RNN-T), have become popular, but rare words remain a
challenge. In this paper, we propose a simple, yet effective method called
unigram shallow fusion (USF) to improve recognition of rare words for RNN-T. In USF, we
extract rare words from RNN-T training data based on unigram count, and apply a
fixed reward when the word is encountered during decoding. We show that this
simple method can improve performance on rare words by 3.7% WER relative
without degradation on a general test set, and the improvement from USF is
additive to any additional language model based rescoring. Then, we show that
the same USF does not work on a conventional hybrid system. Finally, we reason
that USF works by fixing errors in the probability estimates of words caused by the
Viterbi search used during decoding with the subword-based RNN-T.
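The USF recipe above is simple enough to sketch in code. The following is a minimal illustration, not the authors' implementation: the count threshold, the reward value, and the function names are assumptions, and the fixed reward is applied here by rescoring a finished hypothesis rather than incrementally inside RNN-T beam search as the paper describes.

```python
from collections import Counter

def build_rare_word_list(training_transcripts, count_threshold=10):
    """Collect words whose unigram count in the RNN-T training text falls below a threshold."""
    counts = Counter(word for line in training_transcripts for word in line.split())
    return {word for word, count in counts.items() if count < count_threshold}

def usf_rescore(base_log_score, hypothesis_words, rare_words, reward=1.0):
    """Add a fixed reward to a hypothesis score once per rare word it contains.

    In the paper the reward is applied during decoding whenever a rare word is
    encountered; rescoring a complete hypothesis is shown only to keep the sketch short.
    """
    bonus = reward * sum(1 for word in hypothesis_words if word in rare_words)
    return base_log_score + bonus

# Toy usage with hypothetical data: "on", "off", "call", "zygmunt" occur once and count as rare.
rare = build_rare_word_list(
    ["turn on the lights", "turn off the lights", "call zygmunt"], count_threshold=2
)
print(usf_rescore(-12.3, "call zygmunt".split(), rare, reward=0.5))  # -12.3 + 2 * 0.5 = -11.3
```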
Related papers
- SpellMapper: A non-autoregressive neural spellchecker for ASR
customization with candidate retrieval based on n-gram mappings [76.87664008338317]
Contextual spelling correction models are an alternative to shallow fusion to improve automatic speech recognition.
We propose a novel algorithm for candidate retrieval based on misspelled n-gram mappings.
Experiments on Spoken Wikipedia show 21.4% word error rate improvement compared to a baseline ASR system.
arXiv Detail & Related papers (2023-06-04T10:00:12Z)
- Return of the RNN: Residual Recurrent Networks for Invertible Sentence
Embeddings [0.0]
This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task.
Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors.
The model achieves high accuracy and fast training with the Adam optimizer, a significant finding given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods.
arXiv Detail & Related papers (2023-03-23T15:59:06Z)
- Surrogate Gradient Spiking Neural Networks as Encoders for Large
Vocabulary Continuous Speech Recognition [91.39701446828144]
We show that spiking neural networks can be trained like standard recurrent neural networks using the surrogate gradient method.
They have shown promising results on speech command recognition tasks.
In contrast to their recurrent non-spiking counterparts, they show robustness to exploding gradient problems without the need to use gates.
arXiv Detail & Related papers (2022-12-01T12:36:26Z)
- ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for
Non-Autoregressive Machine Translation [51.06378042344563]
A new training loss, order-agnostic cross entropy (OAXE), has proven effective at ameliorating the effect of multimodality in non-autoregressive translation (NAT).
We extend OAXE by allowing reordering only between n-gram phrases, while still requiring a strict match of word order within each phrase.
Further analyses show that ngram-OAXE indeed improves the translation of n-gram phrases and produces more fluent translations with better modeling of sentence structure.
arXiv Detail & Related papers (2022-10-08T11:39:15Z)
- Improving Contextual Recognition of Rare Words with an Alternate
Spelling Prediction Model [0.0]
We release contextual biasing lists to accompany the Earnings21 dataset.
We show results for shallow fusion contextual biasing applied to two different decoding algorithms.
We propose an alternate spelling prediction model that improves recall of rare words by 34.7% relative.
arXiv Detail & Related papers (2022-09-02T19:30:16Z)
- NFLAT: Non-Flat-Lattice Transformer for Chinese Named Entity Recognition [39.308634515653914]
We advocate a novel lexical enhancement method, InterFormer, that effectively reduces computational and memory costs.
Compared with FLAT, it reduces unnecessary attention calculations in the "word-character" and "word-word" interactions.
This cuts memory usage by about 50% and allows more extensive lexicons or larger batch sizes for network training.
arXiv Detail & Related papers (2022-05-12T01:55:37Z)
- Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in
Non-Autoregressive Translation [98.11249019844281]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
arXiv Detail & Related papers (2021-06-02T02:41:40Z)
- On Addressing Practical Challenges for RNN-Transducer [72.72132048437751]
We adapt a well-trained RNN-T model to a new domain without collecting the audio data.
We obtain word-level confidence scores by utilizing several types of features calculated during decoding.
The proposed time-stamping method achieves less than 50 ms word timing difference on average.
arXiv Detail & Related papers (2021-04-27T23:31:43Z)
- Deep Shallow Fusion for RNN-T Personalization [22.271012062526463]
We present novel techniques to improve RNN-T's ability to model rare WordPieces.
We show that these combined techniques result in 15.4%-34.5% relative Word Error Rate improvement.
arXiv Detail & Related papers (2020-11-16T07:13:58Z)
- Taking Notes on the Fly Helps BERT Pre-training [94.43953312613577]
Taking Notes on the Fly (TNF) takes notes for rare words on the fly during pre-training to help the model understand them when they occur next time.
TNF provides better data utilization, since cross-sentence information is employed to compensate for the inadequate semantics caused by rare words in the sentences.
arXiv Detail & Related papers (2020-08-04T11:25:09Z)
- Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search [17.492336084190658]
In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) words not observed when training the speech recognition system.
Using subword language models (LMs) in the first-pass recognition makes it possible to recognize the OOV words, but even the subword n-gram LMs suffer from data sparsity.
In this paper, we propose to interpolate conventional n-gram models with the RNNLM approximation for better OOV recognition.
arXiv Detail & Related papers (2020-05-28T07:59:06Z)
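The interpolation mentioned in the last entry is a standard language-model combination and can be written down directly. A minimal sketch follows, assuming a simple linear interpolation in the probability domain; the weight `lam` and the log-space convention are illustrative assumptions, not values from the paper.

```python
import math

def interpolate_logp(ngram_logp, rnnlm_logp, lam=0.5):
    """Linearly interpolate two LM probabilities for the same subword token, in log space.

    ngram_logp / rnnlm_logp: log-probabilities from the conventional n-gram LM and from
    the n-gram approximation of the RNNLM; lam is the weight given to the n-gram LM.
    """
    return math.log(lam * math.exp(ngram_logp) + (1.0 - lam) * math.exp(rnnlm_logp))

# Hypothetical usage: the n-gram LM assigns probability 0.02 to a subword,
# the RNNLM approximation assigns 0.08; the 50/50 mixture gives log(0.05).
print(interpolate_logp(math.log(0.02), math.log(0.08), lam=0.5))
```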
This list is automatically generated from the titles and abstracts of the papers on this site.