Improving Rare Word Recognition with LM-aware MWER Training
- URL: http://arxiv.org/abs/2204.07553v1
- Date: Fri, 15 Apr 2022 17:19:41 GMT
- Title: Improving Rare Word Recognition with LM-aware MWER Training
- Authors: Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit
Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand
Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach
- Abstract summary: We introduce LMs in the learning of hybrid autoregressive transducer (HAT) models in the discriminative training framework.
For the shallow fusion setup, we use LMs during both hypotheses generation and loss computation, and the LM-aware MWER-trained model achieves 10% relative improvement.
For the rescoring setup, we learn a small neural module to generate per-token fusion weights in a data-dependent manner.
- Score: 50.241159623691885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) significantly improve the recognition accuracy of
end-to-end (E2E) models on words rarely seen during training, when used in
either the shallow fusion or the rescoring setups. In this work, we introduce
LMs in the learning of hybrid autoregressive transducer (HAT) models in the
discriminative training framework, to mitigate the training versus inference
gap regarding the use of LMs. For the shallow fusion setup, we use LMs during
both hypotheses generation and loss computation, and the LM-aware MWER-trained
model achieves 10% relative improvement over the model trained with standard
MWER on voice search test sets containing rare words. For the rescoring setup,
we learn a small neural module to generate per-token fusion weights in a
data-dependent manner. This model achieves the same rescoring WER as the regular
MWER-trained model, but without the need to sweep fusion weights.
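For a concrete picture of the shallow-fusion setup, the sketch below shows one way the LM-aware MWER objective could be computed over an N-best list, with the external LM (and HAT's internal-LM correction) folded into the hypothesis scores that define the posterior. The function name, tensor layout, and fusion weights lam_lm / lam_ilm are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an LM-aware MWER loss over an N-best list (assumed
# shapes and names; not the paper's code).
import torch

def lm_aware_mwer_loss(hat_logprobs, ilm_logprobs, lm_logprobs,
                       word_errors, lam_lm=0.3, lam_ilm=0.1):
    """hat_logprobs: (B, N) HAT scores log p_HAT(y_i | x) per hypothesis
    ilm_logprobs: (B, N) internal-LM scores log p_ILM(y_i)
    lm_logprobs:  (B, N) external-LM scores log p_LM(y_i)
    word_errors:  (B, N) float tensor of word errors vs. the reference
    """
    # The same fused score is used when generating hypotheses at decode
    # time, which is what closes the training-versus-inference gap.
    fused = hat_logprobs - lam_ilm * ilm_logprobs + lam_lm * lm_logprobs

    # Renormalize over the N-best list to get hypothesis posteriors.
    posteriors = torch.softmax(fused, dim=-1)

    # Subtract the mean error as a baseline (standard MWER variance
    # reduction), then take the expected excess word error.
    baseline = word_errors.mean(dim=-1, keepdim=True)
    return (posteriors * (word_errors - baseline)).sum(dim=-1).mean()
```

In the rescoring setup described above, the scalar lam_lm would instead be produced per token by the small neural module, which is what removes the need to sweep fusion weights by hand.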
Related papers
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models [105.70889434492143]
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling.
We show that we can convert AR models ranging from 127M to 7B parameters into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training.
Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts.
arXiv Detail & Related papers (2024-10-23T14:04:22Z)
- Effective internal language model training and fusion for factorized transducer model [26.371223360905557]
The internal language model (ILM) of the neural transducer has been widely studied.
We propose a novel ILM training and decoding strategy for factorized transducer models.
arXiv Detail & Related papers (2024-04-02T08:01:05Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- Pre-training Language Model as a Multi-perspective Course Learner [103.17674402415582]
This study proposes a multi-perspective course learning (MCL) method for sample-efficient pre-training.
In this study, three self-supervision courses are designed to alleviate inherent flaws of "tug-of-war" dynamics.
Our method significantly improves ELECTRA's average performance by 2.8% and 3.2% absolute on the GLUE and SQuAD 2.0 benchmarks, respectively.
arXiv Detail & Related papers (2023-05-06T09:02:10Z)
- Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition [82.60133751942854]
Internal language model estimation (ILME)-based LM fusion has shown significant word error rate (WER) reductions over shallow fusion.
We propose a novel MWER training with ILME (MWER-ILME), in which ILME-based fusion is used to generate the N-best hypotheses and their posteriors (a fusion-scoring sketch follows this list).
MWER-ILME achieves on average 8.8% and 5.8% relative WER reductions over MWER and MWER-SF training, respectively, on 6 different test sets.
arXiv Detail & Related papers (2021-06-04T07:24:49Z)
- Joint Energy-based Model Training for Better Calibrated Natural Language Understanding Models [61.768082640087]
We explore joint energy-based model (EBM) training during the finetuning of pretrained text encoders for natural language understanding tasks.
Experiments show that EBM training can help the model reach a better calibration that is competitive with strong baselines.
arXiv Detail & Related papers (2021-01-18T01:41:31Z)
- On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer [40.63693071222628]
We study the minimum word error rate (MWER) training of the Hybrid Autoregressive Transducer (HAT).
From experiments with around 30,000 hours of training data, we show that MWER training can improve the accuracy of HAT models.
arXiv Detail & Related papers (2020-10-23T21:16:30Z)
- Early Stage LM Integration Using Local and Global Log-Linear Combination [46.91755970827846]
Sequence-to-sequence models with an implicit alignment mechanism (e.g., attention) are closing the performance gap with traditional hybrid hidden Markov models (HMMs).
One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora.
We present a novel method for language model integration into implicit-alignment based sequence-to-sequence models.
arXiv Detail & Related papers (2020-05-20T13:49:55Z)
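As noted in the MWER-ILME entry above, a minimal sketch of ILME-style fusion scoring is given below: the E2E score has the estimated internal LM subtracted and the external LM added for each hypothesis. The weights and function names are assumptions for illustration, not taken from the cited paper.

```python
# Hedged sketch of ILME-style fusion used to score/select among N-best
# hypotheses; lam_ilm, lam_ext and the dict keys are illustrative.
def ilme_fusion_score(e2e_logprob, ilm_logprob, ext_lm_logprob,
                      lam_ilm=0.1, lam_ext=0.3):
    # Subtract the estimated internal LM, add the external LM.
    return e2e_logprob - lam_ilm * ilm_logprob + lam_ext * ext_lm_logprob

def best_hypothesis(nbest):
    # nbest: list of dicts with "text", "e2e", "ilm", "ext_lm" log scores.
    return max(nbest, key=lambda h: ilme_fusion_score(
        h["e2e"], h["ilm"], h["ext_lm"]))["text"]
```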
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.