Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network
- URL: http://arxiv.org/abs/2104.11127v1
- Date: Thu, 22 Apr 2021 15:21:41 GMT
- Title: Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network
- Authors: Janne Pylkkönen (1), Antti Ukkonen (1 and 2), Juho Kilpikoski (1),
Samu Tamminen (1), Hannes Heikinheimo (1) ((1) Speechly, (2) Department of
Computer Science, University of Helsinki, Finland)
- Abstract summary: We show that RNN-transducer models can be effectively adapted to new domains using only small amounts of textual data.
We show that across multiple ASR evaluation tasks this method provides relative gains of 10-45% in target-task WER.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Adaptation of end-to-end speech recognition systems to new tasks is
known to be challenging. A number of solutions have been proposed that apply
external language models with various fusion methods, possibly combined with
two-pass decoding. TTS systems have also been used to generate adaptation data
for the end-to-end models. In this paper we show that RNN-transducer models can
be effectively adapted to new domains using only small amounts of textual data.
By taking advantage of the model's inherent structure, in which the prediction
network is interpreted as a language model, we can apply fast adaptation to the
model. Adapting the model avoids the need for complicated decoding-time fusions
and external language models. With appropriate regularization, the prediction
network can be adapted to new domains while still retaining good generalization
capabilities. We demonstrate on multiple ASR evaluation tasks that this method
can provide relative gains of 10-45% in target-task WER. We also share insights
into how the RNN-transducer prediction network performs as a language model.
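The paper does not include code, but the recipe can be pictured with a minimal, hypothetical PyTorch sketch: the prediction network is fine-tuned as a next-token language model on in-domain text, with an L2 pull toward the source weights standing in for the unspecified "appropriate regularization". The pred_net/lm_head interface below is an assumption, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: fine-tune an RNN-T prediction network as an LM on
# in-domain text only. The interface (pred_net consumes token ids and
# returns (hidden_states, state); lm_head is a temporary softmax layer)
# is assumed. Regularization here is an L2 pull toward the source
# weights; the paper only states that appropriate regularization is used.

def adapt_prediction_network(pred_net, lm_head, texts, vocab_size,
                             epochs=3, lr=1e-4, l2_to_source=1e-3):
    source = {n: p.detach().clone() for n, p in pred_net.named_parameters()}
    params = list(pred_net.parameters()) + list(lm_head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for tokens in texts:  # tokens: LongTensor [T] of subword ids
            inp, tgt = tokens[:-1].unsqueeze(0), tokens[1:].unsqueeze(0)
            hidden, _ = pred_net(inp)          # [1, T-1, H]
            logits = lm_head(hidden)           # [1, T-1, vocab_size]
            ce = F.cross_entropy(logits.reshape(-1, vocab_size),
                                 tgt.reshape(-1))
            reg = sum(((p - source[n]) ** 2).sum()
                      for n, p in pred_net.named_parameters())
            loss = ce + l2_to_source * reg
            opt.zero_grad(); loss.backward(); opt.step()
    return pred_net
```

Since the encoder and joint network are left untouched, the update consumes no audio at all, which is what makes the text-only adaptation fast.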
Related papers
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
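As a rough illustration of the setup described above (hypothetical; the paper's actual model is a richer conditional generative model over checkpoints, not this toy), each checkpoint is flattened into one vector and a small generator is fit to those vectors, conditioned on a loss prompt:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the data preparation: each checkpoint becomes one
# flat parameter vector; a generative model is then fit to these vectors,
# conditioned on a loss "prompt" as in the summary above.

def flatten_checkpoint(state_dict):
    return torch.cat([p.reshape(-1) for p in state_dict.values()])

class ParamGenerator(nn.Module):
    def __init__(self, dim, latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent))
        self.dec = nn.Sequential(nn.Linear(latent + 1, 256), nn.ReLU(),
                                 nn.Linear(256, dim))

    def forward(self, flat_params, loss_prompt):
        # flat_params: [B, dim], loss_prompt: [B] target loss values
        z = self.enc(flat_params)
        cond = torch.cat([z, loss_prompt.unsqueeze(-1)], dim=-1)
        return self.dec(cond)  # parameter vector conditioned on the prompt
```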
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Pre-Training a Graph Recurrent Network for Language Representation [34.4554387894105]
We consider a graph recurrent network for language model pre-training, which builds a graph structure for each sequence with local token-level communications.
We find that our model can generate more diverse outputs with less contextualized feature redundancy than existing attention-based models.
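One plausible reading of "local token-level communications" is sketched below; the layer structure and window size are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: every token aggregates its windowed neighbors'
# states and updates via a GRU cell. This is one interpretation of the
# summary above, not the paper's exact layer.

class LocalGraphRecurrentLayer(nn.Module):
    def __init__(self, hidden, window=2):
        super().__init__()
        self.cell = nn.GRUCell(hidden, hidden)
        self.window = window

    def forward(self, h):                  # h: [T, hidden]
        T = h.size(0)
        out = []
        for t in range(T):
            lo, hi = max(0, t - self.window), min(T, t + self.window + 1)
            msg = h[lo:hi].mean(dim=0)     # message from local neighbors
            out.append(self.cell(msg.unsqueeze(0), h[t].unsqueeze(0)))
        return torch.cat(out, dim=0)       # updated states, [T, hidden]
```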
arXiv Detail & Related papers (2022-09-08T14:12:15Z) - Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
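The mixing step can be written down compactly. A hedged sketch, with tensor shapes assumed: the next-token distribution is a convex combination of per-position dependency distributions, weighted by self-attention:

```python
import torch

# Hypothetical sketch of the mixing step: the next-token distribution is a
# weighted sum of per-position dependency distributions, with mixture
# weights taken from self-attention. The tensor layout is an assumption.

def mixture_next_token(dep_probs, attn_weights):
    """dep_probs:    [T, V]  p(next token | dependency head at position t)
       attn_weights: [T]     self-attention score on each position t
       returns:      [V]     mixed next-token distribution"""
    w = torch.softmax(attn_weights, dim=0)      # normalize mixture weights
    return (w.unsqueeze(-1) * dep_probs).sum(dim=0)
```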
arXiv Detail & Related papers (2022-03-19T06:28:30Z) - A Likelihood Ratio based Domain Adaptation Method for E2E Models [10.510472957585646]
End-to-end (E2E) automatic speech recognition models like the Recurrent Neural Network Transducer (RNN-T) are becoming a popular choice for streaming ASR applications like voice assistants.
While E2E models are very effective at learning representations of the training data they are trained on, their accuracy on unseen domains remains a challenging problem.
In this work, we explore a contextual biasing approach based on likelihood ratios that leverages text data sources to adapt the RNN-T model to new domains and entities.
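A minimal sketch of the likelihood-ratio idea (the exact formulation and scaling in the paper may differ): the decoder's hypothesis score is shifted by the log-ratio of a target-domain LM to a source-domain LM:

```python
# Hypothetical sketch of likelihood-ratio biasing at decoding time: the
# RNN-T score of a hypothesis is shifted by the (scaled) log-ratio of a
# target-domain LM to a source-domain LM. Names are assumptions.

def biased_score(rnnt_logp, target_lm_logp, source_lm_logp, scale=0.3):
    # log p(y|x) + scale * log [ p_target(y) / p_source(y) ]
    return rnnt_logp + scale * (target_lm_logp - source_lm_logp)
```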
arXiv Detail & Related papers (2022-01-10T21:22:39Z) - Distributionally Robust Recurrent Decoders with Random Network
Distillation [93.10261573696788]
We propose a method based on OOD detection with Random Network Distillation to allow an autoregressive language model to disregard OOD context during inference.
We apply our method to a GRU architecture, demonstrating improvements on multiple language modeling (LM) datasets.
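Random Network Distillation itself is simple to sketch: a predictor is trained on in-domain data to match a frozen, randomly initialized target network, and its prediction error at inference serves as the OOD score. Layer sizes below are illustrative:

```python
import torch
import torch.nn as nn

# Sketch of Random Network Distillation as an OOD detector: a predictor is
# trained to match a frozen, randomly initialized target network on
# in-domain data; large prediction error at inference flags OOD context.

class RNDDetector(nn.Module):
    def __init__(self, dim, feat=128):
        super().__init__()
        self.target = nn.Linear(dim, feat)      # frozen random projection
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.predictor = nn.Sequential(nn.Linear(dim, feat), nn.ReLU(),
                                       nn.Linear(feat, feat))

    def ood_score(self, x):                     # x: [B, dim] context repr.
        err = (self.predictor(x) - self.target(x)) ** 2
        return err.mean(dim=-1)                 # high error => likely OOD
```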
arXiv Detail & Related papers (2021-10-25T19:26:29Z) - Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
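A hedged sketch of the factorization (interfaces are assumptions, not the paper's code): a dedicated head scores the blank symbol, while vocabulary tokens combine an acoustic projection with a standalone LM branch that can be adapted on text alone:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the factorization: a dedicated branch scores the
# blank symbol while a standalone LM branch scores vocabulary tokens, so
# the LM branch can be adapted on text only. Interfaces are assumptions.

class FactorizedJoint(nn.Module):
    def __init__(self, enc_dim, pred_dim, vocab):
        super().__init__()
        self.blank_head = nn.Linear(enc_dim + pred_dim, 1)
        self.enc_vocab = nn.Linear(enc_dim, vocab)

    def forward(self, enc, pred_blank, lm_log_probs):
        # enc: [D_e], pred_blank: [D_p], lm_log_probs: [vocab]
        blank = self.blank_head(torch.cat([enc, pred_blank], dim=-1))
        vocab = self.enc_vocab(enc) + lm_log_probs   # acoustic + LM scores
        return torch.cat([blank, vocab], dim=-1)     # [1 + vocab] logits
```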
arXiv Detail & Related papers (2021-09-27T15:04:00Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
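The Dynamic Blocking idea can be sketched as a logits filter (a simplification; the paper applies blocking stochastically): if the decoder just emitted a token that appears in the source, the source's following token is forbidden at the next step, steering generation away from verbatim copying:

```python
# Hypothetical sketch of Dynamic Blocking: if the model just emitted a
# token that occurs in the source sentence, forbid the source's following
# token at the next step. The real algorithm samples blocks randomly.

NEG_INF = float("-inf")

def dynamic_block(logits, source_ids, last_generated):
    """logits: list of floats over the vocab for the next step (mutated);
       source_ids: token ids of the source sentence;
       last_generated: token id just emitted by the decoder."""
    for j, tok in enumerate(source_ids[:-1]):
        if tok == last_generated:
            logits[source_ids[j + 1]] = NEG_INF  # block the continuation
    return logits
```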
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Transfer Learning Approaches for Streaming End-to-End Speech Recognition
System [27.42059693923457]
Transfer learning (TL) is widely used in conventional hybrid automatic speech recognition (ASR) systems.
This paper presents a comparative study of four different TL methods for the RNN-T framework.
arXiv Detail & Related papers (2020-08-12T03:25:05Z) - Developing RNN-T Models Surpassing High-Performance Hybrid Models with
Customization Capability [46.73349163361723]
Recurrent neural network transducer (RNN-T) is a promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition.
We describe our recent development of RNN-T models with reduced GPU memory consumption during training.
We study how to customize RNN-T models to a new domain, which is important for deploying E2E models to practical scenarios.
arXiv Detail & Related papers (2020-07-30T02:35:20Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
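A minimal sketch of the layer-wise alignment, using a hard Hungarian matching as a stand-in for the full optimal-transport coupling the paper computes:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Sketch of layer-wise fusion: neurons of one layer are aligned to the
# other model's neurons before averaging. A hard assignment (Hungarian
# matching on weight distances) stands in here for the full OT coupling.

def fuse_layer(w_a, w_b):
    """w_a, w_b: [n_neurons, fan_in] weight matrices of the same layer."""
    cost = np.linalg.norm(w_a[:, None, :] - w_b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # match B's neurons to A's
    return 0.5 * (w_a[rows] + w_b[cols])       # average aligned neurons
```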
arXiv Detail & Related papers (2019-10-12T22:07:15Z)