Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition
- URL: http://arxiv.org/abs/2110.02220v2
- Date: Thu, 7 Oct 2021 00:12:51 GMT
- Title: Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition
- Authors: Tsendsuren Munkhdalai, Khe Chai Sim, Angad Chandorkar, Fan Gao, Mason Chua, Trevor Strohman, Françoise Beaufays
- Abstract summary: We introduce a model-based end-to-end contextual adaptation approach that is decoder-agnostic and amenable to on-device personalization.
Our on-device simulation experiments demonstrate that the proposed approach outperforms the traditional re-scoring technique by 12% relative WER.
- Score: 16.367495908535894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fast contextual adaptation has been shown to be effective in improving Automatic Speech Recognition (ASR) of rare words, and when combined with on-device personalized training it can yield even better recognition results. However, traditional re-scoring approaches based on an external language model are prone to diverging during personalized training. In this work, we introduce a model-based end-to-end contextual adaptation approach that is decoder-agnostic and amenable to on-device personalization. Our on-device simulation experiments demonstrate that the proposed approach outperforms the traditional re-scoring technique by 12% relative WER and 15.7% entity-mention-specific F1-score in a continuous personalization scenario.
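The abstract describes the mechanism only at a high level. Below is a minimal sketch of the general neural-associative-memory idea it builds on: user-specific context phrases are written into a key-value store, and the decoder state reads from it via attention to obtain a biasing vector. The class names, dimensions, and the additive-biasing step are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AssociativeMemory:
    """Illustrative key-value memory for contextual biasing (assumed design)."""

    def __init__(self, dim):
        self.dim = dim
        self.keys = np.zeros((0, dim))    # one key per stored context phrase
        self.values = np.zeros((0, dim))  # the representation to retrieve

    def write(self, key, value):
        # Store a (phrase embedding, bias embedding) pair, e.g. a contact name.
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])

    def read(self, query):
        # Attention over stored keys; returns a convex combination of values.
        scores = self.keys @ query / np.sqrt(self.dim)
        return softmax(scores) @ self.values

# Usage: bias the decoder state with the retrieved memory content.
rng = np.random.default_rng(0)
mem = AssociativeMemory(dim=16)
for _ in range(5):  # five hypothetical user-specific phrases
    mem.write(rng.normal(size=16), rng.normal(size=16))
decoder_state = rng.normal(size=16)
biased_state = decoder_state + mem.read(decoder_state)  # additive biasing (assumed)
```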
Related papers
- Personalized Adaptation via In-Context Preference Learning [20.042909385219716]
Preference Pretrained Transformer (PPT) is a novel approach for adaptive personalization using online user feedback.
Our results suggest the potential of in-context learning for scalable and efficient personalization in large language models.
arXiv Detail & Related papers (2024-10-17T20:06:02Z)
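The PPT abstract gives no implementation detail; the sketch below only illustrates the generic in-context preference-conditioning idea, with all field names and formatting assumed.

```python
def build_preference_prompt(history, query):
    """Generic in-context personalization prompt (format assumed,
    not taken from the PPT paper)."""
    lines = []
    for prompt, chosen, rejected in history:
        lines.append(f"User asked: {prompt}")
        lines.append(f"User preferred: {chosen}")
        lines.append(f"User rejected: {rejected}")
    lines.append(f"User asks: {query}")
    lines.append("Answer in the user's preferred style:")
    return "\n".join(lines)

history = [("Summarize this article.", "3 short bullets", "a long paragraph")]
print(build_preference_prompt(history, "Summarize today's meeting notes."))
```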
- Continual Learning for On-Device Speech Recognition using Disentangled Conformers [54.32320258055716]
We introduce a continual learning benchmark for speaker-specific domain adaptation derived from LibriVox audiobooks.
We propose a novel compute-efficient continual learning algorithm called DisentangledCL.
Our experiments show that the DisConformer models significantly outperform baselines on general ASR.
arXiv Detail & Related papers (2022-12-02T18:58:51Z)
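A minimal sketch of the parameter-disentangling idea suggested by the DisConformer name: a frozen shared core plus a small speaker-specific parameter set that continual learning may update. The split and update rule here are assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter split (names assumed): a frozen shared core and a
# small speaker-specific subset that continual learning is allowed to touch.
core = {"enc.w": rng.normal(size=(8, 8))}    # frozen across speakers
per_speaker = {"adapt.w": np.zeros((8, 8))}  # updated per speaker

def sgd_step(params, grads, lr=0.1):
    for name in params:
        params[name] -= lr * grads[name]

# Continual-learning step: only the disentangled speaker-specific
# parameters receive updates; the core stays fixed.
grads = {"adapt.w": rng.normal(size=(8, 8))}
sgd_step(per_speaker, grads)
```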
- LongFNT: Long-form Speech Recognition with Factorized Neural Transducer [64.75547712366784]
We propose the LongFNT-Text architecture, which fuses the sentence-level long-form features directly with the output of the vocabulary predictor.
The effectiveness of our LongFNT approach is validated on the LibriSpeech and GigaSpeech corpora with 19% and 12% relative word error rate (WER) reductions, respectively.
arXiv Detail & Related papers (2022-11-17T08:48:27Z)
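A minimal sketch of the fusion described above, assuming the sentence-level long-form feature is a fixed vector that is concatenated with the vocabulary predictor output and projected back; the real LongFNT-Text fusion details are in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_long_form(vocab_pred_out, sentence_context, w_proj):
    """Illustrative fusion: concatenate a sentence-level context vector with
    the vocabulary predictor output and project back (scheme assumed)."""
    fused = np.concatenate([vocab_pred_out, sentence_context])
    return np.tanh(w_proj @ fused)

vocab_pred_out = rng.normal(size=32)    # per-step vocabulary predictor output
sentence_context = rng.normal(size=16)  # e.g. pooled long-form text embedding
w_proj = rng.normal(size=(32, 48)) * 0.1
fused = fuse_long_form(vocab_pred_out, sentence_context, w_proj)
```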
- Contextual-Utterance Training for Automatic Speech Recognition [65.4571135368178]
We propose a contextual-utterance training technique which makes use of the previous and future contextual utterances.
Also, we propose a dual-mode contextual-utterance training technique for streaming automatic speech recognition (ASR) systems.
The proposed technique reduces the WER by more than 6% relative and the average last-token emission latency by more than 40 ms.
arXiv Detail & Related papers (2022-10-27T08:10:44Z)
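One way such utterance context could be wired in, shown purely as an assumed sketch: pooled encodings of the previous and future utterances are attached to the current utterance's features as extra frames.

```python
import numpy as np

def with_utterance_context(prev_enc, cur_feats, next_enc):
    """Illustrative: prepend/append pooled encodings of the previous and
    future utterances as extra frames (exact mechanism assumed)."""
    prev_vec = prev_enc.mean(axis=0, keepdims=True)  # (1, d) summary frame
    next_vec = next_enc.mean(axis=0, keepdims=True)
    return np.concatenate([prev_vec, cur_feats, next_vec], axis=0)

rng = np.random.default_rng(0)
ctx_input = with_utterance_context(
    rng.normal(size=(50, 40)), rng.normal(size=(120, 40)), rng.normal(size=(60, 40))
)
assert ctx_input.shape == (122, 40)  # original frames plus two context frames
```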
- Contextual Adapters for Personalized Speech Recognition in Neural Transducers [16.628830937429388]
We propose training neural contextual adapters for personalization in neural transducer based ASR models.
Our approach not only biases recognition towards user-defined words, but also has the flexibility to work with pretrained ASR models.
arXiv Detail & Related papers (2022-05-26T22:46:28Z)
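A minimal sketch of a contextual adapter in the spirit described above, assuming cross-attention over embeddings of user-defined words whose output is added back to a frozen encoder's frames; dimensions and the additive combination are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def contextual_adapter(enc_frame, catalog_emb, scale=1.0):
    """Illustrative adapter: attend over embeddings of user-defined words
    and add the result to the (frozen) encoder output. Details assumed."""
    scores = catalog_emb @ enc_frame / np.sqrt(enc_frame.size)
    bias = softmax(scores) @ catalog_emb
    return enc_frame + scale * bias

rng = np.random.default_rng(0)
catalog = rng.normal(size=(20, 64))  # e.g. 20 contact-name embeddings
frame = rng.normal(size=64)
biased = contextual_adapter(frame, catalog)
```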
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
arXiv Detail & Related papers (2022-05-02T17:59:02Z)
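A minimal sketch of pseudo-language construction under assumed details: frames are quantized to their nearest cluster centroid and consecutive repeats are collapsed. Wav2Seq additionally learns compact subword units on top, which is omitted here.

```python
import numpy as np
from itertools import groupby

def pseudo_tokens(features, centroids):
    """Illustrative pseudo-language targets: quantize each frame to its
    nearest cluster, then collapse consecutive repeated cluster ids."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    ids = dists.argmin(axis=1)           # frame -> cluster id
    return [k for k, _ in groupby(ids)]  # deduplicate runs

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 39))  # e.g. MFCC-like frames (assumed)
cents = rng.normal(size=(50, 39))   # 50 assumed clusters
targets = pseudo_tokens(feats, cents)  # targets for pseudo speech recognition
```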
- An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition [98.70304981174748]
We focus on the general application of pretrained speech representations to advanced end-to-end automatic speech recognition (E2E-ASR) models.
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
arXiv Detail & Related papers (2021-10-09T15:06:09Z)
- Private Language Model Adaptation for Speech Recognition [15.726921748859393]
Speech model adaptation is crucial to handle the discrepancy between server-side proxy training data and actual data received on users' local devices.
We introduce an efficient approach for continuously adapting neural network language models (NNLMs) on private devices, with applications to automatic speech recognition.
arXiv Detail & Related papers (2021-09-28T00:15:43Z)
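The abstract does not specify the model or the update rule; the sketch below assumes a tiny bigram NNLM and a plain SGD step on locally observed text, to illustrate adaptation that never leaves the device.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny illustrative bigram NNLM; model shape and update are assumed.
V, D = 20, 8
params = {
    "E": rng.normal(size=(V, D)) * 0.1,  # input embeddings
    "W": rng.normal(size=(D, V)) * 0.1,  # output projection
}

def adapt_step(params, prev_id, next_id, lr=0.1):
    """One on-device SGD step on a locally observed bigram (prev -> next);
    the update stays on the device, only the local model changes."""
    h = params["E"][prev_id]
    logits = h @ params["W"]
    p = np.exp(logits - logits.max())
    p /= p.sum()
    g = p.copy()
    g[next_id] -= 1.0                             # grad of -log p(next|prev)
    params["E"][prev_id] -= lr * (params["W"] @ g)
    params["W"] -= lr * np.outer(h, g)

for prev_id, next_id in [(3, 7), (7, 2), (2, 3)]:  # locally observed text
    adapt_step(params, prev_id, next_id)
```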
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, the factorized neural Transducer, which factorizes the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
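A minimal sketch of the factorization idea, under assumed dimensions and an assumed combination rule: a blank score and a vocabulary distribution come from separate predictors, so the vocabulary branch behaves like a standalone language model that can be adapted on text alone.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def joint(enc, blank_pred, vocab_pred, w_blank, w_vocab):
    """Illustrative factorized joint: blank and vocabulary are scored by
    separate predictor branches (combination rule assumed)."""
    blank_logit = w_blank @ np.tanh(enc + blank_pred)  # scalar blank score
    vocab_logp = log_softmax(w_vocab @ np.tanh(enc + vocab_pred))
    return np.concatenate([[blank_logit], vocab_logp])  # [blank, vocab...]

D, V = 16, 30
out = joint(rng.normal(size=D), rng.normal(size=D), rng.normal(size=D),
            rng.normal(size=D) * 0.1, rng.normal(size=(V, D)) * 0.1)
```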
- Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition [41.92991390542083]
We present a simple, novel and competitive approach for phoneme-based neural transducer modeling.
A phonetic context size of one is shown to be sufficient for the best performance.
The overall performance of our best model is comparable to state-of-the-art (SOTA) results for the TED-LIUM Release 2 and Switchboard corpora.
arXiv Detail & Related papers (2020-10-30T16:53:29Z)
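A phonetic context size of one amounts to context-dependent units such as (left neighbor, phoneme) pairs; the toy function below builds that label inventory and is only an assumed illustration of the unit definition, not the paper's model.

```python
def biphone_units(phonemes):
    """Context-dependent labels with a phonetic context of one:
    each unit is (left neighbor, phoneme). Inventory definition assumed."""
    return [(left, p) for left, p in zip(["<s>"] + phonemes[:-1], phonemes)]

print(biphone_units(["HH", "AH", "L", "OW"]))
# [('<s>', 'HH'), ('HH', 'AH'), ('AH', 'L'), ('L', 'OW')]
```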
This list is automatically generated from the titles and abstracts of the papers on this site.