Continuously Learning New Words in Automatic Speech Recognition
- URL: http://arxiv.org/abs/2401.04482v4
- Date: Wed, 29 Jan 2025 14:55:28 GMT
- Title: Continuously Learning New Words in Automatic Speech Recognition
- Authors: Christian Huber, Alexander Waibel
- Abstract summary: We propose a self-supervised continual learning approach for Automatic Speech Recognition.
We use a memory-enhanced ASR model from the literature to decode new words from the slides.
We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
- Score: 56.972851337263755
- Abstract: Despite recent advances, Automatic Speech Recognition (ASR) systems are still far from perfect. Typical errors include acronyms, named entities, and domain-specific special words for which little or no labeled data is available. To address the problem of recognizing these words, we propose a self-supervised continual learning approach: Given the audio of a lecture talk with the corresponding slides, we bias the model towards decoding new words from the slides by using a memory-enhanced ASR model from the literature. Then, we perform inference on the talk, collecting utterances that contain detected new words into an adaptation data set. Continual learning is then performed by training adaptation weights added to the model on this data set. The whole procedure is iterated for many talks. We show that with this approach, we obtain increasing performance on the new words when they occur more frequently (more than 80% recall) while preserving the general performance of the model.
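The iterated procedure in the abstract (bias decoding towards slide words, collect utterances with detected new words, train on them, repeat per talk) can be sketched as a toy loop. This is only an illustration of the control flow, not the authors' implementation: `Talk`, `decode`, `continual_learning`, and the `learn_after` threshold are hypothetical names, word strings stand in for audio, and a simple occurrence counter stands in for the trained adaptation weights.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Talk:
    utterances: list   # word strings standing in for audio segments (assumption)
    slide_words: set   # candidate new words taken from the slides


def decode(utterance, vocab, memory):
    """Toy decoder: emit a word only if it is in the vocabulary or the memory."""
    return [w if w in vocab or w in memory else "<unk>" for w in utterance.split()]


def continual_learning(talks, base_vocab, learn_after=2):
    counts = Counter()     # toy stand-in for the adaptation weights
    learned = set()        # new words the adapted "model" decodes without memory
    adaptation_set = []    # (utterance, pseudo-label) pairs collected so far
    for talk in talks:
        vocab = base_vocab | learned
        for utt in talk.utterances:
            # Bias decoding towards the slide words via the memory.
            hyp = decode(utt, vocab, talk.slide_words)
            # Keep only utterances in which a new slide word was detected.
            detected = (talk.slide_words - vocab).intersection(hyp)
            if detected:
                adaptation_set.append((utt, " ".join(hyp)))
                counts.update(detected)
        # Stand-in for training the adaptation weights on the collected data.
        learned = {w for w, c in counts.items() if c >= learn_after}
    return learned, adaptation_set


base = {"we", "use", "today", "scales", "well", "deploy", "with"}
talks = [
    Talk(["we use kubernetes today", "kubernetes scales well"], {"kubernetes"}),
    Talk(["deploy with kubernetes"], set()),
]
learned, data = continual_learning(talks, base)
print(learned, len(data))
```

In this toy setting a word counts as learned once it has been detected `learn_after` times, which loosely mirrors the abstract's observation that performance on new words increases the more often they occur.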
Related papers
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
This innovative model surpasses the performance of previous unsupervised ASR models under the lexicon-free setting.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring [4.819085609772069]
We propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing.
Our solution combines Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) with Deep Neural Network (DNN) models for better accuracy.
We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
arXiv Detail & Related papers (2023-10-14T23:16:05Z)
- The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning [20.643270151774182]
We seek to continually learn from on-device user corrections through Federated Learning (FL).
We explore techniques to target fresh terms that the model has not previously encountered, learn long-tail words, and mitigate catastrophic forgetting.
In experimental evaluations, we find that the proposed techniques improve model recognition of fresh terms, while preserving quality on the overall language distribution.
arXiv Detail & Related papers (2023-09-29T21:04:10Z)
- Online Continual Learning of End-to-End Speech Recognition Models [29.931427687979532]
Continual Learning aims to continually learn from new data as it becomes available.
We show that with online continual learning and a selective sampling strategy, we can maintain an accuracy similar to retraining a model from scratch.
arXiv Detail & Related papers (2022-07-11T05:35:06Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z)
- Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition [62.997667081978825]
We present an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize.
arXiv Detail & Related papers (2021-07-05T21:08:34Z)
- Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation [56.830395467247016]
We propose a model of semantic memory for WSD in a meta-learning setting.
Our model is based on hierarchical variational inference and incorporates an adaptive memory update rule via a hypernetwork.
We show that our model advances the state of the art in few-shot WSD and supports effective learning in extremely data-scarce scenarios.
arXiv Detail & Related papers (2021-06-05T20:40:01Z)
- Improving Proper Noun Recognition in End-to-End ASR By Customization of the MWER Loss Criterion [33.043533068435366]
Proper nouns present a challenge for end-to-end (E2E) automatic speech recognition (ASR) systems.
Unlike conventional ASR models, E2E systems lack an explicit pronunciation model that can be specifically trained with proper noun pronunciations.
This paper builds on recent advances in minimum word error rate (MWER) training to develop two new loss criteria that specifically emphasize proper noun recognition.
arXiv Detail & Related papers (2020-05-19T21:10:50Z)