Online Continual Learning of End-to-End Speech Recognition Models
- URL: http://arxiv.org/abs/2207.05071v1
- Date: Mon, 11 Jul 2022 05:35:06 GMT
- Title: Online Continual Learning of End-to-End Speech Recognition Models
- Authors: Muqiao Yang, Ian Lane, Shinji Watanabe
- Abstract summary: Continual Learning aims to continually learn from new data as it becomes available.
We show that with online continual learning and a selective sampling strategy, we can maintain an accuracy similar to retraining a model from scratch.
- Score: 29.931427687979532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual Learning, also known as Lifelong Learning, aims to continually
learn from new data as it becomes available. While prior research on continual
learning in automatic speech recognition has focused on the adaptation of
models across multiple different speech recognition tasks, in this paper we
propose an experimental setting for online continual learning for
automatic speech recognition of a single task. Specifically focusing on the
case where additional training data for the same task becomes available
incrementally over time, we demonstrate the effectiveness of performing
incremental model updates to end-to-end speech recognition models with an
online Gradient Episodic Memory (GEM) method. Moreover, we show that with
online continual learning and a selective sampling strategy, we can maintain an
accuracy that is similar to retraining a model from scratch while requiring
significantly lower computation costs. We have also verified our method with
self-supervised learning (SSL) features.
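For intuition, the online GEM update described above constrains each incremental step with a batch replayed from an episodic memory of previously seen data: if the gradient on the newly arrived batch conflicts with the gradient on the memory batch, it is projected so that the loss on past data does not increase. The sketch below illustrates this single-constraint case in PyTorch; it is a minimal illustration under assumed names (gem_step, loss_fn, the batch objects), not the authors' implementation.

```python
# Minimal sketch of an online GEM-style update with one episodic-memory
# constraint. All names here are illustrative assumptions, not the paper's code.
import torch


def flat_grad(model):
    """Flatten the current gradients of all parameters into one vector."""
    return torch.cat([p.grad.detach().reshape(-1)
                      for p in model.parameters() if p.grad is not None])


def assign_grad(model, flat):
    """Write a flat gradient vector back into the parameters' .grad fields."""
    offset = 0
    for p in model.parameters():
        if p.grad is None:
            continue
        n = p.grad.numel()
        p.grad.copy_(flat[offset:offset + n].view_as(p.grad))
        offset += n


def gem_step(model, optimizer, loss_fn, new_batch, memory_batch):
    # 1) Gradient on a batch sampled from the episodic memory (past data).
    optimizer.zero_grad()
    loss_fn(model, memory_batch).backward()
    g_mem = flat_grad(model)

    # 2) Gradient on the newly arrived batch.
    optimizer.zero_grad()
    loss_fn(model, new_batch).backward()
    g_new = flat_grad(model)

    # 3) GEM constraint: if the new gradient would increase the memory loss
    #    (negative inner product), project it onto the feasible half-space.
    dot = torch.dot(g_new, g_mem)
    if dot < 0:
        g_new = g_new - (dot / torch.dot(g_mem, g_mem)) * g_mem
        assign_grad(model, g_new)

    optimizer.step()
```

With a single memory constraint, the GEM quadratic program reduces to this closed-form projection; the full GEM formulation enforces one such constraint per past task.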
Related papers
- Continual Learning in Machine Speech Chain Using Gradient Episodic Memory [9.473861847584843]
This paper introduces a novel approach leveraging the machine speech chain framework to enable continual learning in ASR.
By incorporating a text-to-speech (TTS) component within the machine speech chain, we support the replay mechanism essential for GEM.
Our experiments, conducted on the LJ Speech dataset, demonstrate that our method outperforms traditional fine-tuning and multitask learning approaches.
arXiv Detail & Related papers (2024-11-27T13:19:20Z)
- Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose a self-supervised continual learning approach for Automatic Speech Recognition.
We use a memory-enhanced ASR model from the literature to decode new words from the slides.
We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z)
- ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale [19.524894956258343]
This paper uses a cloud-based framework for production systems to demonstrate insights from privacy-preserving incremental learning for automatic speech recognition (ILASR).
We show that the proposed system can improve the production models significantly (3%) over a new time period of six months, even in the absence of human-annotated labels.
arXiv Detail & Related papers (2022-07-19T05:24:13Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Self-Supervised Learning for speech recognition with Intermediate layer supervision [52.93758711230248]
We propose Intermediate Layer Supervision for Self-Supervised Learning (ILS-SSL).
ILS-SSL forces the model to concentrate on content information as much as possible by adding an additional SSL loss on the intermediate layers.
Experiments on LibriSpeech test-other set show that our method outperforms HuBERT significantly.
arXiv Detail & Related papers (2021-12-16T10:45:05Z)
- WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing [102.45426364965887]
We propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks.
WavLM builds on the HuBERT framework, with an emphasis on both spoken content modeling and speaker identity preservation.
We scale up the training dataset from 60k hours to 94k hours of public audio data, and optimize its training procedure for better representation extraction.
arXiv Detail & Related papers (2021-10-26T17:55:19Z)
- UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training [72.004873454347]
Two methods are introduced for enhancing unsupervised speaker information extraction.
Experiment results on SUPERB benchmark show that the proposed system achieves state-of-the-art performance.
We scale up the training dataset to 94 thousand hours of public audio data and achieve further performance improvements.
arXiv Detail & Related papers (2021-10-12T05:43:30Z)
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online.
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
arXiv Detail & Related papers (2021-08-20T06:17:20Z)
- Continual-wav2vec2: an Application of Continual Learning for Self-Supervised Automatic Speech Recognition [0.23872611575805824]
We present a method for continual learning of speech representations for multiple languages using self-supervised learning (SSL).
Wav2vec models perform SSL on raw audio in a pretraining phase and then finetune on a small fraction of annotated data.
We use ideas from continual learning to transfer knowledge from a previous task to speed up pretraining a new language task.
arXiv Detail & Related papers (2021-07-26T10:39:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.