Sequential Editing for Lifelong Training of Speech Recognition Models
- URL: http://arxiv.org/abs/2406.17935v1
- Date: Tue, 25 Jun 2024 20:52:09 GMT
- Title: Sequential Editing for Lifelong Training of Speech Recognition Models
- Authors: Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Nikolaos Pappas, Srikanth Ronanki,
- Abstract summary: Fine-tuning solely on the new domain risks Catastrophic Forgetting (CF).
We propose Sequential Model Editing as a novel method to continually learn new domains in ASR systems.
Our study demonstrates up to 15% Word Error Rate Reduction (WERR) over the fine-tuning baseline, and superior efficiency over other LLL techniques, on the CommonVoice English multi-accent dataset.
- Score: 10.770491329674401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic Speech Recognition (ASR) traditionally assumes known domains, but adding data from a new domain raises concerns about computational inefficiencies linked to retraining models on both existing and new domains. Fine-tuning solely on the new domain risks Catastrophic Forgetting (CF). To address this, Lifelong Learning (LLL) algorithms have been proposed for ASR. Prior research has explored techniques such as Elastic Weight Consolidation, Knowledge Distillation, and Replay, all of which necessitate either additional parameters or access to prior domain data. We propose Sequential Model Editing as a novel method to continually learn new domains in ASR systems. Unlike previous methods, our approach does not require access to prior datasets or the introduction of extra parameters. Our study demonstrates up to 15% Word Error Rate Reduction (WERR) over the fine-tuning baseline, and superior efficiency over other LLL techniques, on the CommonVoice English multi-accent dataset.
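For reference, the headline gain is a relative metric: Word Error Rate Reduction (WERR) measures how much a method lowers WER compared to a baseline. Below is a minimal sketch of that standard computation; the function name and example numbers are illustrative, not taken from the paper.

```python
def werr(wer_baseline: float, wer_method: float) -> float:
    """Relative Word Error Rate Reduction (WERR) over a baseline.

    WERR = (WER_baseline - WER_method) / WER_baseline, so the paper's
    "up to 15% WERR" corresponds to a return value of 0.15.
    """
    return (wer_baseline - wer_method) / wer_baseline

# Example: a baseline WER of 20.0% reduced to 17.0% is a 15% relative reduction.
print(f"{werr(20.0, 17.0):.0%}")  # -> 15%
```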
Related papers
- Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction (GEC) models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in in-domain (ID) and OOD scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z)
- Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models [3.072340427031969]
Few-shot action recognition (FSAR) aims to learn a model capable of identifying novel actions in videos using only a few examples.
By assuming that the base dataset seen during meta-training and the novel dataset used for evaluation can come from different domains, cross-domain few-shot learning alleviates data collection and annotation costs.
We systematically evaluate existing state-of-the-art single-domain, transfer-based, and cross-domain FSAR methods on new cross-domain tasks.
arXiv Detail & Related papers (2024-06-03T07:48:18Z)
- Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose a self-supervised continual learning approach to recognize new words.
We use a memory-enhanced Automatic Speech Recognition model from previous work.
We show that with this approach, performance on the new words improves as they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z)
- On Generalizing Beyond Domains in Cross-Domain Continual Learning [91.56748415975683]
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift with accuracy boosts up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z)
- Hyperparameter-free Continuous Learning for Domain Classification in Natural Language Understanding [60.226644697970116]
Domain classification is the fundamental task in natural language understanding (NLU).
Most existing continual learning approaches suffer from low accuracy and performance fluctuation.
We propose a hyperparameter-free continual learning model for text data that can stably produce high performance under various environments.
arXiv Detail & Related papers (2022-01-05T02:46:16Z)
- TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning [53.32740707197856]
We present a new state-of-the-art unsupervised method based on pre-trained Transformers and Sequential Denoising Auto-Encoder (TSDAE).
It can achieve up to 93.1% of the performance of in-domain supervised approaches.
arXiv Detail & Related papers (2021-04-14T17:02:18Z)
- Flexible deep transfer learning by separate feature embeddings and manifold alignment [0.0]
Object recognition is a key enabler across industry and defense.
Unfortunately, algorithms trained on existing labeled datasets do not directly generalize to new data because the data distributions do not match.
We propose a novel deep learning framework that overcomes this limitation by learning separate feature extractions for each domain.
arXiv Detail & Related papers (2020-12-22T19:24:44Z)
- Continual Learning for Natural Language Generation in Task-oriented Dialog Systems [72.92029584113676]
Natural language generation (NLG) is an essential component of task-oriented dialog systems.
We study NLG in a "continual learning" setting to expand its knowledge to new domains or functionalities incrementally.
The major challenge towards this goal is catastrophic forgetting, meaning that a continually trained model tends to forget the knowledge it has learned before (a toy numerical sketch of this effect follows this list).
arXiv Detail & Related papers (2020-10-02T10:32:29Z)
- Data Techniques For Online End-to-end Speech Recognition [17.621967685914587]
Practitioners often need to build ASR systems for new use cases in a short amount of time, given limited in-domain data.
While recently developed end-to-end methods largely simplify the modeling pipelines, they still suffer from the data sparsity issue.
We explore a few simple-to-implement techniques for building online ASR systems in an end-to-end fashion, with a small amount of transcribed data in the target domain.
arXiv Detail & Related papers (2020-01-24T22:59:46Z)
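As a concrete illustration of the catastrophic forgetting theme running through the abstract and the list above, here is a toy numerical sketch: a linear model fine-tuned sequentially on two synthetic "domains", with and without an Elastic-Weight-Consolidation-style penalty. This is illustrative only, not code from any of the listed papers; real EWC weights each parameter by its Fisher information, which this simplified quadratic penalty omits.

```python
# Toy sketch of catastrophic forgetting and an EWC-style remedy on
# synthetic linear-regression "domains". Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def make_domain(w_true, n=200, d=5):
    """Synthetic regression data y = X @ w_true + noise for one domain."""
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    return X, y

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def train(w, X, y, w_anchor=None, lam=0.0, lr=0.05, steps=500):
    """Gradient descent on MSE; if w_anchor is given, add the quadratic
    penalty lam * ||w - w_anchor||^2 pulling toward the old weights
    (a simplified stand-in for the Fisher-weighted EWC penalty)."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        if w_anchor is not None:
            grad += 2.0 * lam * (w - w_anchor)
        w = w - lr * grad
    return w

d = 5
XA, yA = make_domain(rng.normal(size=d), d=d)  # "old" domain A
XB, yB = make_domain(rng.normal(size=d), d=d)  # "new" domain B

w_A = train(np.zeros(d), XA, yA)
print("domain A error after training on A:    ", mse(w_A, XA, yA))

# Naive sequential fine-tuning on B: the error on A jumps (forgetting).
w_naive = train(w_A, XB, yB)
print("domain A error after naive tuning on B:", mse(w_naive, XA, yA))

# EWC-style fine-tuning on B: stays closer to the old solution on A.
w_ewc = train(w_A, XB, yB, w_anchor=w_A, lam=1.0)
print("domain A error after EWC-style tuning: ", mse(w_ewc, XA, yA))
```

The point is only the trade-off: the penalty preserves domain A at some cost on domain B, which is the balance that replay-free LLL methods, including the sequential-editing approach in the headline paper, aim to strike without storing old data or adding parameters.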