Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic
Forgetting in Automatic Speech Recognition
- URL: http://arxiv.org/abs/2210.15282v1
- Date: Thu, 27 Oct 2022 09:31:37 GMT
- Title: Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic
Forgetting in Automatic Speech Recognition
- Authors: Steven Vander Eeckt and Hugo Van hamme
- Abstract summary: Adapting a trained Automatic Speech Recognition model to new tasks results in catastrophic forgetting of old tasks.
We propose a simple yet effective method to overcome catastrophic forgetting: weight averaging.
We illustrate the effectiveness of our method on both monolingual and multilingual ASR.
- Score: 10.69755597323435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adapting a trained Automatic Speech Recognition (ASR) model to new tasks
results in catastrophic forgetting of old tasks, limiting the model's ability
to learn continually and to be extended to new speakers, dialects, languages,
etc. Focusing on End-to-End ASR, in this paper, we propose a simple yet
effective method to overcome catastrophic forgetting: weight averaging. By
simply taking the average of the previous and the adapted model, our method
achieves high performance on both the old and new tasks. It can be further
improved by introducing a knowledge distillation loss during the adaptation. We
illustrate the effectiveness of our method on both monolingual and multilingual
ASR. In both cases, our method strongly outperforms all baselines, even in its
simplest form.
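The recipe in the abstract is simple enough to sketch. Below is a minimal, illustrative PyTorch sketch (not the authors' released code) of the two ingredients it describes: parameter averaging between the previous and the adapted model, and an optional knowledge-distillation term computed against the frozen previous model during adaptation. The function names, the equal 0.5 weighting, and the temperature are assumptions.

```python
# Minimal illustrative sketch (not the authors' code) of weight averaging and
# an optional knowledge-distillation loss, as described in the abstract.
# Function names, the 0.5 weighting, and the temperature are assumptions.
import copy
import torch.nn.functional as F

def average_weights(old_model, adapted_model, alpha=0.5):
    """Return a model whose parameters are a convex combination of the
    previous (old) model and the task-adapted model; alpha=0.5 is the plain
    average mentioned in the abstract."""
    averaged = copy.deepcopy(adapted_model)
    old_state = old_model.state_dict()
    new_state = adapted_model.state_dict()
    avg_state = {}
    for name, new_param in new_state.items():
        old_param = old_state[name]
        if new_param.is_floating_point():
            avg_state[name] = alpha * old_param + (1.0 - alpha) * new_param
        else:
            avg_state[name] = new_param  # keep integer buffers as-is
    averaged.load_state_dict(avg_state)
    return averaged

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Optional KD term against the frozen previous model during adaptation:
    KL divergence between softened output distributions (the paper's exact
    formulation may differ)."""
    log_p = F.log_softmax(student_logits / temperature, dim=-1)
    q = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean") * temperature ** 2
```

A typical use, following the abstract, would be to fine-tune on the new task (optionally adding `distillation_loss` to the training objective) and then call `average_weights` once at the end of adaptation.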
Related papers
- Resource-Efficient Transfer Learning From Speech Foundation Model Using
Hierarchical Feature Fusion [44.056153052137674]
We propose a novel hierarchical feature fusion method for resource-efficient transfer learning from speech foundation models.
Experimental results show that the proposed method achieves better performance on the speech recognition task than existing algorithms.
arXiv Detail & Related papers (2022-11-04T19:03:45Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
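As a rough illustration of the idea summarized above (and not the paper's implementation), one could perturb a layer's input with Gaussian noise and penalize the resulting change in its hidden representation; the layer choice, noise scale, and loss weight below are assumptions.

```python
# Rough sketch of a noise-stability penalty: perturb a layer's input with
# Gaussian noise and penalize the change in its output. The layer choice,
# noise scale sigma, and loss weight lam are assumptions, not the paper's settings.
import torch

def noise_stability_penalty(layer, hidden_states, sigma=0.01):
    """Mean squared change of a layer's output under Gaussian input noise."""
    clean_out = layer(hidden_states)
    noisy_out = layer(hidden_states + sigma * torch.randn_like(hidden_states))
    return torch.mean((clean_out - noisy_out) ** 2)

# During fine-tuning, add the penalty to the task loss, e.g. (hypothetical names):
# loss = task_loss + lam * noise_stability_penalty(model.encoder.layers[k], h_k)
```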
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no additional computational cost.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
Our experiment results show that our method can reduce the WER on the British speech data from 14.55% to 10.36% compared to the baseline model trained with the US English data only.
arXiv Detail & Related papers (2021-12-10T20:47:58Z)
- Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis [87.75833205560406]
This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system.
It does not require pooling data from all languages, and thus alleviates the storage and computation burden.
arXiv Detail & Related papers (2021-10-09T07:00:38Z)
- An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features [13.97006782398121]
The Bidirectional Encoder Representations from Transformers (BERT) model has achieved record-breaking success on many natural language processing tasks.
We explore the incorporation of confidence scores into sentence representations to see if such an attempt could help alleviate the negative effects caused by imperfect automatic speech recognition.
We validate the effectiveness of our proposed method on a benchmark dataset.
arXiv Detail & Related papers (2020-06-01T18:27:48Z)
- Incremental Learning for End-to-End Automatic Speech Recognition [41.297106772785206]
We propose an incremental learning method for end-to-end Automatic Speech Recognition (ASR).
We design a novel explainability-based knowledge distillation for ASR models, which is combined with a response-based knowledge distillation to maintain the original model's predictions and the "reason" for the predictions.
Results on a multi-stage sequential training task show that our method outperforms existing ones in mitigating forgetting.
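As a heavily simplified sketch of the combination described above (not the paper's method): the response-based term keeps the adapted model's output distribution close to the original model's, while the attribution-matching term below stands in for the explainability-based component; the gradient-times-input saliency, the names, and the weights are all assumptions.

```python
# Simplified sketch combining response-based distillation with an
# attribution-matching term as a stand-in for explainability-based
# distillation. The saliency choice (gradient times input), all names,
# and any loss weights are assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F

def response_kd(new_logits, old_logits, temperature=1.0):
    """Keep the adapted model's predictions close to the original model's."""
    return F.kl_div(F.log_softmax(new_logits / temperature, dim=-1),
                    F.softmax(old_logits / temperature, dim=-1),
                    reduction="batchmean") * temperature ** 2

def saliency(model, features):
    """Gradient-times-input attribution of the summed logits w.r.t. the
    acoustic features; a simple proxy for the 'reason' for a prediction."""
    features = features.clone().requires_grad_(True)
    grads = torch.autograd.grad(model(features).sum(), features,
                                create_graph=True)[0]
    return grads * features

def explainability_kd(new_model, old_model, features):
    """Penalize divergence between the adapted and original models' attributions."""
    return F.mse_loss(saliency(new_model, features),
                      saliency(old_model, features).detach())
```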
arXiv Detail & Related papers (2020-05-11T08:18:08Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
- Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into Adam, to facilitate the NLP community.
arXiv Detail & Related papers (2020-04-27T08:59:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.