Continual Learning for Monolingual End-to-End Automatic Speech Recognition
- URL: http://arxiv.org/abs/2112.09427v1
- Date: Fri, 17 Dec 2021 10:47:17 GMT
- Title: Continual Learning for Monolingual End-to-End Automatic Speech Recognition
- Authors: Steven Vander Eeckt and Hugo Van hamme
- Abstract summary: Adapting Automatic Speech Recognition (ASR) models to new domains leads to a deterioration of performance on the original domain(s). Even monolingual ASR models cannot be extended to new accents, dialects, topics, etc. without suffering from Catastrophic Forgetting (CF).
- Score: 16.651146574124567
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adapting Automatic Speech Recognition (ASR) models to new domains leads to a
deterioration of performance on the original domain(s), a phenomenon called
Catastrophic Forgetting (CF). Even monolingual ASR models cannot be extended to
new accents, dialects, topics, etc. without suffering from CF, making them
unable to be continually enhanced without storing all past data. Fortunately,
Continual Learning (CL) methods, which aim to enable continual adaptation while
overcoming CF, can be used. In this paper, we implement an extensive number of
CL methods for End-to-End ASR and test and compare their ability to extend a
monolingual Hybrid CTC-Transformer model across four new tasks. We find that
the best performing CL method closes the gap between the fine-tuned model
(lower bound) and the model trained jointly on all tasks (upper bound) by more
than 40%, while requiring access to only 0.6% of the original data.
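The paper benchmarks a broad set of CL methods rather than a single recipe; as a purely illustrative sketch (not the authors' implementation), the snippet below shows one common family, experience replay, in which a small buffer sampled from the original domain (on the order of the 0.6% quoted above) is mixed into each adaptation step on the new domain. The model and its `loss` method are hypothetical placeholders.

```python
import random
import torch

def finetune_with_replay(model, new_domain_batches, old_domain_batches,
                         replay_fraction=0.006, epochs=1, lr=1e-4):
    """Fine-tune on a new domain while replaying a tiny sample of original-domain data.

    `model` is any trainable ASR model exposing a hypothetical `loss(batch)` method.
    """
    # Retain only a small fraction of the original domain (e.g. ~0.6%).
    buffer_size = max(1, int(len(old_domain_batches) * replay_fraction))
    replay_buffer = random.sample(old_domain_batches, buffer_size)

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in new_domain_batches:
            optimizer.zero_grad()
            loss = model.loss(batch)                                 # new-domain loss
            loss = loss + model.loss(random.choice(replay_buffer))   # replayed old-domain loss
            loss.backward()
            optimizer.step()
    return model
```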
Related papers
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
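As a rough illustration of the general idea, not the paper's specific merging algorithm, the sketch below linearly interpolates two checkpoints that share an architecture (e.g. a continually pre-trained model and an SFT model); the interpolation weight and checkpoint names are assumptions.

```python
import torch

def merge_state_dicts(state_dict_a, state_dict_b, alpha=0.5):
    """Linearly interpolate two checkpoints that share the same architecture."""
    merged = {}
    for name, param_a in state_dict_a.items():
        merged[name] = alpha * param_a + (1.0 - alpha) * state_dict_b[name]
    return merged

# Hypothetical usage with assumed checkpoint paths:
# sd_ct = torch.load("llama2_7b_continued_pretraining.pt")
# sd_sft = torch.load("llama2_7b_sft.pt")
# model.load_state_dict(merge_state_dicts(sd_ct, sd_sft, alpha=0.5))
```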
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
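A minimal sketch of what a Mixture-of-Experts adapter can look like, with assumed sizes and names rather than the paper's code: a softmax router weights a few bottleneck adapters whose combined output is added as a residual on top of frozen CLIP features.

```python
import torch
import torch.nn as nn

class MoEAdapter(nn.Module):
    """Bottleneck adapters selected by a softmax router and applied as a residual."""

    def __init__(self, dim=512, num_experts=4, bottleneck=64):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, dim) features from a frozen backbone
        gates = torch.softmax(self.router(x), dim=-1)                  # (batch, num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, dim)
        return x + (gates.unsqueeze(-1) * expert_out).sum(dim=1)       # residual update
```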
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
- Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning [58.92843729869586]
Vision-language pre-trained models (VL-PTMs) have advanced multimodal research in recent years, but their mastery in a few languages like English restricts their applicability in broader communities.
We propose to extend VL-PTMs' language capacity by continual language learning (CLL), where a model needs to update its linguistic knowledge incrementally without suffering from catastrophic forgetting (CF).
We construct a CLL benchmark covering 36 languages based on MSCOCO and XM3600 datasets and then evaluate multilingual image-text retrieval performance.
arXiv Detail & Related papers (2024-01-30T17:14:05Z)
- Improving Language Plasticity via Pretraining with Active Forgetting [63.36484652568976]
We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages.
Experiments with RoBERTa show that models pretrained with our forgetting mechanism demonstrate faster convergence during language adaptation.
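As summarized, active forgetting amounts to periodically re-initializing the token-embedding layer during pretraining so the transformer body learns to work with fresh embeddings; the sketch below illustrates that idea with assumed names and intervals (it is not the RoBERTa training code).

```python
import torch.nn as nn

def pretrain_with_active_forgetting(model, batches, optimizer, reset_every=1000):
    """Pretraining loop that re-initializes the input embeddings every `reset_every` steps.

    Assumes a HuggingFace-style masked-LM model whose forward pass returns `.loss`.
    The optimizer state is left untouched here for brevity.
    """
    for step, batch in enumerate(batches, start=1):
        optimizer.zero_grad()
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        if step % reset_every == 0:
            # Active forgetting: wipe the token embeddings so the transformer
            # body cannot rely on any particular embedding configuration.
            embeddings = model.get_input_embeddings()
            nn.init.normal_(embeddings.weight, mean=0.0, std=0.02)
```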
arXiv Detail & Related papers (2023-07-03T17:12:44Z)
- Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages [3.7870350845913165]
We study Contextual Adapters, wherein an attention-based biasing model for CTC is used to improve the recognition of custom entities.
In this work, we propose a supervision loss for smoother training of the Contextual Adapters.
Our method achieves 48% F1 improvement in retrieving unseen custom entities for a low-resource language.
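Broadly, a contextual adapter attends from the acoustic encoder states to an embedded list of custom entities and adds the attended vector back to the encoder output before the CTC head; the module below is an illustrative sketch with assumed dimensions, not the authors' implementation.

```python
import torch.nn as nn

class ContextualAdapter(nn.Module):
    """Attention-based biasing of CTC encoder states toward a custom-entity list."""

    def __init__(self, enc_dim=256, ctx_dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(enc_dim, num_heads, kdim=ctx_dim,
                                          vdim=ctx_dim, batch_first=True)

    def forward(self, enc_states, entity_embeddings):
        # enc_states: (batch, time, enc_dim); entity_embeddings: (batch, entities, ctx_dim)
        bias, _ = self.attn(enc_states, entity_embeddings, entity_embeddings)
        return enc_states + bias  # biased states are then fed to the CTC output layer
```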
arXiv Detail & Related papers (2023-07-03T05:29:38Z)
- Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models [13.340759455910721]
We propose a novel method to prevent zero-shot transfer degradation in the continual learning of vision-language models.
Our method outperforms other methods in the traditional class-incremental learning setting.
arXiv Detail & Related papers (2023-03-12T10:28:07Z)
- Improving Massively Multilingual ASR With Auxiliary CTC Objectives [40.10307386370194]
We introduce our work on improving performance on FLEURS, a 102-language open ASR benchmark.
We investigate techniques inspired by recent Connectionist Temporal Classification (CTC) studies to help the model handle the large number of languages.
Our state-of-the-art systems using self-supervised models with the Conformer architecture improve over the results of prior work on FLEURS by a relative 28.4% CER.
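The summary only hints at the technique; a common way to add an auxiliary CTC objective, shown below as a hedged sketch rather than the paper's recipe, is to attach a second CTC loss (e.g. over language-ID-augmented targets or an intermediate encoder layer) and add it to the main loss with a small weight.

```python
import torch.nn.functional as F

def combined_ctc_loss(main_log_probs, aux_log_probs, targets, aux_targets,
                      input_lengths, target_lengths, aux_target_lengths,
                      aux_weight=0.3):
    """Main CTC loss plus a weighted auxiliary CTC loss (e.g. language-ID targets).

    Both log-prob tensors are (time, batch, vocab); for simplicity the same
    input lengths are assumed for the main and auxiliary heads.
    """
    main = F.ctc_loss(main_log_probs, targets, input_lengths, target_lengths)
    aux = F.ctc_loss(aux_log_probs, aux_targets, input_lengths, aux_target_lengths)
    return main + aux_weight * aux
```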
arXiv Detail & Related papers (2023-02-24T18:59:51Z)
- Hyperparameter-free Continuous Learning for Domain Classification in Natural Language Understanding [60.226644697970116]
Domain classification is the fundamental task in natural language understanding (NLU).
Most existing continual learning approaches suffer from low accuracy and performance fluctuation.
We propose a hyperparameter-free continual learning model for text data that can stably produce high performance under various environments.
arXiv Detail & Related papers (2022-01-05T02:46:16Z)
- Continual learning using lattice-free MMI for speech recognition [6.802401545890963]
Continual learning (CL) or domain expansion is a popular topic for automatic speech recognition (ASR) acoustic modeling.
Regularization-based CL for neural network acoustic models trained with the lattice-free maximum mutual information (LF-MMI) criterion is proposed.
We show that a sequence-level LWF can improve the best average word error rate across all domains by up to 9.4% relative compared with using regular LWF.
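LWF here refers to Learning Without Forgetting, which adds a distillation term keeping the adapted model's outputs close to those of the pre-adaptation model; the sketch below is a generic frame-level version (the paper's contribution, a sequence-level variant for LF-MMI, is not reproduced here).

```python
import torch.nn.functional as F

def lwf_loss(new_logits, old_logits, task_loss, temperature=2.0, lwf_weight=1.0):
    """Task loss plus a KL term tying the adapted model to the frozen old model."""
    new_logp = F.log_softmax(new_logits / temperature, dim=-1)
    old_p = F.softmax(old_logits / temperature, dim=-1)
    distill = F.kl_div(new_logp, old_p, reduction="batchmean") * temperature ** 2
    return task_loss + lwf_weight * distill
```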
arXiv Detail & Related papers (2021-10-13T22:11:11Z)
- Learning Adaptive Embedding Considering Incremental Class [55.21855842960139]
Class-Incremental Learning (CIL) aims to train a reliable model on streaming data in which unknown classes emerge sequentially.
Unlike traditional closed-set learning, CIL faces two main challenges: 1) detecting novel classes, and 2) updating the model, once novel classes are detected, without re-training on the entire previous data.
arXiv Detail & Related papers (2020-08-31T04:11:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.