Continual learning using lattice-free MMI for speech recognition
- URL: http://arxiv.org/abs/2110.07055v1
- Date: Wed, 13 Oct 2021 22:11:11 GMT
- Title: Continual learning using lattice-free MMI for speech recognition
- Authors: Hossein Hadian and Arseniy Gorin
- Abstract summary: Continual learning (CL), or domain expansion, is a popular topic in automatic speech recognition (ASR) acoustic modeling.
We propose regularization-based CL for neural network acoustic models trained with the lattice-free maximum mutual information (LF-MMI) criterion.
The proposed sequence-level learning-without-forgetting (LWF) regularization improves the best average word error rate across all domains by up to 9.4% relative compared with regular LWF.
- Score: 6.802401545890963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning (CL), or domain expansion, recently became a popular topic
for automatic speech recognition (ASR) acoustic modeling because practical
systems have to be updated frequently in order to work robustly on types of
speech not observed during initial training. While sequential adaptation allows
tuning a system to a new domain, it may result in performance degradation on
the old domains due to catastrophic forgetting. In this work we explore
regularization-based CL for neural network acoustic models trained with the
lattice-free maximum mutual information (LF-MMI) criterion. We simulate domain
expansion by incrementally adapting the acoustic model on different public
datasets that include several accents and speaking styles. We investigate two
well-known CL techniques, elastic weight consolidation (EWC) and learning
without forgetting (LWF), which aim to reduce forgetting by preserving model
weights or network outputs. We additionally introduce a sequence-level LWF
regularization, which exploits posteriors from the denominator graph of LF-MMI
to further reduce forgetting. Empirical results show that the proposed
sequence-level LWF can improve the best average word error rate across all
domains by up to 9.4% relative compared with using regular LWF.
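For a concrete picture of the two regularizers discussed above, the following is a minimal PyTorch-style sketch of how an EWC penalty and a frame-level LWF distillation term could be added to a new-domain training loss. The function names, hyperparameters, and the KL form of the LWF term are illustrative assumptions rather than the authors' Kaldi-based implementation; the proposed sequence-level LWF, which uses posteriors from the LF-MMI denominator graph, is only indicated in a comment.

```python
import torch.nn.functional as F

def ewc_penalty(model, ref_params, fisher_diag):
    """EWC: penalize moving parameters that the diagonal Fisher information
    marks as important for the previously seen domains."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher_diag[name] * (p - ref_params[name]) ** 2).sum()
    return penalty

def lwf_penalty(new_logits, old_logits, temperature=1.0):
    """Frame-level LWF: keep the adapted model's output distribution close to
    that of the frozen old-domain model via KL-divergence distillation."""
    log_p_new = F.log_softmax(new_logits / temperature, dim=-1)
    p_old = F.softmax(old_logits / temperature, dim=-1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean")

def adaptation_loss(lfmmi_loss, model, ref_params, fisher_diag,
                    new_logits, old_logits, lam_ewc=0.1, lam_lwf=0.1):
    """Illustrative combined objective for domain expansion: the new-domain
    LF-MMI loss plus the two continual-learning regularizers.
    A sequence-level LWF would replace the frame-level softmax posteriors
    above with posteriors computed by forward-backward over the LF-MMI
    denominator graph (not shown here)."""
    return (lfmmi_loss
            + lam_ewc * ewc_penalty(model, ref_params, fisher_diag)
            + lam_lwf * lwf_penalty(new_logits, old_logits))
```

The regularization weights lam_ewc and lam_lwf are placeholder values and would need to be tuned for a given setup.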
Related papers
- Input layer regularization and automated regularization hyperparameter tuning for myelin water estimation using deep learning [1.9594393134885413]
We propose a novel deep learning method which combines classical regularization with data augmentation for estimating myelin water fraction (MWF) in the brain via biexponential analysis.
In particular, we study the biexponential model, one of the signal models used for MWF estimation.
arXiv Detail & Related papers (2025-01-30T00:56:28Z) - Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs [76.40876036912537]
Large Language Models (LLMs) demonstrate strong few-shot adaptability without requiring fine-tuning.
Current Visual Foundation Models (VFMs) require explicit fine-tuning with sufficient tuning data.
We propose a framework, LoRA Recycle, that distills a meta-LoRA from diverse pre-tuned LoRAs with a meta-learning objective.
arXiv Detail & Related papers (2024-12-03T07:25:30Z) - Recursive Learning of Asymptotic Variational Objectives [49.69399307452126]
General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data.
Online sequential IWAE (OSIWAE) allows for online learning of both model parameters and a Markovian recognition model for inferring latent states.
This approach has a stronger theoretical foundation than recently proposed online variational SMC methods.
arXiv Detail & Related papers (2024-11-04T16:12:37Z) - LLM-TS Integrator: Integrating LLM for Enhanced Time Series Modeling [5.853711797849859]
Time series (TS) modeling is essential in dynamic systems like weather prediction and anomaly detection.
Recent studies utilize Large Language Models (LLMs) for TS modeling, leveraging their powerful pattern recognition capabilities.
arXiv Detail & Related papers (2024-10-21T20:29:46Z) - Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning [17.614980614656407]
We propose Continual Generative training for Incremental prompt-Learning.
We exploit Variational Autoencoders to learn class-conditioned distributions.
We show that such a generative replay approach can adapt to new tasks while improving zero-shot capabilities.
arXiv Detail & Related papers (2024-07-22T16:51:28Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates [71.81037644563217]
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
arXiv Detail & Related papers (2024-03-27T09:14:36Z) - Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP to downstream tasks undesirably degrades out-of-distribution (OOD) performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z) - Generalized Variational Continual Learning [33.194866396158005]
Two main approaches to continual learning are Online Elastic Weight Consolidation (Online EWC) and Variational Continual Learning (VCL).
We show that the proposed generalization recovers Online EWC as a limiting case, allowing interpolation between the two approaches.
To mitigate the observed overpruning effect of VI, we take inspiration from a common multi-task architecture: neural networks with task-specific FiLM layers.
arXiv Detail & Related papers (2020-11-24T19:07:39Z) - Frequency-based Automated Modulation Classification in the Presence of Adversaries [17.930854969511046]
We present a novel receiver architecture consisting of deep learning models capable of withstanding transferable adversarial interference.
In this work, we demonstrate classification performance improvements greater than 30% on recurrent neural networks (RNNs) and greater than 50% on convolutional neural networks (CNNs).
arXiv Detail & Related papers (2020-11-02T17:12:22Z) - Early Stage LM Integration Using Local and Global Log-Linear Combination [46.91755970827846]
Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap to traditional hybrid hidden Markov models (HMMs).
One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora.
We present a novel method for language model integration into implicit-alignment based sequence-to-sequence models.
arXiv Detail & Related papers (2020-05-20T13:49:55Z)
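The last entry concerns log-linear integration of an external language model into a sequence-to-sequence ASR model. As a rough illustration only, here is a minimal shallow-fusion-style sketch of per-step log-linear score combination; the function name and weighting are assumptions and do not reproduce the paper's specific local/global formulation.

```python
import torch
import torch.nn.functional as F

def log_linear_step_scores(s2s_logits: torch.Tensor,
                           lm_logits: torch.Tensor,
                           lm_weight: float = 0.3) -> torch.Tensor:
    """Combine per-step scores of a sequence-to-sequence model and an external
    language model log-linearly (shallow-fusion style):
        score(y) = log p_s2s(y | x, y_<t) + lm_weight * log p_LM(y | y_<t)
    Both logit tensors have shape (batch, vocab); lm_weight is tuned on a
    development set."""
    return (F.log_softmax(s2s_logits, dim=-1)
            + lm_weight * F.log_softmax(lm_logits, dim=-1))

# During beam search, these combined scores rank candidate extensions of each
# hypothesis at every decoding step.
```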
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.