Continual learning using lattice-free MMI for speech recognition
- URL: http://arxiv.org/abs/2110.07055v1
- Date: Wed, 13 Oct 2021 22:11:11 GMT
- Title: Continual learning using lattice-free MMI for speech recognition
- Authors: Hossein Hadian and Arseniy Gorin
- Abstract summary: Continual learning (CL) or domain expansion is a popular topic for automatic speech recognition (ASR) acoustic modeling.
Regularization-based CL for neural network acoustic models trained with the lattice-free maximum mutual information (LF-MMI) criterion is proposed.
We show that a sequence-level LWF can improve the best average word error rate across all domains by up to 9.4% relative compared with using regular LWF.
- Score: 6.802401545890963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning (CL), or domain expansion, recently became a popular topic
for automatic speech recognition (ASR) acoustic modeling because practical
systems have to be updated frequently in order to work robustly on types of
speech not observed during initial training. While sequential adaptation allows
tuning a system to a new domain, it may result in performance degradation on
the old domains due to catastrophic forgetting. In this work we explore
regularization-based CL for neural network acoustic models trained with the
lattice-free maximum mutual information (LF-MMI) criterion. We simulate domain
expansion by incrementally adapting the acoustic model on different public
datasets that include several accents and speaking styles. We investigate two
well-known CL techniques, elastic weight consolidation (EWC) and learning
without forgetting (LWF), which aim to reduce forgetting by preserving model
weights or network outputs. We additionally introduce a sequence-level LWF
regularization, which exploits posteriors from the denominator graph of LF-MMI
to further reduce forgetting. Empirical results show that the proposed
sequence-level LWF can improve the best average word error rate across all
domains by up to 9.4% relative compared with using regular LWF.
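The two regularizers compared in this work can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (which operates on LF-MMI acoustic models in a sequence-discriminative training pipeline); function names, the diagonal-Fisher assumption for EWC, and the temperature parameter for LWF are illustrative assumptions.

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Elastic weight consolidation: quadratic penalty anchoring each
    parameter to its value after training on the old domain, weighted by
    a (diagonal) Fisher-information estimate of its importance."""
    return 0.5 * lam * float(np.sum(fisher * (params - old_params) ** 2))

def _softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def lwf_penalty(new_logits, old_logits, temperature=2.0):
    """Learning without forgetting: cross-entropy between the softened
    outputs of the frozen old model (targets) and the adapting new model,
    so the new model is penalized for drifting from the old outputs."""
    p_old = _softmax(old_logits / temperature)
    log_p_new = np.log(_softmax(new_logits / temperature))
    return float(-(p_old * log_p_new).sum(axis=-1).mean())

# Adaptation loss on the new domain would combine the task criterion with
# one of these penalties, e.g. loss = task_loss + ewc or loss = task_loss + lwf.
```
The sequence-level LWF variant proposed in the paper applies the same knowledge-distillation idea at the sequence level, using posteriors from the LF-MMI denominator graph rather than frame-level network outputs as the soft targets.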
Related papers
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - Sequential Editing for Lifelong Training of Speech Recognition Models [10.770491329674401]
Fine-tuning solely on a new domain risks Catastrophic Forgetting (CF).
We propose Sequential Model Editing as a novel method to continually learn new domains in ASR systems.
Our study demonstrates up to 15% Word Error Rate Reduction (WERR) over fine-tuning baseline, and superior efficiency over other LLL techniques on CommonVoice English multi-accent dataset.
arXiv Detail & Related papers (2024-06-25T20:52:09Z) - Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates [71.81037644563217]
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
arXiv Detail & Related papers (2024-03-27T09:14:36Z) - Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z) - A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings [50.524054820564395]
We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.
The resulting acoustic word embeddings can form the basis of search, discovery, and indexing systems for low- and zero-resource languages.
arXiv Detail & Related papers (2020-12-03T19:24:42Z) - Generalized Variational Continual Learning [33.194866396158005]
Two main approaches to continual learning are Online Elastic Weight Consolidation and Variational Continual Learning (VCL).
We show that applying a likelihood-tempering modification to VCL recovers Online EWC as a limiting case, allowing for interpolations between the two approaches.
To mitigate the observed overpruning effect of VI, we take inspiration from a common multi-task architecture and augment neural networks with task-specific FiLM layers.
arXiv Detail & Related papers (2020-11-24T19:07:39Z) - Frequency-based Automated Modulation Classification in the Presence of Adversaries [17.930854969511046]
We present a novel receiver architecture consisting of deep learning models capable of withstanding transferable adversarial interference.
In this work, we demonstrate classification performance improvements greater than 30% on recurrent neural networks (RNNs) and greater than 50% on convolutional neural networks (CNNs).
arXiv Detail & Related papers (2020-11-02T17:12:22Z) - Early Stage LM Integration Using Local and Global Log-Linear Combination [46.91755970827846]
Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMMs).
One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora.
We present a novel method for language model integration into implicit-alignment based sequence-to-sequence models.
arXiv Detail & Related papers (2020-05-20T13:49:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.