An Investigation of the Combination of Rehearsal and Knowledge
Distillation in Continual Learning for Spoken Language Understanding
- URL: http://arxiv.org/abs/2211.08161v2
- Date: Tue, 23 May 2023 10:04:47 GMT
- Title: An Investigation of the Combination of Rehearsal and Knowledge
Distillation in Continual Learning for Spoken Language Understanding
- Authors: Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti
- Abstract summary: We consider the joint use of rehearsal and knowledge distillation approaches for spoken language understanding under a class-incremental learning scenario.
We report on multiple KD combinations at different levels in the network, showing that combining feature-level and prediction-level KD leads to the best results.
- Score: 9.447108578893639
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning refers to a dynamical framework in which a model receives
a stream of non-stationary data over time and must adapt to new data while
preserving previously acquired knowledge. Unfortunately, neural networks fail to
meet these two desiderata, incurring the so-called catastrophic forgetting
phenomenon. Whereas a vast array of strategies has been proposed to attenuate
forgetting in the computer vision domain, there is a dearth of such works for
speech-related tasks. In this paper, we consider the joint
use of rehearsal and knowledge distillation (KD) approaches for spoken language
understanding under a class-incremental learning scenario. We report on
multiple KD combinations at different levels in the network, showing that
combining feature-level and prediction-level KD leads to the best results.
Finally, we provide an ablation study on the effect of the size of the
rehearsal memory that corroborates the efficacy of our approach for
low-resource devices.
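To make the combined objective concrete, the following is a minimal PyTorch-style sketch of joint rehearsal and knowledge distillation under class-incremental learning. It is an illustration under assumed names rather than the authors' implementation: `forward_features`, `classifier`, the rehearsal batch, and the loss weights are all hypothetical. The current batch is mixed with examples replayed from a small memory, and a feature-level KD term plus a temperature-softened prediction-level KD term are computed against a frozen copy of the model from the previous task.

```python
# Minimal sketch (not the authors' code): rehearsal + feature- and
# prediction-level knowledge distillation for class-incremental learning.
# `forward_features` and `classifier` are hypothetical module names.
import torch
import torch.nn.functional as F

def rehearsal_kd_loss(student, teacher, x_new, y_new, x_mem, y_mem,
                      T=2.0, w_feat=1.0, w_pred=1.0):
    # Mix the current task's batch with examples replayed from memory.
    x = torch.cat([x_new, x_mem])
    y = torch.cat([y_new, y_mem])

    feats_s = student.forward_features(x)       # intermediate representation
    logits_s = student.classifier(feats_s)      # covers old + new classes

    with torch.no_grad():                       # teacher = frozen previous-task model
        feats_t = teacher.forward_features(x)
        logits_t = teacher.classifier(feats_t)  # covers old classes only

    ce = F.cross_entropy(logits_s, y)

    # Feature-level KD: keep intermediate representations close to the teacher's.
    kd_feat = F.mse_loss(feats_s, feats_t)

    # Prediction-level KD: match softened distributions over the old classes.
    n_old = logits_t.size(1)
    kd_pred = F.kl_div(
        F.log_softmax(logits_s[:, :n_old] / T, dim=1),
        F.softmax(logits_t / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    return ce + w_feat * kd_feat + w_pred * kd_pred
```

In this view, the ablation on the rehearsal memory size only changes how many `(x_mem, y_mem)` examples are stored and replayed per step, which is what makes the approach relevant for low-resource devices.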
Related papers
- ReLearn: Unlearning via Learning for Large Language Models [64.2802606302194]
We propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning.
This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation.
Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output.
arXiv Detail & Related papers (2025-02-16T16:31:00Z) - CSTA: Spatial-Temporal Causal Adaptive Learning for Exemplar-Free Video Class-Incremental Learning [62.69917996026769]
A class-incremental learning task requires learning and preserving both spatial appearance and temporal action involvement.
We propose a framework that equips separate adapters to learn new class patterns, accommodating the incremental information requirements unique to each class.
A causal compensation mechanism is proposed to reduce the conflicts between different types of information during increment and memorization.
arXiv Detail & Related papers (2025-01-13T11:34:55Z) - Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology [4.080686348274667]
We introduce a novel approach combining unsupervised contrastive learning and a unique augmentation-based technique.
Our method allows the neural network to train on unlabeled data sets, potentially improving performance in downstream tasks.
We present a speech augmentation-based unsupervised learning method that exploits the similarity between the bottleneck-layer features and the audio reconstruction information.
arXiv Detail & Related papers (2024-08-31T05:40:37Z) - A Unified and General Framework for Continual Learning [58.72671755989431]
Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge.
Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques.
This research aims to bridge this gap by introducing a comprehensive and overarching framework that encompasses and reconciles these existing methodologies.
arXiv Detail & Related papers (2024-03-20T02:21:44Z) - Continual Contrastive Spoken Language Understanding [33.09005399967931]
COCONUT is a class-incremental learning (CIL) method that relies on the combination of experience replay and contrastive learning.
We show that COCONUT can be combined with methods that operate on the decoder side of the model, resulting in further improvements on the evaluated metrics.
arXiv Detail & Related papers (2023-10-04T10:09:12Z) - Advancing continual lifelong learning in neural information retrieval: definition, dataset, framework, and empirical evaluation [3.2340528215722553]
A systematic task formulation of continual neural information retrieval is presented.
A comprehensive continual neural information retrieval framework is proposed.
Empirical evaluations illustrate that the proposed framework can successfully prevent catastrophic forgetting in neural information retrieval.
arXiv Detail & Related papers (2023-08-16T14:01:25Z) - Subspace Distillation for Continual Learning [27.22147868163214]
We propose a knowledge distillation technique that takes into account the manifold structure of a neural network in learning novel tasks.
We demonstrate that the modeling with subspaces provides several intriguing properties, including robustness to noise.
Empirically, we observe that our proposed method outperforms various continual learning methods on several challenging datasets.
arXiv Detail & Related papers (2023-07-31T05:59:09Z) - Learning to Retain while Acquiring: Combating Distribution-Shift in
Adversarial Data-Free Knowledge Distillation [31.294947552032088]
Data-free Knowledge Distillation (DFKD) has gained popularity recently, with the fundamental idea of carrying out knowledge transfer from a Teacher to a Student neural network in the absence of training data.
We propose a meta-learning inspired framework by treating the tasks of Knowledge-Acquisition (learning from newly generated samples) and Knowledge-Retention (retaining knowledge on previously encountered samples) as meta-train and meta-test, respectively.
arXiv Detail & Related papers (2023-02-28T03:50:56Z) - MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
We propose in this paper a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
arXiv Detail & Related papers (2021-07-23T06:57:08Z) - Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of unsupervisedly learned models towards another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target task even from less relevant source models.
arXiv Detail & Related papers (2020-09-24T15:40:55Z)