An Investigation of the Combination of Rehearsal and Knowledge
Distillation in Continual Learning for Spoken Language Understanding
- URL: http://arxiv.org/abs/2211.08161v2
- Date: Tue, 23 May 2023 10:04:47 GMT
- Title: An Investigation of the Combination of Rehearsal and Knowledge
Distillation in Continual Learning for Spoken Language Understanding
- Authors: Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti
- Abstract summary: We consider the joint use of rehearsal and knowledge distillation approaches for spoken language understanding under a class-incremental learning scenario.
We report on multiple KD combinations at different levels in the network, showing that combining feature-level and prediction-level KDs leads to the best results.
- Score: 9.447108578893639
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning refers to a dynamic framework in which a model receives
a stream of non-stationary data over time and must adapt to new data while
preserving previously acquired knowledge. Unfortunately, neural networks fail to
meet these two desiderata, incurring the so-called catastrophic forgetting
phenomenon. While a vast array of strategies has been proposed to attenuate
forgetting in the computer vision domain, speech-related tasks have received
far less attention. In this paper, we consider the joint
use of rehearsal and knowledge distillation (KD) approaches for spoken language
understanding under a class-incremental learning scenario. We report on
multiple KD combinations at different levels in the network, showing that
combining feature-level and prediction-level KDs leads to the best results.
Finally, we provide an ablation study on the effect of the size of the
rehearsal memory that corroborates the efficacy of our approach for
low-resource devices.
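Below is a minimal PyTorch-style sketch of the recipe the abstract describes: a cross-entropy loss on new and replayed (rehearsal) samples, combined with a feature-level and a prediction-level distillation term against a frozen copy of the model from the previous task. The model interface, the `rehearsal_buffer` API, the loss weights `alpha`/`beta`, and the temperature `T` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cil_training_step(model, old_model, new_batch, rehearsal_buffer,
                      optimizer, alpha=1.0, beta=1.0, T=2.0):
    """One class-incremental step combining rehearsal with two KD terms.

    Assumes model(x) returns (features, logits) and that old_model is a
    frozen copy trained on the previous tasks (hypothetical interface).
    """
    x_new, y_new = new_batch
    x_old, y_old = rehearsal_buffer.sample(len(x_new))  # rehearsal: replay stored exemplars
    x = torch.cat([x_new, x_old])
    y = torch.cat([y_new, y_old])

    feats, logits = model(x)
    with torch.no_grad():
        old_feats, old_logits = old_model(x)

    # Task loss on new and replayed labels.
    ce = F.cross_entropy(logits, y)

    # Feature-level KD: keep intermediate representations close to the old model's.
    kd_feat = F.mse_loss(feats, old_feats)

    # Prediction-level KD: match softened outputs on the previously seen classes.
    n_old = old_logits.size(1)
    kd_pred = F.kl_div(F.log_softmax(logits[:, :n_old] / T, dim=1),
                       F.softmax(old_logits / T, dim=1),
                       reduction="batchmean")

    loss = ce + alpha * kd_feat + beta * kd_pred
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Shrinking the number of exemplars held by `rehearsal_buffer` is the knob that the paper's ablation on memory size probes for low-resource devices.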
Related papers
- Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
- Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology [4.080686348274667]
We introduce a novel approach combining unsupervised contrastive learning with a unique augmentation-based technique.
Our method allows the neural network to train on unlabeled data sets, potentially improving performance in downstream tasks.
We present a speech augmentation-based unsupervised learning method that utilizes the similarity between the bottleneck layer feature and the audio reconstructing information.
arXiv Detail & Related papers (2024-08-31T05:40:37Z)
- A Unified and General Framework for Continual Learning [58.72671755989431]
Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge.
Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques.
This research aims to bridge this gap by introducing a comprehensive and overarching framework that encompasses and reconciles these existing methodologies.
arXiv Detail & Related papers (2024-03-20T02:21:44Z)
- Continual Contrastive Spoken Language Understanding [33.09005399967931]
COCONUT is a class-incremental learning (CIL) method that relies on the combination of experience replay and contrastive learning.
We show that COCONUT can be combined with methods that operate on the decoder side of the model, resulting in further metrics improvements.
arXiv Detail & Related papers (2023-10-04T10:09:12Z)
- Advancing continual lifelong learning in neural information retrieval: definition, dataset, framework, and empirical evaluation [3.2340528215722553]
A systematic task formulation of continual neural information retrieval is presented.
A comprehensive continual neural information retrieval framework is proposed.
Empirical evaluations illustrate that the proposed framework can successfully prevent catastrophic forgetting in neural information retrieval.
arXiv Detail & Related papers (2023-08-16T14:01:25Z)
- Subspace Distillation for Continual Learning [27.22147868163214]
We propose a knowledge distillation technique that takes into account the manifold structure of a neural network in learning novel tasks.
We demonstrate that the modeling with subspaces provides several intriguing properties, including robustness to noise.
Empirically, we observe that our proposed method outperforms various continual learning methods on several challenging datasets.
arXiv Detail & Related papers (2023-07-31T05:59:09Z)
- Learning to Retain while Acquiring: Combating Distribution-Shift in
Adversarial Data-Free Knowledge Distillation [31.294947552032088]
Data-free Knowledge Distillation (DFKD) has gained popularity recently, with the fundamental idea of carrying out knowledge transfer from a Teacher to a Student neural network in the absence of training data.
We propose a meta-learning inspired framework by treating the task of Knowledge-Acquisition (learning from newly generated samples) and Knowledge-Retention (retaining knowledge on previously met samples) as meta-train and meta-test.
arXiv Detail & Related papers (2023-02-28T03:50:56Z)
- MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
We propose in this paper a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them (see the sketch after this list).
arXiv Detail & Related papers (2021-07-23T06:57:08Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive
Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of models learned without supervision to another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target task even from less relevant sources.
arXiv Detail & Related papers (2020-09-24T15:40:55Z)
- Neural Networks Enhancement with Logical Knowledge [83.9217787335878]
We propose an extension of KENN for relational data.
The results show that KENN is capable of increasing the performance of the underlying neural network even in the presence of relational data.
arXiv Detail & Related papers (2020-09-13T21:12:20Z)
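As referenced in the MCDAL entry above, here is a toy sketch of acquisition driven by the disagreement between two auxiliary classification heads that share an encoder. The two-head architecture, the L1 disagreement score, and the `acquire` helper are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadClassifier(nn.Module):
    """Shared encoder with two auxiliary classification heads (hypothetical shapes)."""
    def __init__(self, input_dim, feat_dim=128, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, feat_dim), nn.ReLU())
        self.head1 = nn.Linear(feat_dim, num_classes)
        self.head2 = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        z = self.encoder(x)
        return self.head1(z), self.head2(z)

def discrepancy(logits1, logits2):
    """L1 distance between the heads' softmax outputs; larger means more disagreement."""
    return (F.softmax(logits1, dim=1) - F.softmax(logits2, dim=1)).abs().sum(dim=1)

def acquire(model, unlabeled_x, budget):
    """Pick the unlabeled samples on which the two heads disagree the most."""
    with torch.no_grad():
        l1, l2 = model(unlabeled_x)
        scores = discrepancy(l1, l2)
    return scores.topk(budget).indices
```

The rough intuition, as the entry states, is that training the heads to disagree on unlabeled data while remaining accurate on labeled data tightens the decision boundaries, and the most disputed samples are then the most informative ones to label.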
This list is automatically generated from the titles and abstracts of the papers on this site.