An Investigation of the Combination of Rehearsal and Knowledge
Distillation in Continual Learning for Spoken Language Understanding
- URL: http://arxiv.org/abs/2211.08161v2
- Date: Tue, 23 May 2023 10:04:47 GMT
- Title: An Investigation of the Combination of Rehearsal and Knowledge
Distillation in Continual Learning for Spoken Language Understanding
- Authors: Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti
- Abstract summary: We consider the joint use of rehearsal and knowledge distillation approaches for spoken language understanding under a class-incremental learning scenario.
We report on multiple KD combinations at different levels in the network, showing that combining feature-level and prediction-level KDs leads to the best results.
- Score: 9.447108578893639
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning refers to a dynamic framework in which a model receives
a stream of non-stationary data over time and must adapt to new data while
preserving previously acquired knowledge. Unfortunately, neural networks fail to
meet these two desiderata, incurring the so-called catastrophic forgetting
phenomenon. Whereas a vast array of strategies has been proposed to attenuate
forgetting in the computer vision domain, there is a dearth of such work for
speech-related tasks. In this paper, we consider the joint
use of rehearsal and knowledge distillation (KD) approaches for spoken language
understanding under a class-incremental learning scenario. We report on
multiple KD combinations at different levels in the network, showing that
combining feature-level and prediction-level KDs leads to the best results.
Finally, we provide an ablation study on the effect of the size of the
rehearsal memory that corroborates the efficacy of our approach for
low-resource devices.
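As a rough illustration of how the ingredients above can be combined into a single training objective, the PyTorch-style sketch below mixes replayed samples into each batch and adds a feature-level (MSE) and a prediction-level (softened KL) distillation term computed against a frozen copy of the model from the previous tasks. This is a minimal sketch under stated assumptions, not the authors' implementation: the model interface returning (features, logits), the `rehearsal_buffer` object, and the weights `alpha`, `beta`, and temperature `T` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def cil_step(model, teacher, batch, rehearsal_buffer,
             alpha=0.5, beta=0.5, T=2.0):
    """One class-incremental training step combining rehearsal with
    feature-level and prediction-level knowledge distillation.
    `model` is the current network, `teacher` a frozen copy trained on the
    previous tasks; both are assumed to return (features, logits)."""
    # Mix current-task samples with examples replayed from the memory buffer.
    x_new, y_new = batch
    x_old, y_old = rehearsal_buffer.sample(len(x_new))
    x, y = torch.cat([x_new, x_old]), torch.cat([y_new, y_old])

    feats, logits = model(x)
    with torch.no_grad():
        t_feats, t_logits = teacher(x)

    # Standard classification loss on new + rehearsed data.
    loss_ce = F.cross_entropy(logits, y)

    # Feature-level KD: keep the student's intermediate representation
    # close to the teacher's (assumes matching feature dimensions).
    loss_feat = F.mse_loss(feats, t_feats)

    # Prediction-level KD: match softened output distributions over the
    # classes the teacher knows (its output dimension).
    old_logits = logits[:, : t_logits.size(1)]
    loss_pred = F.kl_div(
        F.log_softmax(old_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    return loss_ce + alpha * loss_feat + beta * loss_pred
```

Shrinking the number of samples returned by `rehearsal_buffer.sample` mirrors the memory-size ablation mentioned in the abstract: as the buffer gets smaller, the two distillation terms become the main defense against forgetting.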
Related papers
- A Unified and General Framework for Continual Learning [58.72671755989431]
Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge.
Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques.
This research aims to bridge this gap by introducing a comprehensive and overarching framework that encompasses and reconciles these existing methodologies.
arXiv Detail & Related papers (2024-03-20T02:21:44Z) - Continual Contrastive Spoken Language Understanding [33.09005399967931]
COCONUT is a class-incremental learning (CIL) method that relies on the combination of experience replay and contrastive learning.
We show that COCONUT can be combined with methods that operate on the decoder side of the model, resulting in further metrics improvements.
arXiv Detail & Related papers (2023-10-04T10:09:12Z) - Advancing continual lifelong learning in neural information retrieval: definition, dataset, framework, and empirical evaluation [3.2340528215722553]
A systematic task formulation of continual neural information retrieval is presented.
A comprehensive continual neural information retrieval framework is proposed.
Empirical evaluations illustrate that the proposed framework can successfully prevent catastrophic forgetting in neural information retrieval.
arXiv Detail & Related papers (2023-08-16T14:01:25Z) - Subspace Distillation for Continual Learning [27.22147868163214]
We propose a knowledge distillation technique that takes into account the manifold structure of a neural network in learning novel tasks.
We demonstrate that the modeling with subspaces provides several intriguing properties, including robustness to noise.
Empirically, we observe that our proposed method outperforms various continual learning methods on several challenging datasets.
arXiv Detail & Related papers (2023-07-31T05:59:09Z) - Learning to Retain while Acquiring: Combating Distribution-Shift in
Adversarial Data-Free Knowledge Distillation [31.294947552032088]
Data-free Knowledge Distillation (DFKD) has gained popularity recently, with the fundamental idea of carrying out knowledge transfer from a Teacher to a Student neural network in the absence of training data.
We propose a meta-learning inspired framework by treating the task of Knowledge-Acquisition (learning from newly generated samples) and Knowledge-Retention (retaining knowledge on previously met samples) as meta-train and meta-test.
arXiv Detail & Related papers (2023-02-28T03:50:56Z) - Task-Free Continual Learning via Online Discrepancy Distance Learning [11.540150938141034]
This paper develops a new theoretical analysis framework which provides generalization bounds based on the discrepancy distance between the visited samples and the entire information made available for training the model.
Inspired by this theoretical model, we propose a new approach enabled by the dynamic component expansion mechanism for a mixture model, namely the Online Discrepancy Distance Learning (ODDL).
arXiv Detail & Related papers (2022-10-12T20:44:09Z) - LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch.
Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z) - MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
We propose in this paper a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
arXiv Detail & Related papers (2021-07-23T06:57:08Z) - PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive
Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z) - Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of unsupervisedly learned models towards another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target task even from less relevant ones.
arXiv Detail & Related papers (2020-09-24T15:40:55Z) - Neural Networks Enhancement with Logical Knowledge [83.9217787335878]
We propose an extension of KENN for relational data.
The results show that KENN is capable of improving the performance of the underlying neural network even in the presence of relational data.
arXiv Detail & Related papers (2020-09-13T21:12:20Z)