Adaptively Integrated Knowledge Distillation and Prediction Uncertainty
for Continual Learning
- URL: http://arxiv.org/abs/2301.07316v1
- Date: Wed, 18 Jan 2023 05:36:06 GMT
- Title: Adaptively Integrated Knowledge Distillation and Prediction Uncertainty
for Continual Learning
- Authors: Kanghao Chen, Sijia Liu, Ruixuan Wang and Wei-Shi Zheng
- Abstract summary: Current deep learning models often suffer from catastrophic forgetting of old knowledge when continually learning new knowledge.
Existing strategies to alleviate this issue often fix the trade-off between keeping old knowledge (stability) and learning new knowledge (plasticity).
- Score: 71.43841235954453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current deep learning models often suffer from catastrophic forgetting of old
knowledge when continually learning new knowledge. Existing strategies to
alleviate this issue often fix the trade-off between keeping old knowledge
(stability) and learning new knowledge (plasticity). However, the
stability-plasticity trade-off during continual learning may need to be
dynamically changed for better model performance. In this paper, we propose two
novel ways to adaptively balance model stability and plasticity. The first one
is to adaptively integrate multiple levels of old knowledge and transfer them to
each block level in the new model. The second one uses prediction uncertainty
of old knowledge to naturally tune the importance of learning new knowledge
during model training. To the best of our knowledge, this is the first work to
connect model prediction uncertainty and knowledge distillation for continual learning.
In addition, this paper applies a modified CutMix particularly to augment the
data for old knowledge, further alleviating the catastrophic forgetting issue.
Extensive evaluations on the CIFAR100 and the ImageNet datasets confirmed the
effectiveness of the proposed method for continual learning.
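Because the abstract only sketches the two adaptive mechanisms and the modified CutMix augmentation, the snippet below is a minimal PyTorch-style sketch of how such ideas could be wired together: a per-sample blend of distillation and cross-entropy losses weighted by the old model's predictive uncertainty, and a CutMix-style patch mix with a stored old-class exemplar. The function names, the entropy-based weighting, and the exemplar-mixing details are illustrative assumptions, not the paper's exact formulation.

# Illustrative sketch only: uncertainty-weighted distillation for continual learning.
# The weighting scheme and the exemplar-mixing helper below are assumptions made for
# demonstration; they are not the paper's exact method.
import torch
import torch.nn.functional as F

def uncertainty_weighted_loss(old_logits, new_logits, new_labels, T=2.0):
    """Blend distillation (stability) and cross-entropy (plasticity) per sample,
    using the old model's predictive entropy as an uncertainty signal."""
    # Distillation on the old classes (soft targets from the frozen old model).
    n_old = old_logits.size(1)
    kd = F.kl_div(
        F.log_softmax(new_logits[:, :n_old] / T, dim=1),
        F.softmax(old_logits / T, dim=1),
        reduction="none",
    ).sum(dim=1) * (T * T)

    # Cross-entropy on the current task (all classes).
    ce = F.cross_entropy(new_logits, new_labels, reduction="none")

    # Normalised entropy of the old model's prediction, in [0, 1]:
    # high uncertainty -> trust the old model less -> emphasise new-task learning.
    p_old = F.softmax(old_logits, dim=1)
    entropy = -(p_old * p_old.clamp_min(1e-8).log()).sum(dim=1)
    w = entropy / torch.log(torch.tensor(float(n_old)))

    return ((1.0 - w) * kd + w * ce).mean()

def cutmix_with_old_exemplar(new_img, old_img, alpha=1.0):
    """Hypothetical 'modified CutMix': paste a patch from a stored old-class
    exemplar into a new-task image, so old knowledge is rehearsed during training."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    _, h, w = new_img.shape
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = new_img.clone()
    mixed[:, y1:y2, x1:x2] = old_img[:, y1:y2, x1:x2]
    return mixed

In a typical rehearsal setup, old_logits would come from the frozen model of the previous tasks, and samples produced by cutmix_with_old_exemplar would be passed through both the old and new models before computing the loss.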
Related papers
- Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z)
- Decision Boundary-aware Knowledge Consolidation Generates Better Instance-Incremental Learner [41.462673126500974]
Instance-incremental learning (IIL) focuses on learning continually with data of the same classes.
We propose a novel decision boundary-aware distillation method that consolidates knowledge into the teacher to ease the student's learning of new knowledge.
arXiv Detail & Related papers (2024-06-05T08:49:51Z)
- Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models [53.52344131257681]
We propose a new fine-tuning paradigm called F-Learning, which employs parametric arithmetic to facilitate the forgetting of old knowledge and the learning of new knowledge.
Experimental results on two publicly available datasets demonstrate that F-Learning clearly improves the knowledge-updating performance of both full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2023-11-14T09:12:40Z)
- SRIL: Selective Regularization for Class-Incremental Learning [5.810252620242912]
Class-incremental learning aims to create an integrated model that balances plasticity and stability to overcome catastrophic forgetting.
We propose a selective regularization method that accepts new knowledge while maintaining previous knowledge.
We validate the effectiveness of the proposed method through extensive experimental protocols using CIFAR-100, ImageNet-Subset, and ImageNet-Full.
arXiv Detail & Related papers (2023-05-09T05:04:35Z)
- Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and the associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
We show theoretically that the proposed learning paradigm makes the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z)
- Continual Learning with Bayesian Model based on a Fixed Pre-trained Feature Extractor [55.9023096444383]
Current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes.
Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning.
arXiv Detail & Related papers (2022-04-28T08:41:51Z)
- Addressing the Stability-Plasticity Dilemma via Knowledge-Aware Continual Learning [5.979373021392084]
We show that being aware of existing knowledge helps in: (1) increasing the forward transfer from similar knowledge, (2) reducing the required capacity by leveraging existing knowledge, and (3) increasing robustness to the class order in the sequence.
We evaluate sequences of similar tasks, dissimilar tasks, and a mix of both, constructed from two commonly used benchmarks for class-incremental learning: CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2021-10-11T14:51:56Z)
- Preserving Earlier Knowledge in Continual Learning with the Help of All Previous Feature Extractors [63.21036904487014]
Continual learning of new knowledge over time is one desirable capability for intelligent systems to recognize more and more classes of objects.
We propose a simple yet effective fusion mechanism by including all the previously learned feature extractors into the intelligent model.
Experiments on multiple classification tasks show that the proposed approach can effectively reduce the forgetting of old knowledge, achieving state-of-the-art continual learning performance.
arXiv Detail & Related papers (2021-04-28T07:49:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.