A Closer Look at Rehearsal-Free Continual Learning
- URL: http://arxiv.org/abs/2203.17269v2
- Date: Mon, 3 Apr 2023 22:49:29 GMT
- Title: A Closer Look at Rehearsal-Free Continual Learning
- Authors: James Seale Smith, Junjiao Tian, Shaunak Halbe, Yen-Chang Hsu, Zsolt Kira
- Abstract summary: We show how to achieve strong continual learning performance without rehearsal.
We first disprove the common assumption that parameter regularization techniques fail for rehearsal-free continual learning of a single, expanding task.
Next, we explore how to leverage knowledge from a pre-trained model in rehearsal-free continual learning and find that vanilla L2 parameter regularization outperforms EWC parameter regularization and feature distillation.
- Score: 26.09061715039747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning is a setting where machine learning models learn novel
concepts from continuously shifting training data, while simultaneously
avoiding degradation of knowledge on previously seen classes which may
disappear from the training data for extended periods of time (a phenomenon
known as the catastrophic forgetting problem). Current approaches for continual
learning of a single expanding task (aka class-incremental continual learning)
require extensive rehearsal of previously seen data to avoid this degradation
of knowledge. Unfortunately, rehearsal comes at a cost to memory, and it may
also violate data privacy. Instead, we explore combining knowledge distillation
and parameter regularization in new ways to achieve strong continual learning
performance without rehearsal. Specifically, we take a deep dive into common
continual learning techniques: prediction distillation, feature distillation,
L2 parameter regularization, and EWC parameter regularization. We first
disprove the common assumption that parameter regularization techniques fail
for rehearsal-free continual learning of a single, expanding task. Next, we
explore how to leverage knowledge from a pre-trained model in rehearsal-free
continual learning and find that vanilla L2 parameter regularization
outperforms EWC parameter regularization and feature distillation. Finally, we
explore the recently popular ImageNet-R benchmark, and show that L2 parameter
regularization implemented in the self-attention blocks of a ViT outperforms
recently popular prompting-based continual learning methods.
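The central comparison above is between penalizing raw parameter drift (vanilla L2) and importance-weighted drift (EWC). Below is a minimal PyTorch sketch of both penalties for a generic model; the class name, the diagonal Fisher loop, and the name-based filter for restricting the penalty to self-attention parameters are illustrative assumptions, not the authors' released implementation.

```python
import torch


class ParamRegularizer:
    """Penalize drift of the current parameters from a frozen snapshot.

    mode="l2":  sum_i (theta_i - theta*_i)^2
    mode="ewc": sum_i F_i * (theta_i - theta*_i)^2, with a diagonal Fisher F.
    """

    def __init__(self, model, mode="l2"):
        self.mode = mode
        # Snapshot of the parameters after finishing the previous task.
        self.anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
        # Per-parameter importance; all-ones recovers plain L2 regularization.
        self.importance = {n: torch.ones_like(p) for n, p in self.anchor.items()}

    def estimate_fisher(self, model, loader, criterion, device="cpu"):
        """Diagonal Fisher estimate from squared gradients (used by EWC)."""
        fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        model.eval()
        for x, y in loader:
            model.zero_grad()
            loss = criterion(model(x.to(device)), y.to(device))
            loss.backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    fisher[n] += p.grad.detach() ** 2
        self.importance = {n: f / max(len(loader), 1) for n, f in fisher.items()}

    def penalty(self, model, name_filter=None):
        """Regularization term; optionally restrict to parameters whose names
        contain `name_filter` (e.g. "attn" for ViT self-attention blocks)."""
        total = 0.0
        for n, p in model.named_parameters():
            if name_filter is not None and name_filter not in n:
                continue
            weight = self.importance[n] if self.mode == "ewc" else 1.0
            total = total + (weight * (p - self.anchor[n]) ** 2).sum()
        return total
```

During training on a new task, the objective would then look like `loss = task_loss + lam * reg.penalty(model, name_filter="attn")`, where the `"attn"` substring is only one hypothetical way to restrict the penalty to a ViT's self-attention blocks as described in the abstract.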
Related papers
- Learning Continually by Spectral Regularization [49.37215293091139]
Loss of plasticity is a phenomenon where neural networks become more difficult to train during the course of learning.
Continual learning algorithms seek to mitigate this effect by sustaining good predictive performance while maintaining network trainability.
arXiv Detail & Related papers (2024-06-10T21:34:43Z)
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC)
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Contrastive Continual Learning with Importance Sampling and Prototype-Instance Relation Distillation [14.25441464051506]
We propose Contrastive Continual Learning via Importance Sampling (CCLIS) to preserve knowledge by recovering previous data distributions.
We also present the Prototype-instance Relation Distillation (PRD) loss, a technique designed to maintain the relationship between prototypes and sample representations.
arXiv Detail & Related papers (2024-03-07T15:47:52Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, replay is often impractical due to memory constraints or data-privacy concerns.
As an alternative, data-free replay methods generate samples by inverting the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Infinite dSprites for Disentangled Continual Learning: Separating Memory Edits from Generalization [48.765079793320346]
We introduce Infinite dSprites, a parsimonious tool for creating continual classification benchmarks of arbitrary length.
We show that over a sufficiently long time horizon, the performance of all major types of continual learning methods deteriorates on this simple benchmark.
In a simple setting with direct supervision on the generative factors, we show how learning class-agnostic transformations offers a way to circumvent catastrophic forgetting.
arXiv Detail & Related papers (2023-12-27T22:05:42Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce BAdam, a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Continually Learning Self-Supervised Representations with Projected Functional Regularization [39.92600544186844]
Recent self-supervised learning methods are able to learn high-quality image representations and are closing the gap with supervised methods.
However, these methods are unable to acquire new knowledge incrementally; they are, in fact, mostly used only as a pre-training phase with IID data.
To prevent forgetting of previous knowledge, we propose the usage of functional regularization.
arXiv Detail & Related papers (2021-12-30T11:59:23Z)
- Learning to Prompt for Continual Learning [34.609384246149325]
This work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time.
Our method learns to dynamically prompt (L2P) a pre-trained model to learn tasks sequentially under different task transitions.
The objective is to optimize prompts to instruct the model prediction and explicitly manage task-invariant and task-specific knowledge while maintaining model plasticity.
arXiv Detail & Related papers (2021-12-16T06:17:07Z)
- Bilevel Continual Learning [76.50127663309604]
We present a novel framework of continual learning named "Bilevel Continual Learning" (BCL)
Our experiments on continual learning benchmarks demonstrate the efficacy of the proposed BCL compared to many state-of-the-art methods.
arXiv Detail & Related papers (2020-07-30T16:00:23Z)
- Continual Learning with Node-Importance based Adaptive Group Sparse Regularization [30.23319528662881]
We propose a novel regularization-based continual learning method, dubbed Adaptive Group Sparsity based Continual Learning (AGS-CL).
Our method selectively employs the two penalties when learning each node based on its importance, which is adaptively updated after learning each new task.
arXiv Detail & Related papers (2020-03-30T18:21:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.