Continual Learning via Bit-Level Information Preserving
- URL: http://arxiv.org/abs/2105.04444v1
- Date: Mon, 10 May 2021 15:09:01 GMT
- Title: Continual Learning via Bit-Level Information Preserving
- Authors: Yujun Shi, Li Yuan, Yunpeng Chen, Jiashi Feng
- Abstract summary: We study the continual learning process through the lens of information theory.
We propose Bit-Level Information Preserving (BLIP) that preserves the information gain on model parameters.
BLIP achieves close to zero forgetting while only requiring constant memory overheads throughout continual learning.
- Score: 88.32450740325005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning tackles the setting of learning different tasks
sequentially. Despite the lots of previous solutions, most of them still suffer
significant forgetting or expensive memory cost. In this work, targeted at
these problems, we first study the continual learning process through the lens
of information theory and observe that forgetting of a model stems from the
loss of \emph{information gain} on its parameters from the previous tasks when
learning a new task. From this viewpoint, we then propose a novel continual
learning approach called Bit-Level Information Preserving (BLIP) that preserves
the information gain on model parameters through updating the parameters at the
bit level, which can be conveniently implemented with parameter quantization.
More specifically, BLIP first trains a neural network with weight quantization
on the new incoming task and then estimates information gain on each parameter
provided by the task data to determine the bits to be frozen to prevent
forgetting. We conduct extensive experiments ranging from classification tasks
to reinforcement learning tasks, and the results show that our method produces
better or on par results comparing to previous state-of-the-arts. Indeed, BLIP
achieves close to zero forgetting while only requiring constant memory
overheads throughout continual learning.
Related papers
- Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining pre-trained knowledge of
arXiv Detail & Related papers (2024-07-07T12:19:37Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Complementary Learning Subnetworks for Parameter-Efficient
Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order.
arXiv Detail & Related papers (2023-06-21T01:43:25Z) - PIVOT: Prompting for Video Continual Learning [50.80141083993668]
We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain.
Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
arXiv Detail & Related papers (2022-12-09T13:22:27Z) - Learning Bayesian Sparse Networks with Full Experience Replay for
Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z) - DIODE: Dilatable Incremental Object Detection [15.59425584971872]
Conventional deep learning models lack the capability of preserving previously learned knowledge.
We propose a dilatable incremental object detector (DIODE) for multi-step incremental detection tasks.
Our method achieves up to 6.4% performance improvement by increasing the number of parameters by just 1.2% for each newly learned task.
arXiv Detail & Related papers (2021-08-12T09:45:57Z) - TAG: Task-based Accumulated Gradients for Lifelong learning [21.779858050277475]
We propose a task-aware system that adapts the learning rate based on the relatedness among tasks.
We empirically show that our proposed adaptive learning rate not only accounts for catastrophic forgetting but also allows positive backward transfer.
arXiv Detail & Related papers (2021-05-11T16:10:32Z) - Rectification-based Knowledge Retention for Continual Learning [49.1447478254131]
Deep learning models suffer from catastrophic forgetting when trained in an incremental learning setting.
We propose a novel approach to address the task incremental learning problem, which involves training a model on new tasks that arrive in an incremental manner.
Our approach can be used in both the zero-shot and non zero-shot task incremental learning settings.
arXiv Detail & Related papers (2021-03-30T18:11:30Z) - Gradient Projection Memory for Continual Learning [5.43185002439223]
The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems.
We propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for the past tasks.
arXiv Detail & Related papers (2021-03-17T16:31:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.