Gradient Projection Memory for Continual Learning
- URL: http://arxiv.org/abs/2103.09762v1
- Date: Wed, 17 Mar 2021 16:31:29 GMT
- Title: Gradient Projection Memory for Continual Learning
- Authors: Gobinda Saha, Isha Garg, Kaushik Roy
- Abstract summary: The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems.
We propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for the past tasks.
- Score: 5.43185002439223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to learn continually without forgetting the past tasks is a
desired attribute for artificial learning systems. Existing approaches to
enable such learning in artificial neural networks usually rely on network
growth, importance based weight update or replay of old data from the memory.
In contrast, we propose a novel approach where a neural network learns new
tasks by taking gradient steps in the orthogonal direction to the gradient
subspaces deemed important for the past tasks. We find the bases of these
subspaces by analyzing network representations (activations) after learning
each task with Singular Value Decomposition (SVD) in a single shot manner and
store them in the memory as Gradient Projection Memory (GPM). With qualitative
and quantitative analyses, we show that such orthogonal gradient descent
induces minimum to no interference with the past tasks, thereby mitigates
forgetting. We evaluate our algorithm on diverse image classification datasets
with short and long sequences of tasks and report better or on-par performance
compared to the state-of-the-art approaches.
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - Look-Ahead Selective Plasticity for Continual Learning of Visual Tasks [9.82510084910641]
We propose a new mechanism that takes place during task boundaries, i.e., when one task finishes and another starts.
We evaluate the proposed methods on benchmark computer vision datasets including CIFAR10 and TinyImagenet.
arXiv Detail & Related papers (2023-11-02T22:00:23Z) - Continual Learning with Scaled Gradient Projection [8.847574864259391]
In neural networks, continual learning results in gradient interference among sequential tasks, leading to forgetting of old tasks while learning new ones.
We propose a Scaled Gradient Projection (SGP) method to improve new learning while minimizing forgetting.
We conduct experiments ranging from continual image classification to reinforcement learning tasks and report better performance with less training overhead than the state-of-the-art approaches.
arXiv Detail & Related papers (2023-02-02T19:46:39Z) - On Generalizing Beyond Domains in Cross-Domain Continual Learning [91.56748415975683]
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift with accuracy boosts up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z) - Learning Bayesian Sparse Networks with Full Experience Replay for
Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z) - Reducing Catastrophic Forgetting in Self Organizing Maps with
Internally-Induced Generative Replay [67.50637511633212]
A lifelong learning agent is able to continually learn from potentially infinite streams of pattern sensory data.
One major historic difficulty in building agents that adapt is that neural systems struggle to retain previously-acquired knowledge when learning from new samples.
This problem is known as catastrophic forgetting (interference) and remains an unsolved problem in the domain of machine learning to this day.
arXiv Detail & Related papers (2021-12-09T07:11:14Z) - Center Loss Regularization for Continual Learning [0.0]
In general, neural networks lack the ability to learn different tasks sequentially.
Our approach remembers old tasks by projecting the representations of new tasks close to that of old tasks.
We demonstrate that our approach is scalable, effective, and gives competitive performance compared to state-of-the-art continual learning methods.
arXiv Detail & Related papers (2021-10-21T17:46:44Z) - Dynamic Neural Diversification: Path to Computationally Sustainable
Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters, can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z) - Continual Learning via Bit-Level Information Preserving [88.32450740325005]
We study the continual learning process through the lens of information theory.
We propose Bit-Level Information Preserving (BLIP) that preserves the information gain on model parameters.
BLIP achieves close to zero forgetting while only requiring constant memory overheads throughout continual learning.
arXiv Detail & Related papers (2021-05-10T15:09:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.