Natural continual learning: success is a journey, not (just) a
destination
- URL: http://arxiv.org/abs/2106.08085v1
- Date: Tue, 15 Jun 2021 12:24:53 GMT
- Title: Natural continual learning: success is a journey, not (just) a
destination
- Authors: Ta-Chu Kao, Kristopher T. Jensen, Alberto Bernacchia, Guillaume
Hennequin
- Abstract summary: Natural Continual Learning (NCL) is a new method that unifies weight regularization and projected gradient descent.
Our method outperforms both standard weight regularization techniques and projection based approaches when applied to continual learning problems in RNNs.
The trained networks evolve task-specific dynamics that are strongly preserved as new tasks are learned, similar to experimental findings in biological circuits.
- Score: 9.462808515258464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Biological agents are known to learn many different tasks over the course of
their lives, and to be able to revisit previous tasks and behaviors with little
to no loss in performance. In contrast, artificial agents are prone to
'catastrophic forgetting' whereby performance on previous tasks deteriorates
rapidly as new ones are acquired. This shortcoming has recently been addressed
using methods that encourage parameters to stay close to those used for
previous tasks. This can be done by (i) using specific parameter regularizers
that map out suitable destinations in parameter space, or (ii) guiding the
optimization journey by projecting gradients into subspaces that do not
interfere with previous tasks. However, parameter regularization has been shown
to be relatively ineffective in recurrent neural networks (RNNs), a setting
relevant to the study of neural dynamics supporting biological continual
learning. Similarly, projection based methods can reach capacity and fail to
learn any further as the number of tasks increases. To address these
limitations, we propose Natural Continual Learning (NCL), a new method that
unifies weight regularization and projected gradient descent. NCL uses Bayesian
weight regularization to encourage good performance on all tasks at convergence
and combines this with gradient projections designed to prevent catastrophic
forgetting during optimization. NCL formalizes gradient projection as a trust
region algorithm based on the Fisher information metric, and achieves
scalability via a novel Kronecker-factored approximation strategy. Our method
outperforms both standard weight regularization techniques and projection based
approaches when applied to continual learning problems in RNNs. The trained
networks evolve task-specific dynamics that are strongly preserved as new tasks
are learned, similar to experimental findings in biological circuits.
Related papers
- CODE-CL: COnceptor-Based Gradient Projection for DEep Continual Learning [7.573297026523597]
We introduce COnceptor-based gradient projection for DEep Continual Learning (CODE-CL)
CODE-CL encodes directional importance within the input space of past tasks, allowing new knowledge integration in directions modulated by $1-S$.
We analyze task overlap using conceptor-based representations to identify highly correlated tasks.
arXiv Detail & Related papers (2024-11-21T22:31:06Z) - Gradient-Free Training of Recurrent Neural Networks using Random Perturbations [1.1742364055094265]
Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities.
Backpropagation through time (BPTT), the prevailing method, extends the backpropagation algorithm by unrolling the RNN over time.
BPTT suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information.
We present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT.
arXiv Detail & Related papers (2024-05-14T21:15:29Z) - Continual Learning via Sequential Function-Space Variational Inference [65.96686740015902]
We propose an objective derived by formulating continual learning as sequential function-space variational inference.
Compared to objectives that directly regularize neural network predictions, the proposed objective allows for more flexible variational distributions.
We demonstrate that, across a range of task sequences, neural networks trained via sequential function-space variational inference achieve better predictive accuracy than networks trained with related methods.
arXiv Detail & Related papers (2023-12-28T18:44:32Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Continual Learning with Pretrained Backbones by Tuning in the Input
Space [44.97953547553997]
The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks.
We propose a novel strategy to make the fine-tuning procedure more effective, by avoiding to update the pre-trained part of the network and learning not only the usual classification head, but also a set of newly-introduced learnable parameters.
arXiv Detail & Related papers (2023-06-05T15:11:59Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Continual Learning with Scaled Gradient Projection [8.847574864259391]
In neural networks, continual learning results in gradient interference among sequential tasks, leading to forgetting of old tasks while learning new ones.
We propose a Scaled Gradient Projection (SGP) method to improve new learning while minimizing forgetting.
We conduct experiments ranging from continual image classification to reinforcement learning tasks and report better performance with less training overhead than the state-of-the-art approaches.
arXiv Detail & Related papers (2023-02-02T19:46:39Z) - Learning Bayesian Sparse Networks with Full Experience Replay for
Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z) - AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of textit"knowledge gain" and textit"mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS)
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z) - Continual Deep Learning by Functional Regularisation of Memorable Past [95.97578574330934]
Continually learning new skills is important for intelligent systems, yet standard deep learning methods suffer from catastrophic forgetting of the past.
We propose a new functional-regularisation approach that utilises a few memorable past examples crucial to avoid forgetting.
Our method achieves state-of-the-art performance on standard benchmarks and opens a new direction for life-long learning where regularisation and memory-based methods are naturally combined.
arXiv Detail & Related papers (2020-04-29T10:47:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.