Related papers: Maintaining Plasticity in Deep Continual Learning

URL: http://arxiv.org/abs/2306.13812v3
Date: Tue, 9 Apr 2024 21:01:56 GMT
Title: Maintaining Plasticity in Deep Continual Learning
Authors: Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, A. Rupam Mahmood, Richard S. Sutton
Abstract summary: We demonstrate loss of plasticity using the MNIST and ImageNet datasets repurposed for continual learning as sequences of tasks. On ImageNet, binary classification accuracy dropped from 89% on an early task to 77% on the 2000th task. A new algorithm, continual backpropagation, modifies conventional backpropagation to reinitialize a small fraction of less-used units after each example.
Score: 12.27972591521307
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Modern deep-learning systems are specialized to problem settings in which training occurs once and then never again, as opposed to continual-learning settings in which training occurs continually. If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail to remember earlier examples. More fundamental, but less well known, is that they may also lose their ability to learn on new examples, a phenomenon called loss of plasticity. We provide direct demonstrations of loss of plasticity using the MNIST and ImageNet datasets repurposed for continual learning as sequences of tasks. In ImageNet, binary classification performance dropped from 89% accuracy on an early task down to 77%, about the level of a linear network, on the 2000th task. Loss of plasticity occurred with a wide range of deep network architectures, optimizers, activation functions, batch normalization, dropout, but was substantially eased by L2-regularization, particularly when combined with weight perturbation. Further, we introduce a new algorithm -- continual backpropagation -- which slightly modifies conventional backpropagation to reinitialize a small fraction of less-used units after each example and appears to maintain plasticity indefinitely.
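The abstract describes continual backpropagation only at a high level: after each example, a small fraction of less-used units is reinitialized. The following is a minimal sketch of that single reinitialization step, not the authors' implementation. The function name `reinit_low_utility_units`, the `utility` and `age` bookkeeping, the maturity threshold, and the choice to zero the outgoing weights are all illustrative assumptions that the abstract does not specify.

```python
import random


def reinit_low_utility_units(w_in, w_out, utility, age, *, fraction=0.1,
                             maturity=5, init_scale=0.1, rng=None):
    """Reset the lowest-utility mature hidden units in place (hypothetical helper).

    w_in[j]    : incoming weights of hidden unit j
    w_out[j]   : outgoing weights of hidden unit j
    utility[j] : running usefulness estimate of unit j (assumed to be tracked elsewhere)
    age[j]     : number of updates since unit j was last (re)initialized
    Returns the indices of the units that were reset.
    """
    rng = rng or random.Random(0)
    # Assumption: only units older than `maturity` are eligible, so that
    # freshly reset units get some time to become useful.
    eligible = [j for j in range(len(utility)) if age[j] > maturity]
    n_reset = int(fraction * len(eligible))
    # Choose the eligible units with the smallest utility.
    chosen = sorted(eligible, key=lambda j: utility[j])[:n_reset]
    for j in chosen:
        # Fresh small random incoming weights.
        w_in[j] = [rng.uniform(-init_scale, init_scale) for _ in w_in[j]]
        # Assumption: zero outgoing weights so the reset does not change the output.
        w_out[j] = [0.0 for _ in w_out[j]]
        utility[j] = 0.0
        age[j] = 0
    return chosen


# Toy example: four hidden units, two inputs, one output.
w_in = [[0.5, -0.2], [0.1, 0.3], [0.9, 0.4], [-0.7, 0.2]]
w_out = [[1.0], [0.01], [0.8], [-0.5]]
utility = [0.9, 0.001, 0.7, 0.4]
age = [100, 100, 100, 2]
reset = reinit_low_utility_units(w_in, w_out, utility, age, fraction=0.34)
print(reset)
```

In a full training loop this step would run alongside the ordinary gradient update, with `utility` and `age` updated on every example; how the utility is measured is left open here because the abstract does not define it.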

Related papers

Spectral Imbalance Causes Forgetting in Low-Rank Continual Adaptation [58.3773038915023]
Continual learning aims to adapt pre-trained models to sequential tasks without forgetting previously acquired knowledge. Most existing approaches treat continual learning as avoiding interference with past updates, rather than considering what properties make the current task-specific update naturally preserve previously acquired knowledge. We address this problem using a projected first-order method compatible with standard deep-dots used in vision-language models.
arXiv Detail & Related papers (2026-01-31T13:27:02Z)
AltNet: Addressing the Plasticity-Stability Dilemma in Reinforcement Learning [2.1487266204344473]
We introduce AltNet, a reset-based approach that restores plasticity without performance degradation by leveraging twin networks. We demonstrate these advantages in several high-dimensional control tasks from the DeepMind Control Suite.
arXiv Detail & Related papers (2025-11-30T19:02:20Z)
Spectral Collapse Drives Loss of Plasticity in Deep Continual Learning [14.196969540084929]
We show that deep neural networks suffer from loss of plasticity in deep continual learning. We introduce the notion of $\tau$-trainability and show that current plasticity-preserving algorithms can be unified under this framework. Experiments on continual supervised and reinforcement learning tasks confirm that combining these two regularizers effectively preserves plasticity.
arXiv Detail & Related papers (2025-09-26T13:28:53Z)
Reinitializing weights vs units for maintaining plasticity in neural networks [6.404696914681301]
Loss of plasticity is a phenomenon in which a neural network loses its ability to learn when trained for an extended time on non-stationary data. An effective technique for preventing loss of plasticity is reinitializing parts of the network. We propose a new algorithm, which we name selective weight reinitialization, for reinitializing the least useful weights in a network.
arXiv Detail & Related papers (2025-07-31T23:25:19Z)
DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity [11.624569521079426]
We develop a framework emulating real-world neural network training and identify noise memorization as the primary cause of plasticity loss when warm-starting on stationary data. Motivated by this, we propose Direction-Aware SHrinking (DASH), a method aiming to mitigate plasticity loss by selectively forgetting noise while preserving learned features.
arXiv Detail & Related papers (2024-10-30T22:57:54Z)
Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature. We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate. We propose to make the learning rate schedule explicit with a simple reparameterization which we call Normalize-and-Project.
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms. We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
arXiv Detail & Related papers (2024-02-29T00:02:33Z)
Negotiated Representations to Prevent Forgetting in Machine Learning Applications [0.0]
Catastrophic forgetting is a significant challenge in the field of machine learning. We propose a novel method for preventing catastrophic forgetting in machine learning applications.
arXiv Detail & Related papers (2023-11-30T22:43:50Z)
IF2Net: Innately Forgetting-Free Networks for Continual Learning [49.57495829364827]
Continual learning can incrementally absorb new concepts without interfering with previously learned knowledge. Motivated by the characteristics of neural networks, we investigated how to design an Innately Forgetting-Free Network (IF2Net). IF2Net allows a single network to inherently learn unlimited mapping rules without telling task identities at test time.
arXiv Detail & Related papers (2023-06-18T05:26:49Z)
Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning makes significant progress in pre-training large models, but struggles with small models. We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers. A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
An Empirical Investigation of the Role of Pre-training in Lifelong Learning [21.995593026269578]
We show that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima.
arXiv Detail & Related papers (2021-12-16T19:00:55Z)
Flattening Sharpness for Dynamic Gradient Projection Memory Benefits Continual Learning [67.99349091593324]
We investigate the relationship between the weight loss landscape and sensitivity-stability in the continual learning scenario. Our proposed method consistently outperforms baselines with the superior ability to learn new skills while alleviating forgetting effectively.
arXiv Detail & Related papers (2021-10-09T15:13:44Z)
Essentials for Class Incremental Learning [43.306374557919646]
Class-incremental learning results on CIFAR-100 and ImageNet improve over the state-of-the-art by a large margin, while keeping the approach simple.
arXiv Detail & Related papers (2021-02-18T18:01:06Z)
Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources. Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.