Sparsity and Heterogeneous Dropout for Continual Learning in the Null
Space of Neural Activations
- URL: http://arxiv.org/abs/2203.06514v1
- Date: Sat, 12 Mar 2022 21:12:41 GMT
- Title: Sparsity and Heterogeneous Dropout for Continual Learning in the Null
Space of Neural Activations
- Authors: Ali Abbasi, Parsa Nooralinejad, Vladimir Braverman, Hamed Pirsiavash,
Soheil Kolouri
- Abstract summary: Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence.
Deep neural networks are prone to forgetting previously learned information upon learning new tasks.
Overcoming catastrophic forgetting in deep neural networks has become an active field of research in recent years.
- Score: 36.24028295650668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual/lifelong learning from a non-stationary input data stream is a
cornerstone of intelligence. Despite their phenomenal performance in a wide
variety of applications, deep neural networks are prone to forgetting their
previously learned information upon learning new tasks. This phenomenon is
called "catastrophic forgetting" and is deeply rooted in the
stability-plasticity dilemma. Overcoming catastrophic forgetting in deep neural
networks has become an active field of research in recent years. In particular,
gradient projection-based methods have recently shown exceptional performance
at overcoming catastrophic forgetting. This paper proposes two
biologically-inspired mechanisms based on sparsity and heterogeneous dropout
that significantly increase a continual learner's performance over a long
sequence of tasks. Our proposed approach builds on the Gradient Projection
Memory (GPM) framework. We leverage K-winner activations in each layer of a
neural network to enforce layer-wise sparse activations for each task, together
with a between-task heterogeneous dropout that encourages the network to use
non-overlapping activation patterns between different tasks. In addition, we
introduce Continual Swiss Roll as a lightweight and interpretable -- yet
challenging -- synthetic benchmark for continual learning. Lastly, we provide
an in-depth analysis of our proposed method and demonstrate a significant
performance boost on various benchmark continual learning problems.
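The abstract's three mechanisms can be illustrated with a minimal sketch. This is not the authors' implementation: the linear dropout schedule, the `p_max` cap, and the single stored direction in the projection step are simplifying assumptions for illustration.

```python
import random

def k_winner(activations, k):
    """Layer-wise sparsity: keep the k largest activations, zero the rest."""
    if k >= len(activations):
        return list(activations)
    threshold = sorted(activations, reverse=True)[k - 1]
    out, kept = [], 0
    for a in activations:
        if a >= threshold and kept < k:
            out.append(a)
            kept += 1
        else:
            out.append(0.0)
    return out

def heterogeneous_dropout_probs(activation_counts, p_max=0.9):
    """Heterogeneous dropout (simplified): units activated more often by past
    tasks get a higher drop probability, steering the new task toward
    less-used units. The linear schedule and p_max cap are assumptions; the
    paper's exact schedule may differ."""
    peak = max(activation_counts)
    if peak == 0:
        return [0.0] * len(activation_counts)
    return [p_max * c / peak for c in activation_counts]

def apply_dropout(activations, probs, rng=None):
    """Zero each unit independently with its per-unit drop probability."""
    rng = rng or random.Random(0)
    return [0.0 if rng.random() < p else a for a, p in zip(activations, probs)]

def project_out(grad, direction):
    """GPM-style update with one stored unit direction: remove the gradient
    component along a subspace deemed important for past tasks."""
    dot = sum(g * d for g, d in zip(grad, direction))
    return [g - dot * d for g, d in zip(grad, direction)]
```

For example, `k_winner([3.0, 1.0, 4.0, 1.0, 5.0], 2)` keeps only the two largest units, and `project_out([1.0, 1.0], [1.0, 0.0])` removes the component of the gradient along the stored direction, leaving `[0.0, 1.0]`.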
Related papers
- Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms.
We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
arXiv Detail & Related papers (2024-02-29T00:02:33Z)
- Elephant Neural Networks: Born to Be a Continual Learner [7.210328077827388]
Catastrophic forgetting has remained a significant challenge to continual learning for decades.
We study the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting.
We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting.
arXiv Detail & Related papers (2023-10-02T17:27:39Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
- FFNB: Forgetting-Free Neural Blocks for Deep Continual Visual Learning [14.924672048447338]
We devise a dynamic network architecture for continual learning based on a novel forgetting-free neural block (FFNB).
Training FFNB features on new tasks is achieved using a novel procedure that constrains the underlying parameters in the null-space of the previous tasks.
arXiv Detail & Related papers (2021-11-22T17:23:34Z)
- Wide Neural Networks Forget Less Catastrophically [39.907197907411266]
We study the impact of "width" of the neural network architecture on catastrophic forgetting.
We study the learning dynamics of the network from various perspectives.
arXiv Detail & Related papers (2021-10-21T23:49:23Z)
- Gradient Projection Memory for Continual Learning [5.43185002439223]
The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems.
We propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for the past tasks.
arXiv Detail & Related papers (2021-03-17T16:31:29Z)
- Understanding the Role of Training Regimes in Continual Learning [51.32945003239048]
Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially.
We study the effect of dropout, learning rate decay, and batch size, on forming training regimes that widen the tasks' local minima.
arXiv Detail & Related papers (2020-06-12T06:00:27Z)
- Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules which demonstrate better performance over other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP and present its limitations.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z)
- Learn2Perturb: an End-to-end Feature Perturbation Learning to Improve Adversarial Robustness [79.47619798416194]
Learn2Perturb is an end-to-end feature perturbation learning approach for improving the adversarial robustness of deep neural networks.
Inspired by the Expectation-Maximization algorithm, an alternating back-propagation training procedure is introduced to train the network and noise parameters consecutively.
arXiv Detail & Related papers (2020-03-02T18:27:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.