Preserving Plasticity in Continual Learning with Adaptive Linearity Injection
- URL: http://arxiv.org/abs/2505.09486v1
- Date: Wed, 14 May 2025 15:36:51 GMT
- Title: Preserving Plasticity in Continual Learning with Adaptive Linearity Injection
- Authors: Seyed Roozbeh Razavi Rohani, Khashayar Khajavi, Wesley Chung, Mo Chen, Sharan Vaswani,
- Abstract summary: Loss of plasticity in deep neural networks is the gradual reduction in a model's capacity to incrementally learn.<n>Recent work has shown that deep linear networks tend to be resilient towards loss of plasticity.<n>We propose Adaptive Linearization (AdaLin), a general approach that dynamically adapts each neuron's activation function to mitigate plasticity loss.
- Score: 10.641213440191551
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Loss of plasticity in deep neural networks is the gradual reduction in a model's capacity to incrementally learn and has been identified as a key obstacle to learning in non-stationary problem settings. Recent work has shown that deep linear networks tend to be resilient towards loss of plasticity. Motivated by this observation, we propose Adaptive Linearization (AdaLin), a general approach that dynamically adapts each neuron's activation function to mitigate plasticity loss. Unlike prior methods that rely on regularization or periodic resets, AdaLin equips every neuron with a learnable parameter and a gating mechanism that injects linearity into the activation function based on its gradient flow. This adaptive modulation ensures sufficient gradient signal and sustains continual learning without introducing additional hyperparameters or requiring explicit task boundaries. When used with conventional activation functions like ReLU, Tanh, and GeLU, we demonstrate that AdaLin can significantly improve performance on standard benchmarks, including Random Label and Permuted MNIST, Random Label and Shuffled CIFAR-10, and Class-Split CIFAR-100. Furthermore, its efficacy is shown in more complex scenarios, such as class-incremental learning on CIFAR-100 with a ResNet-18 backbone, and in mitigating plasticity loss in off-policy reinforcement learning agents. We perform a systematic set of ablations that show that neuron-level adaptation is crucial for good performance and analyze a number of metrics in the network that might be correlated to loss of plasticity.
Related papers
- EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time.<n>We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware importance Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z) - Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn [22.354498355750465]
We study the loss of plasticity in deep continual RL from the lens of churn.<n>We demonstrate that (1) the loss of plasticity is accompanied by the exacerbation of churn due to the gradual rank decrease of the Neural Tangent Kernel (NTK) matrix.<n>We introduce Continual Churn Approximated Reduction (C-CHAIN) and demonstrate it improves learning performance.
arXiv Detail & Related papers (2025-05-31T14:58:22Z) - Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning [122.67854581396578]
Plasticine is an open-source framework for benchmarking plasticity optimization in deep reinforcement learning.<n>Plasticine provides single-file implementations of over 13 mitigation methods, 10 evaluation metrics, and learning scenarios.
arXiv Detail & Related papers (2025-04-24T12:32:13Z) - Activation by Interval-wise Dropout: A Simple Way to Prevent Neural Networks from Plasticity Loss [3.841822016067955]
Plasticity loss limits a model's ability to adapt to new tasks or shifts in data distribution.<n>This paper introduces AID (Activation by Interval-wise Dropout), a novel method inspired by Dropout to address plasticity loss.<n>We show that AID regularizes the network, promoting behavior analogous to that of deep linear networks, which do not suffer from plasticity loss.
arXiv Detail & Related papers (2025-02-03T13:34:53Z) - Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms.
We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
arXiv Detail & Related papers (2024-02-29T00:02:33Z) - ENN: A Neural Network with DCT Adaptive Activation Functions [2.2713084727838115]
We present Expressive Neural Network (ENN), a novel model in which the non-linear activation functions are modeled using the Discrete Cosine Transform (DCT)
This parametrization keeps the number of trainable parameters low, is appropriate for gradient-based schemes, and adapts to different learning tasks.
The performance of ENN outperforms state of the art benchmarks, providing above a 40% gap in accuracy in some scenarios.
arXiv Detail & Related papers (2023-07-02T21:46:30Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z) - Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness [172.61581010141978]
Certifiable robustness is a desirable property for adopting deep neural networks (DNNs) in safety-critical scenarios.
We propose a novel solution to strategically manipulate neurons, by "grafting" appropriate levels of linearity.
arXiv Detail & Related papers (2022-06-15T22:42:29Z) - NN-EUCLID: deep-learning hyperelasticity without stress data [0.0]
We propose a new approach for unsupervised learning of hyperelastic laws with physics-consistent deep neural networks.
In contrast to supervised learning, which assumes the stress-strain, the approach only uses realistically measurable full-elastic field displacement and global force availability data.
arXiv Detail & Related papers (2022-05-04T13:54:54Z) - FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z) - SpikePropamine: Differentiable Plasticity in Spiking Neural Networks [0.0]
We introduce a framework for learning the dynamics of synaptic plasticity and neuromodulated synaptic plasticity in Spiking Neural Networks (SNNs)
We show that SNNs augmented with differentiable plasticity are sufficient for solving a set of challenging temporal learning tasks.
These networks are also shown to be capable of producing locomotion on a high-dimensional robotic learning task.
arXiv Detail & Related papers (2021-06-04T19:29:07Z) - Robust Learning via Persistency of Excitation [4.674053902991301]
We show that network training using gradient descent is equivalent to a dynamical system parameter estimation problem.
We provide an efficient technique for estimating the corresponding Lipschitz constant using extreme value theory.
Our approach also universally increases the adversarial accuracy by 0.1% to 0.3% points in various state-of-the-art adversarially trained models.
arXiv Detail & Related papers (2021-06-03T18:49:05Z) - Towards Efficient Processing and Learning with Spikes: New Approaches
for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules which demonstrate better performance over other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP with its limitations being presented.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.