A Simple Baseline for Stable and Plastic Neural Networks
- URL: http://arxiv.org/abs/2507.10637v2
- Date: Fri, 18 Jul 2025 08:54:22 GMT
- Title: A Simple Baseline for Stable and Plastic Neural Networks
- Authors: Étienne Künzel, Achref Jaziri, Visvanathan Ramesh,
- Abstract summary: Continual learning in computer vision requires that models adapt to a continuous stream of tasks without forgetting prior knowledge. We introduce RDBP, a low-overhead baseline that unites two complementary mechanisms: ReLUDown, a lightweight activation modification that preserves feature sensitivity while preventing neuron dormancy, and Decreasing Backpropagation, a biologically inspired gradient-scheduling scheme that progressively shields early layers from catastrophic updates.
- Score: 3.2635082758250693
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning in computer vision requires that models adapt to a continuous stream of tasks without forgetting prior knowledge, yet existing approaches often tip the balance heavily toward either plasticity or stability. We introduce RDBP, a simple, low-overhead baseline that unites two complementary mechanisms: ReLUDown, a lightweight activation modification that preserves feature sensitivity while preventing neuron dormancy, and Decreasing Backpropagation, a biologically inspired gradient-scheduling scheme that progressively shields early layers from catastrophic updates. Evaluated on the Continual ImageNet benchmark, RDBP matches or exceeds the plasticity and stability of state-of-the-art methods while reducing computational cost. RDBP thus provides both a practical solution for real-world continual learning and a clear benchmark against which future continual learning strategies can be measured.
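The abstract names the two mechanisms only at a high level, so the PyTorch sketch below is an illustrative assumption rather than the authors' implementation: `ReLUDownLike` is a stand-in activation that keeps a non-zero gradient over a window of negative pre-activations (to avoid dormant units), and `scale_gradients_by_depth` shrinks gradients toward earlier layers after each backward pass. The names and hyperparameters (`ReLUDownLike`, `scale_gradients_by_depth`, `dip`, `decay`) are all hypothetical.
```python
# Illustrative sketch only: the exact ReLUDown formula and the Decreasing
# Backpropagation schedule are not specified in the abstract; the shapes
# used here (a "dip" activation and an exponential depth-wise gradient
# scale) are assumptions for illustration.
import torch
import torch.nn as nn


class ReLUDownLike(nn.Module):
    """Stand-in activation: identity for x >= 0, plus a shallow triangular
    dip on [-2*dip, 0] so negative pre-activations still carry gradient
    instead of going permanently dormant."""

    def __init__(self, dip: float = 1.0):
        super().__init__()
        self.dip = dip

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        positive = torch.relu(x)
        # Triangular dip of depth dip/2 centred at -dip, zero elsewhere.
        notch = 0.5 * torch.relu(self.dip - torch.abs(x + self.dip))
        return positive - notch


def scale_gradients_by_depth(model: nn.Sequential, decay: float = 0.5) -> None:
    """Assumed 'Decreasing Backpropagation' schedule: after loss.backward(),
    scale each Linear layer's gradients so earlier layers receive
    exponentially smaller updates than later ones."""
    linear_layers = [m for m in model if isinstance(m, nn.Linear)]
    depth_count = len(linear_layers)
    for depth, layer in enumerate(linear_layers):      # depth 0 = earliest
        scale = decay ** (depth_count - 1 - depth)     # 1.0 at the last layer
        for param in layer.parameters():
            if param.grad is not None:
                param.grad.mul_(scale)


# Minimal usage example on random data.
model = nn.Sequential(
    nn.Linear(32, 64), ReLUDownLike(),
    nn.Linear(64, 64), ReLUDownLike(),
    nn.Linear(64, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(8, 32)
targets = torch.randint(0, 10, (8,))

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
scale_gradients_by_depth(model, decay=0.5)  # shield early layers
optimizer.step()
```
The key design point this sketch tries to capture is that both pieces are cheap: the activation is a pointwise change, and the gradient schedule is a single in-place pass over parameters between backward and the optimizer step.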
Related papers
- Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn [22.354498355750465]
We study the loss of plasticity in deep continual RL through the lens of churn. We demonstrate that the loss of plasticity is accompanied by exacerbated churn due to the gradual rank decrease of the Neural Tangent Kernel (NTK) matrix. We introduce Continual Churn Approximated Reduction (C-CHAIN) and demonstrate that it improves learning performance.
arXiv Detail & Related papers (2025-05-31T14:58:22Z)
- Preserving Plasticity in Continual Learning with Adaptive Linearity Injection [10.641213440191551]
Loss of plasticity in deep neural networks is the gradual reduction in a model's capacity to learn incrementally. Recent work has shown that deep linear networks tend to be resilient to loss of plasticity. We propose Adaptive Linearization (AdaLin), a general approach that dynamically adapts each neuron's activation function to mitigate plasticity loss.
arXiv Detail & Related papers (2025-05-14T15:36:51Z)
- Memory Is Not the Bottleneck: Cost-Efficient Continual Learning via Weight Space Consolidation [55.77835198580209]
Continual learning (CL) has traditionally emphasized minimizing exemplar memory usage, assuming that memory is the primary bottleneck. This paper re-examines CL under a more realistic setting with sufficient exemplar memory, where the system can retain a representative portion of past data. We find that, under this regime, stability improves due to reduced forgetting, but plasticity diminishes as the model becomes biased toward prior tasks and struggles to adapt to new ones.
arXiv Detail & Related papers (2025-02-11T05:40:52Z)
- Oja's plasticity rule overcomes several challenges of training neural networks under biological constraints [0.0]
We show that incorporating Oja's plasticity rule into error-driven training yields stable, efficient learning in feedforward and recurrent architectures. Our results show that Oja's rule preserves richer activation subspaces, mitigates exploding or vanishing signals, and improves short-term memory in recurrent networks. (The classic Oja update itself is sketched after this list.)
arXiv Detail & Related papers (2024-08-15T20:26:47Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages [56.98243487769916]
Plasticity, the ability of a neural network to evolve with new data, is crucial for high-performance and sample-efficient visual reinforcement learning.
We propose Adaptive RR, which dynamically adjusts the replay ratio based on the critic's plasticity level.
arXiv Detail & Related papers (2023-10-11T12:05:34Z)
- Keep Moving: identifying task-relevant subspaces to maximise plasticity for newly learned tasks [0.22499166814992438]
Continual learning algorithms strive to acquire new knowledge while preserving prior information.
Often, these algorithms emphasise stability and restrict network updates upon learning new tasks.
But is all change detrimental?
We propose that activation spaces in neural networks can be decomposed into two subspaces.
arXiv Detail & Related papers (2023-10-07T08:54:43Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce BAdam, a novel prior-based method that better constrains parameter growth and reduces catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Improving Performance in Continual Learning Tasks using Bio-Inspired Architectures [4.2903672492917755]
We develop a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation.
Our approach leads to superior online continual learning performance on Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets.
We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms.
arXiv Detail & Related papers (2023-08-08T19:12:52Z)
- Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning [23.15206507040553]
We propose Auxiliary Network Continual Learning (ANCL), which equips the network with the ability to learn the current task.
ANCL attaches an additional auxiliary network that promotes plasticity to a continually learned model that mainly focuses on stability.
More concretely, the proposed framework materializes in a regularizer that naturally interpolates between plasticity and stability.
arXiv Detail & Related papers (2023-03-16T17:00:42Z)
- A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- Limited-angle tomographic reconstruction of dense layered objects by dynamical machine learning [68.9515120904028]
Limited-angle tomography of strongly scattering quasi-transparent objects is a challenging, highly ill-posed problem.
Regularizing priors are necessary to reduce artifacts by improving the condition of such problems.
We devised a recurrent neural network (RNN) architecture with a novel split-convolutional gated recurrent unit (SC-GRU) as the building block.
arXiv Detail & Related papers (2020-07-21T11:48:22Z)
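As noted in the Oja's-rule entry above, the classic Oja update is Δw = η·y·(x − y·w) with y = wᵀx; the summary does not say how that paper folds the rule into error-driven training, so the NumPy snippet below illustrates only the standard rule itself (the learning rate, the toy covariance, and the name `oja_step` are illustrative choices).
```python
# Classic Oja rule for a single linear unit; how the paper above combines
# it with error-driven training is not described in the summary.
import numpy as np

def oja_step(w: np.ndarray, x: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """One Oja update: Hebbian growth y*x plus the decay term -y^2 * w,
    which keeps ||w|| bounded and drives w toward the top principal
    component of the input distribution."""
    y = w @ x
    return w + lr * y * (x - y * w)

# Toy usage: w converges toward the leading eigenvector of the input covariance.
rng = np.random.default_rng(0)
cov = np.array([[3.0, 1.0], [1.0, 1.0]])
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5000)

w = rng.normal(size=2)
w /= np.linalg.norm(w)
for x in samples:
    w = oja_step(w, x, lr=0.01)

top_eigvec = np.linalg.eigh(cov)[1][:, -1]
print("alignment with top eigenvector:", abs(w @ top_eigvec))  # close to 1
```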