Activation Function Design Sustains Plasticity in Continual Learning
- URL: http://arxiv.org/abs/2509.22562v1
- Date: Fri, 26 Sep 2025 16:41:47 GMT
- Title: Activation Function Design Sustains Plasticity in Continual Learning
- Authors: Lute Lillo, Nick Cheney
- Abstract summary: In continual learning, models can progressively lose the ability to adapt. We show that activation choice is a primary, architecture-agnostic lever for mitigating plasticity loss.
- Score: 1.618563064839635
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In independent, identically distributed (i.i.d.) training regimes, activation functions have been benchmarked extensively, and their differences often shrink once model size and optimization are tuned. In continual learning, however, the picture is different: beyond catastrophic forgetting, models can progressively lose the ability to adapt (referred to as loss of plasticity) and the role of the non-linearity in this failure mode remains underexplored. We show that activation choice is a primary, architecture-agnostic lever for mitigating plasticity loss. Building on a property-level analysis of negative-branch shape and saturation behavior, we introduce two drop-in nonlinearities (Smooth-Leaky and Randomized Smooth-Leaky) and evaluate them in two complementary settings: (i) supervised class-incremental benchmarks and (ii) reinforcement learning with non-stationary MuJoCo environments designed to induce controlled distribution and dynamics shifts. We also provide a simple stress protocol and diagnostics that link the shape of the activation to the adaptation under change. The takeaway is straightforward: thoughtful activation design offers a lightweight, domain-general way to sustain plasticity in continual learning without extra capacity or task-specific tuning.
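The abstract names Smooth-Leaky and Randomized Smooth-Leaky but does not give their closed forms, so the following NumPy sketch is only one plausible reading: a linear negative branch of slope `alpha` blended smoothly with a softplus, plus a randomized variant that resamples `alpha` on each call. The functional forms and parameter ranges here are assumptions, not the paper's definitions.

```python
import numpy as np

def smooth_leaky(x, alpha=0.1):
    # Hypothetical smooth-leaky form: blends a linear negative branch
    # (slope alpha, so gradients never vanish for x < 0) with a softplus,
    # approaching the identity for large positive x.
    return alpha * x + (1.0 - alpha) * np.logaddexp(0.0, x)

def randomized_smooth_leaky(x, alpha_range=(0.05, 0.3), rng=None):
    # Randomized variant: resample the negative-branch slope per call,
    # injecting stochasticity into the activation shape.
    rng = np.random.default_rng() if rng is None else rng
    alpha = rng.uniform(*alpha_range)
    return smooth_leaky(x, alpha)
```

For large inputs the sketch behaves like the identity (`smooth_leaky(100.0) ≈ 100`), while for large negative inputs it behaves like `alpha * x`, i.e. a non-saturating leaky branch.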
Related papers
- Regime Change Hypothesis: Foundations for Decoupled Dynamics in Neural Network Training [1.0518862318418603]
In ReLU-based models, the activation pattern induced by a given input determines the piecewise-linear region in which the network behaves affinely. We investigate whether training exhibits a two-timescale behavior: an early stage with substantial changes in activation patterns and a later stage where weight updates predominantly refine the model.
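The claim above can be made concrete with a small sketch (toy network with illustrative random weights): recording which ReLU units fire at an input yields a binary pattern, and inputs sharing that pattern lie in the same affine region of the network.

```python
import numpy as np

def activation_pattern(x, weights, biases):
    # Binary pattern of active ReLU units at input x; inputs sharing
    # this pattern lie in the same piecewise-linear (affine) region.
    pattern, h = [], x
    for W, b in zip(weights, biases):
        pre = W @ h + b
        pattern.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return pattern

# Toy 3-4-2 ReLU network with random (illustrative) parameters.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [rng.standard_normal(4), rng.standard_normal(2)]
x = rng.standard_normal(3)
# A tiny perturbation almost surely stays in the same affine region.
same = all(np.array_equal(a, b)
           for a, b in zip(activation_pattern(x, weights, biases),
                           activation_pattern(x + 1e-9, weights, biases)))
```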
arXiv Detail & Related papers (2026-02-09T07:14:28Z)
- Degradation of Feature Space in Continual Learning [2.322400467239964]
We investigate whether promoting feature-space isotropy can enhance representation quality in continual learning. We find that isotropic regularization fails to improve, and can in fact degrade, model accuracy in continual settings.
arXiv Detail & Related papers (2026-02-06T10:26:34Z)
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories [58.988535279557546]
We introduce SMART (Sycophancy Mitigation through Adaptive Reasoning Trajectories). We show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs.
arXiv Detail & Related papers (2025-09-20T17:09:14Z)
- Balancing Expressivity and Robustness: Constrained Rational Activations for Reinforcement Learning [9.120944934920141]
We study trainable rational activations in both reinforcement and continual learning settings. Our main result demonstrates a trade-off between expressivity and plasticity in rational activations. Our findings provide actionable design principles for robust and trainable activations in dynamic, non-stationary environments.
arXiv Detail & Related papers (2025-07-19T19:53:08Z)
- Data-Driven Exploration for a Class of Continuous-Time Indefinite Linear-Quadratic Reinforcement Learning Problems [6.859965454961918]
We study reinforcement learning for continuous-time linear-quadratic (LQ) control problems. We propose a model-free, data-driven exploration mechanism that adaptively adjusts entropy regularization by the critic. Our method achieves a sublinear regret bound that matches the best-known model-free results for this class of LQ problems.
arXiv Detail & Related papers (2025-07-01T01:09:06Z)
- The Importance of Being Lazy: Scaling Limits of Continual Learning [60.97756735877614]
We show that increasing model width is only beneficial when it reduces the amount of feature learning, yielding more laziness. We study the intricate relationship between feature learning, task non-stationarity, and forgetting, finding that high feature learning is only beneficial with highly similar tasks.
arXiv Detail & Related papers (2025-06-20T10:12:38Z)
- SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting [68.00007494819798]
Continual learning requires a model to learn multiple tasks in sequence while maintaining both stability (preserving knowledge from previously learned tasks) and plasticity (effectively learning new tasks). Gradient projection has emerged as an effective and popular paradigm in CL: it partitions the gradient space of previously learned tasks into two subspaces. New tasks are learned effectively within the minor subspace, thereby reducing interference with previously acquired knowledge. Existing gradient projection methods struggle to achieve an optimal balance between plasticity and stability, as it is hard to appropriately partition the gradient space.
arXiv Detail & Related papers (2025-05-28T13:57:56Z)
- Preserving Plasticity in Continual Learning with Adaptive Linearity Injection [10.641213440191551]
Loss of plasticity in deep neural networks is the gradual reduction in a model's capacity to learn incrementally. Recent work has shown that deep linear networks tend to be resilient to loss of plasticity. We propose Adaptive Linearization (AdaLin), a general approach that dynamically adapts each neuron's activation function to mitigate plasticity loss.
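As a hedged illustration of the adaptive-linearity idea (the exact AdaLin parameterization is not given in this summary), each neuron can blend a base nonlinearity with the identity through a learnable per-neuron gate:

```python
import numpy as np

def adaptive_linear_activation(x, gate_logit):
    # Illustrative gated blend in the spirit of AdaLin (not the paper's
    # exact rule): g near 1 recovers ReLU, g near 0 injects linearity,
    # keeping gradients alive on the negative branch.
    g = 1.0 / (1.0 + np.exp(-gate_logit))   # per-neuron gate in (0, 1)
    return g * np.maximum(x, 0.0) + (1.0 - g) * x
```

With a strongly positive gate logit the unit acts like ReLU; with a strongly negative one it passes inputs through unchanged, so training can trade nonlinearity for plasticity per neuron.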
arXiv Detail & Related papers (2025-05-14T15:36:51Z)
- Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit [1.7597525104451157]
An empirical distribution of the model weights converges to a deterministic measure governed by a McKean-Vlasov nonlinear partial differential equation (PDE). Under L2 regularization, this PDE reduces to a closed set of low-dimensional ordinary differential equations (ODEs). We analyze the locations and stability of the ODEs' fixed points, unveiling several interesting findings.
arXiv Detail & Related papers (2024-06-11T03:07:41Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics. The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- On the Stability-Plasticity Dilemma of Class-Incremental Learning [50.863180812727244]
A primary goal of class-incremental learning is to strike a balance between stability and plasticity.
This paper aims to shed light on how effectively recent class-incremental learning algorithms address the stability-plasticity trade-off.
arXiv Detail & Related papers (2023-04-04T09:34:14Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- Supervised Learning in the Presence of Concept Drift: A modelling framework [5.22609266390809]
We present a modelling framework for the investigation of supervised learning in non-stationary environments.
We model two example types of learning systems: prototype-based Learning Vector Quantization (LVQ) for classification and shallow, layered neural networks for regression tasks.
arXiv Detail & Related papers (2020-05-21T09:13:58Z)
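The prototype-based LVQ system mentioned in the last entry can be sketched with the standard LVQ1 update rule (a generic textbook version; the paper's modelling-framework details may differ): the nearest prototype moves toward a sample with a matching label and away from one with a mismatched label.

```python
import numpy as np

def lvq1_update(prototypes, labels, x, y, lr=0.05):
    # One LVQ1 step: find the winning (nearest) prototype, then attract
    # it to x if its class label matches y, or repel it otherwise.
    d = np.linalg.norm(prototypes - x, axis=1)
    w = int(np.argmin(d))
    sign = 1.0 if labels[w] == y else -1.0
    prototypes[w] += sign * lr * (x - prototypes[w])
    return prototypes
```

Under concept drift, repeated updates of this form let the prototypes track a shifting class distribution, which is what makes LVQ a convenient model system for non-stationary supervised learning.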
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.