Efficient Reinforcement Learning by Reducing Forgetting with Elephant Activation Functions
- URL: http://arxiv.org/abs/2509.19159v1
- Date: Tue, 23 Sep 2025 15:38:01 GMT
- Title: Efficient Reinforcement Learning by Reducing Forgetting with Elephant Activation Functions
- Authors: Qingfeng Lan, Gautham Vasan, A. Rupam Mahmood,
- Abstract summary: Catastrophic forgetting has remained a significant challenge for efficient reinforcement learning for decades.<n>We study the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting.<n>We show that by replacing classical activation functions with elephant activation functions in the neural networks of value-based algorithms, we can significantly improve the resilience of neural networks to catastrophic forgetting.
- Score: 10.730038944035838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Catastrophic forgetting has remained a significant challenge for efficient reinforcement learning for decades (Ring 1994, Rivest and Precup 2003). While recent works have proposed effective methods to mitigate this issue, they mainly focus on the algorithmic side. Meanwhile, we do not fully understand what architectural properties of neural networks lead to catastrophic forgetting. This study aims to fill this gap by studying the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting in reinforcement learning setup. Our study reveals that, besides sparse representations, the gradient sparsity of activation functions also plays an important role in reducing forgetting. Based on this insight, we propose a new class of activation functions, elephant activation functions, that can generate both sparse outputs and sparse gradients. We show that by simply replacing classical activation functions with elephant activation functions in the neural networks of value-based algorithms, we can significantly improve the resilience of neural networks to catastrophic forgetting, thus making reinforcement learning more sample-efficient and memory-efficient.
Related papers
- The Importance of Being Lazy: Scaling Limits of Continual Learning [60.97756735877614]
We show that increasing model width is only beneficial when it reduces the amount of feature learning, yielding more laziness.<n>We study the intricate relationship between feature learning, task non-stationarity, and forgetting, finding that high feature learning is only beneficial with highly similar tasks.
arXiv Detail & Related papers (2025-06-20T10:12:38Z) - Task-Specific Activation Functions for Neuroevolution using Grammatical Evolution [0.0]
We introduce Neuvo GEAF - an innovative approach leveraging grammatical evolution (GE) to automatically evolve novel activation functions.<n>Experiments conducted on well-known binary classification datasets show statistically significant improvements in F1-score (between 2.4% and 9.4%) over ReLU.
arXiv Detail & Related papers (2025-03-13T20:50:21Z) - $ε$-rank and the Staircase Phenomenon: New Insights into Neural Network Training Dynamics [1.7056144431280509]
We introduce the concept of $epsilon$-rank, a novel metric quantifying the effective feature of neuron functions in the terminal hidden layer.<n>We prove a negative correlation between the loss lower bound and $epsilon$-rank, demonstrating that a high $epsilon$-rank is essential for significant loss reduction.<n>We propose a novel pre-training strategy on the initial hidden layer that elevates the $epsilon$-rank of the terminal hidden layer.
arXiv Detail & Related papers (2024-12-06T16:00:50Z) - Elephant Neural Networks: Born to Be a Continual Learner [7.210328077827388]
Catastrophic forgetting remains a significant challenge to continual learning for decades.
We study the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting.
We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting.
arXiv Detail & Related papers (2023-10-02T17:27:39Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Learning Bayesian Sparse Networks with Full Experience Replay for
Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z) - Advantages of biologically-inspired adaptive neural activation in RNNs
during learning [10.357949759642816]
We introduce a novel parametric family of nonlinear activation functions inspired by input-frequency response curves of biological neurons.
We find that activation adaptation provides distinct task-specific solutions and in some cases, improves both learning speed and performance.
arXiv Detail & Related papers (2020-06-22T13:49:52Z) - Untangling tradeoffs between recurrence and self-attention in neural
networks [81.30894993852813]
We present a formal analysis of how self-attention affects gradient propagation in recurrent networks.
We prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies.
We propose a relevancy screening mechanism that allows for a scalable use of sparse self-attention with recurrence.
arXiv Detail & Related papers (2020-06-16T19:24:25Z) - Understanding the Role of Training Regimes in Continual Learning [51.32945003239048]
Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially.
We study the effect of dropout, learning rate decay, and batch size, on forming training regimes that widen the tasks' local minima.
arXiv Detail & Related papers (2020-06-12T06:00:27Z) - A survey on modern trainable activation functions [0.0]
We propose a taxonomy of trainable activation functions and highlight common and distinctive proprieties of recent and past models.
We show that many of the proposed approaches are equivalent to adding neuron layers which use fixed (non-trainable) activation functions.
arXiv Detail & Related papers (2020-05-02T12:38:43Z) - Towards Efficient Processing and Learning with Spikes: New Approaches
for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules which demonstrate better performance over other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP with its limitations being presented.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z) - Evolutionary Optimization of Deep Learning Activation Functions [15.628118691027328]
We show that evolutionary algorithms can discover novel activation functions that outperform the Rectified Linear Unit (ReLU)
replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy.
These novel activation functions are shown to generalize, achieving high performance across tasks.
arXiv Detail & Related papers (2020-02-17T19:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.