Disentangling the Causes of Plasticity Loss in Neural Networks
- URL: http://arxiv.org/abs/2402.18762v1
- Date: Thu, 29 Feb 2024 00:02:33 GMT
- Title: Disentangling the Causes of Plasticity Loss in Neural Networks
- Authors: Clare Lyle, Zeyu Zheng, Khimya Khetarpal, Hado van Hasselt, Razvan
Pascanu, James Martens, Will Dabney
- Abstract summary: We show that loss of plasticity can be decomposed into multiple independent mechanisms.
We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
- Score: 55.23250269007988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Underpinning the past decades of work on the design, initialization, and
optimization of neural networks is a seemingly innocuous assumption: that the
network is trained on a *stationary* data distribution. In settings
where this assumption is violated, e.g., deep reinforcement learning, learning
algorithms become unstable and brittle with respect to hyperparameters and even
random seeds. One factor driving this instability is the loss of plasticity,
meaning that updating the network's predictions in response to new information
becomes more difficult as training progresses. While many recent works provide
analyses and partial solutions to this phenomenon, a fundamental question
remains unanswered: to what extent do known mechanisms of plasticity loss
overlap, and how can mitigation strategies be combined to best maintain the
trainability of a network? This paper addresses these questions, showing that
loss of plasticity can be decomposed into multiple independent mechanisms and
that, while intervening on any single mechanism is insufficient to avoid the
loss of plasticity in all cases, intervening on multiple mechanisms in
conjunction results in highly robust learning algorithms. We show that a
combination of layer normalization and weight decay is highly effective at
maintaining plasticity in a variety of synthetic nonstationary learning tasks,
and further demonstrate its effectiveness on naturally arising
nonstationarities, including reinforcement learning in the Arcade Learning
Environment.
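As a concrete illustration of the headline intervention, the sketch below combines the two mechanisms the abstract names: layer normalization inside the network and weight decay in the optimizer. It is a minimal sketch, not the authors' implementation; the architecture, dimensions, hyperparameters, and the label-reshuffling nonstationarity are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code): LayerNorm after each hidden layer plus
# decoupled weight decay via AdamW, trained on a toy nonstationary stream where
# the labels are reshuffled at the start of every "task".
import torch
import torch.nn as nn

class NormalizedMLP(nn.Module):
    def __init__(self, in_dim=32, hidden=256, out_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.LayerNorm(hidden),          # intervention 1: layer normalization
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

model = NormalizedMLP()
# Intervention 2: weight decay, applied in the optimizer (value is illustrative).
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(512, 32)
for task in range(10):                     # each task = a fresh random labelling
    y = torch.randint(0, 10, (512,))
    for step in range(200):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```

In this toy setting, plasticity would be read off from whether the loss on the tenth relabelling falls as quickly as on the first; the sketch only shows where each intervention sits, not the paper's evaluation protocol.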
Related papers
- Plasticity Loss in Deep Reinforcement Learning: A Survey [15.525552360867367]
Plasticity is crucial for deep Reinforcement Learning (RL) agents.
Once plasticity is lost, an agent's performance will plateau because it cannot improve its policy to account for changes in the data distribution.
Loss of plasticity can be connected to many other issues plaguing deep RL, such as training instabilities, scaling failures, overestimation bias, and insufficient exploration.
arXiv Detail & Related papers (2024-11-07T16:13:54Z) - Neural Network Plasticity and Loss Sharpness [0.0]
Recent findings indicate that plasticity loss on new tasks is highly related to loss landscape sharpness in non-stationary RL frameworks.
We explore the use of sharpness regularization techniques, which seek out smooth minima and have been touted for their generalization benefits in standard prediction settings (see the SAM-style sketch after this list).
arXiv Detail & Related papers (2024-09-25T19:20:09Z) - TDNetGen: Empowering Complex Network Resilience Prediction with Generative Augmentation of Topology and Dynamics [14.25304439234864]
We introduce a novel resilience prediction framework for complex networks, designed to tackle this issue through generative data augmentation of network topology and dynamics.
Experimental results on three network datasets demonstrate that the proposed framework, TDNetGen, achieves prediction accuracy of 85%-95%.
arXiv Detail & Related papers (2024-08-19T09:20:31Z) - A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning [7.767611890997147]
We show that plasticity loss is pervasive under domain shift in on-policy deep RL.
We find that a class of "regenerative" methods consistently mitigates plasticity loss in a variety of contexts.
arXiv Detail & Related papers (2024-05-29T14:59:49Z) - Semantic Loss Functions for Neuro-Symbolic Structured Prediction [74.18322585177832]
We discuss the semantic loss, which injects knowledge about such structure, defined symbolically, into training.
It is agnostic to the arrangement of the symbols, and depends only on the semantics expressed thereby.
It can be combined with both discriminative and generative neural models.
arXiv Detail & Related papers (2024-05-12T22:18:25Z) - Understanding plasticity in neural networks [41.79540750236036]
Plasticity is the ability of a neural network to quickly change its predictions in response to new information.
Deep neural networks are known to lose plasticity over the course of training even in relatively simple learning problems.
arXiv Detail & Related papers (2023-03-02T18:47:51Z) - Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training.
We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive for the final performance of the trained system and its learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Vulnerability Under Adversarial Machine Learning: Bias or Variance? [77.30759061082085]
We investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network.
Our analysis sheds light on why deep neural networks perform poorly under adversarial perturbations.
We introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies.
arXiv Detail & Related papers (2020-08-01T00:58:54Z) - Understanding the Role of Training Regimes in Continual Learning [51.32945003239048]
Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially.
We study the effect of dropout, learning rate decay, and batch size on forming training regimes that widen the tasks' local minima.
arXiv Detail & Related papers (2020-06-12T06:00:27Z)
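For the sharpness-regularization entry above ("Neural Network Plasticity and Loss Sharpness"), a representative technique is sharpness-aware minimization (SAM). The sketch below is an assumption, not that paper's method: a single SAM-style update that first perturbs the weights toward higher loss, then applies the gradient computed at the perturbed point. The perturbation radius rho and the base optimizer are illustrative.

```python
# Hedged sketch of one SAM-style update; rho and the base optimizer are illustrative.
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) gradient at the current weights
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()
    grads = [p.grad.detach().clone() if p.grad is not None
             else torch.zeros_like(p) for p in params]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # 2) ascend: perturb weights toward the locally sharpest direction
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(rho * g / grad_norm)

    # 3) gradient at the perturbed weights drives the actual descent step
    base_opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(rho * g / grad_norm)      # restore the original weights first
    base_opt.step()
    return loss.item()
```

Descending from the worst nearby point biases training toward flatter minima, which is the property that entry connects to retained plasticity.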