Recovering Plasticity of Neural Networks via Soft Weight Rescaling
- URL: http://arxiv.org/abs/2507.04683v1
- Date: Mon, 07 Jul 2025 06:02:55 GMT
- Title: Recovering Plasticity of Neural Networks via Soft Weight Rescaling
- Authors: Seungwon Oh, Sangyeon Park, Isaac Han, Kyung-Joong Kim
- Abstract summary: Unbounded weight growth is one of the main causes of plasticity loss. We propose Soft Weight Rescaling (SWR), a novel approach that prevents unbounded weight growth without losing information. SWR recovers the plasticity of the network by simply scaling down the weights at each step of the learning process.
- Score: 3.841822016067955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have shown that as training progresses, neural networks gradually lose their capacity to learn new information, a phenomenon known as plasticity loss. Unbounded weight growth is one of the main causes of plasticity loss; it also harms generalization and disrupts optimization dynamics. Re-initializing the network can be a solution, but it discards learned information, leading to performance drops. In this paper, we propose Soft Weight Rescaling (SWR), a novel approach that prevents unbounded weight growth without losing information. SWR recovers the plasticity of the network by simply scaling down the weights at each step of the learning process. We theoretically prove that SWR bounds weight magnitudes and balances them across layers. Our experiments show that SWR improves performance in warm-start, continual, and single-task learning setups on standard image classification benchmarks.
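The sketch below is a minimal, illustrative PyTorch rendering of this idea: after every optimizer step, each weight matrix is softly pulled back toward the scale it had at initialization. The function name, the blending coefficient `alpha`, and the exact rescaling rule are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def soft_weight_rescale(model: nn.Module, init_norms: dict, alpha: float = 0.01):
    """Illustrative soft rescaling: shrink each weight matrix toward the norm it
    had at initialization. Hypothetical rule for illustration only."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.dim() < 2:          # skip biases and normalization parameters
                continue
            current = param.norm() + 1e-12
            target = init_norms[name]
            # Blend the current scale with the initial scale; alpha controls how
            # strongly the weights are pulled back each step.
            factor = (1.0 - alpha) + alpha * (target / current)
            param.mul_(factor)

# Record the initial norms once, then call the rescaler after every optimizer step.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
init_norms = {n: p.norm().item() for n, p in model.named_parameters() if p.dim() >= 2}
# ... inside the training loop:
# optimizer.step()
# soft_weight_rescale(model, init_norms, alpha=0.01)
```

Multiplying a whole layer by a scalar preserves the direction of its weight vectors, which is why rescaling can bound magnitudes without discarding learned structure.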
Related papers
- Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning [57.3885832382455]
We show that introducing static network sparsity alone can unlock further scaling potential beyond dense counterparts with state-of-the-art architectures. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve higher parameter efficiency for network expressivity.
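As a rough illustration of static sparsity (not the paper's exact sparsification scheme; the mask construction and sparsity level here are assumptions), one can fix a random binary mask at initialization and keep it unchanged for the whole run:

```python
import torch
import torch.nn as nn

def make_static_mask(layer: nn.Linear, sparsity: float = 0.9) -> torch.Tensor:
    """Draw a fixed random mask once and zero the corresponding weights.
    The same mask is re-applied after every update so the pattern never changes."""
    mask = (torch.rand_like(layer.weight) > sparsity).float()
    with torch.no_grad():
        layer.weight.mul_(mask)
    return mask

layer = nn.Linear(512, 512)
mask = make_static_mask(layer, sparsity=0.9)
# after each optimizer.step():
#     with torch.no_grad():
#         layer.weight.mul_(mask)
```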
arXiv Detail & Related papers (2025-06-20T17:54:24Z)
- DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity [11.624569521079426]
We develop a framework emulating real-world neural network training and identify noise memorization as the primary cause of plasticity loss when warm-starting on stationary data.
Motivated by this, we propose Direction-Aware SHrinking (DASH), a method aiming to mitigate plasticity loss by selectively forgetting noise while preserving learned features.
arXiv Detail & Related papers (2024-10-30T22:57:54Z)
- CLASSP: a Biologically-Inspired Approach to Continual Learning through Adjustment Suppression and Sparsity Promotion [0.0]
This paper introduces a new training method named Continual Learning through Adjustment Suppression and Sparsity Promotion (CLASSP).
CLASSP is based on two main principles observed in neuroscience, particularly in the context of synaptic transmission and Long-Term Potentiation.
When compared with Elastic Weight Consolidation (EWC) on the same datasets, CLASSP demonstrates superior performance in terms of accuracy and memory footprint.
arXiv Detail & Related papers (2024-04-29T13:31:00Z)
- Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms.
We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
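A minimal PyTorch sketch of that combination, assuming a plain MLP and decoupled weight decay via AdamW (the layer sizes and hyperparameters are placeholders):

```python
import torch
import torch.nn as nn

# Layer normalization inside the network plus weight decay in the optimizer.
model = nn.Sequential(
    nn.Linear(784, 256), nn.LayerNorm(256), nn.ReLU(),
    nn.Linear(256, 256), nn.LayerNorm(256), nn.ReLU(),
    nn.Linear(256, 10),
)
# AdamW applies decoupled weight decay, which keeps weight norms in check.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```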
arXiv Detail & Related papers (2024-02-29T00:02:33Z)
- Maintaining Plasticity in Deep Continual Learning [12.27972591521307]
We provide demonstrations of loss of plasticity using datasets repurposed for continual learning as sequences of tasks.
In ImageNet, binary classification performance dropped from 89% accuracy on an early task down to 77%.
A new algorithm, continual backpropagation, modifies conventional backpropagation to reinitialize less-used units after each example.
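A simplified sketch of that re-initialization step is shown below; the utility measure, replacement fraction, and initializer are placeholders, not the algorithm's exact definitions.

```python
import torch
import torch.nn as nn

def reinit_low_utility_units(fc_in: nn.Linear, fc_out: nn.Linear,
                             utility: torch.Tensor, replace_fraction: float = 0.01):
    """Reinitialize the incoming weights of the lowest-utility hidden units and
    zero their outgoing weights so the rest of the network is undisturbed."""
    k = max(1, int(replace_fraction * fc_in.out_features))
    worst = torch.topk(utility, k, largest=False).indices
    with torch.no_grad():
        fresh = torch.empty(k, fc_in.in_features)
        nn.init.kaiming_uniform_(fresh)
        fc_in.weight[worst] = fresh       # fresh incoming weights
        fc_in.bias[worst] = 0.0
        fc_out.weight[:, worst] = 0.0     # new unit starts with no influence
```

In practice, a running per-unit utility estimate would be tracked during training and passed in as `utility`.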
arXiv Detail & Related papers (2023-06-23T23:19:21Z)
- Random Weights Networks Work as Loss Prior Constraint for Image Restoration [50.80507007507757]
We present our belief that random weights networks can act as a loss prior constraint for image restoration.
This prior can be directly inserted into existing networks without any additional training or testing computational cost.
Our main aim is to draw renewed attention to loss function design, which is currently a neglected area.
arXiv Detail & Related papers (2023-03-29T03:43:51Z)
- Improving Deep Neural Network Random Initialization Through Neuronal Rewiring [14.484787903053208]
We show that a higher neuronal strength variance may decrease performance, while a lower neuronal strength variance usually improves it.
A new method is then proposed to rewire neuronal connections according to a preferential attachment (PA) rule based on their strength.
In this sense, PA only reorganizes connections, while preserving the magnitude and distribution of the weights.
arXiv Detail & Related papers (2022-07-17T11:52:52Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weights/1-bit activations) of compactly designed backbone architectures results in severe performance degradation.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
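For context, a generic 1-bit quantization-aware training step with a straight-through estimator looks like the sketch below; this is the standard extreme-quantization baseline, not BiTAT's task-dependent aggregated transformation.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """1-bit weight quantizer with a straight-through estimator (STE)."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        # Binary weights scaled by the mean absolute value of the full-precision weights.
        return w.sign() * w.abs().mean()

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # STE: pass the gradient through where |w| <= 1, block it elsewhere.
        return grad_output * (w.abs() <= 1).float()

w = torch.randn(256, 256, requires_grad=True)   # full-precision "latent" weights
w_q = BinarizeSTE.apply(w)                      # binarized weights for the forward pass
```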
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Flattening Sharpness for Dynamic Gradient Projection Memory Benefits Continual Learning [67.99349091593324]
We investigate the relationship between the weight loss landscape and sensitivity-stability in the continual learning scenario.
Our proposed method consistently outperforms baselines with the superior ability to learn new skills while alleviating forgetting effectively.
arXiv Detail & Related papers (2021-10-09T15:13:44Z)
- Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning.
At the end of learning, we obtain back a single model by taking a spatial average in weight space.
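A minimal sketch of that final averaging step (member creation and late-phase training are omitted; the helper name is ours):

```python
import copy
import torch
import torch.nn as nn

def average_in_weight_space(members: list) -> nn.Module:
    """Collapse an ensemble of late-phase weight copies into a single model by
    averaging their parameters element-wise in weight space."""
    merged = copy.deepcopy(members[0])
    with torch.no_grad():
        for name, param in merged.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in members])
            param.copy_(stacked.mean(dim=0))
    return merged
```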
arXiv Detail & Related papers (2020-07-25T13:23:37Z)
- The Golden Ratio of Learning and Momentum [0.5076419064097732]
This paper proposes a new information-theoretical loss function motivated by neural signal processing in a synapse.
All results taken together show that loss, learning rate, and momentum are closely connected.
arXiv Detail & Related papers (2020-06-08T17:08:13Z)