On the Occurrence of Critical Learning Periods in Neural Networks
- URL: http://arxiv.org/abs/2510.09687v1
- Date: Thu, 09 Oct 2025 07:34:06 GMT
- Title: On the Occurrence of Critical Learning Periods in Neural Networks
- Authors: Stanisław Pawlak
- Abstract summary: We study the plasticity of neural networks, offering empirical support for the notion that critical learning periods and warm-starting performance loss can be avoided. We show that these problems can be averted by employing a cyclic learning rate schedule. Our findings establish a vital link between critical learning periods and ongoing research on warm-starting neural network training.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study delves into the plasticity of neural networks, offering empirical support for the notion that critical learning periods and warm-starting performance loss can be avoided through simple adjustments to learning hyperparameters. The critical learning phenomenon emerges when training is initiated with deficit data. Subsequently, after numerous deficit epochs, the network's plasticity wanes, impeding its capacity to achieve parity in accuracy with models trained from scratch, even when extensive clean data training follows deficit epochs. Building upon seminal research introducing critical learning periods, we replicate key findings and broaden the experimental scope of the main experiment from the original work. In addition, we consider a warm-starting approach and show that it can be seen as a form of deficit pretraining. In particular, we demonstrate that these problems can be averted by employing a cyclic learning rate schedule. Our findings not only impact neural network training practices but also establish a vital link between critical learning periods and ongoing research on warm-starting neural network training.
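The abstract names a cyclic learning rate schedule as the remedy but does not specify its form. As a minimal sketch, the following implements a generic triangular cycle (rise from a base rate to a peak, then fall back, repeating); the function name and all hyperparameter values are illustrative assumptions, not the paper's actual settings.

```python
def cyclic_lr(step, base_lr=1e-4, max_lr=1e-1, cycle_len=2000):
    """Triangular cyclic learning rate (illustrative values, not the paper's).

    The rate climbs linearly from base_lr to max_lr over the first half of
    each cycle, then descends back to base_lr over the second half.
    """
    half = cycle_len / 2
    pos = step % cycle_len          # position within the current cycle
    frac = pos / half if pos < half else (cycle_len - pos) / half
    return base_lr + (max_lr - base_lr) * frac

# The rate starts at base_lr, peaks at max_lr mid-cycle, and returns to
# base_lr at each cycle boundary, so training periodically revisits large
# steps instead of settling into a monotonically decaying schedule.
```

Frameworks such as PyTorch provide an equivalent built-in (`torch.optim.lr_scheduler.CyclicLR`) that updates an optimizer's learning rate per batch along the same triangular pattern.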
Related papers
- DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity [11.624569521079426]
We develop a framework emulating real-world neural network training and identify noise memorization as the primary cause of plasticity loss when warm-starting on stationary data.
Motivated by this, we propose Direction-Aware SHrinking (DASH), a method aiming to mitigate plasticity loss by selectively forgetting noise while preserving learned features.
arXiv Detail & Related papers (2024-10-30T22:57:54Z) - Early Period of Training Impacts Adaptation for Out-of-Distribution Generalization: An Empirical Study [56.283944756315066]
We investigate the relationship between learning dynamics, out-of-distribution generalization and the early period of neural network training. We show that changing the number of trainable parameters during the early period of training can significantly improve OOD results. Our experiments on both image and text data show that the early period of training is a general phenomenon that can improve ID and OOD performance with minimal complexity.
arXiv Detail & Related papers (2024-03-22T13:52:53Z) - Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective [91.14291142262262]
This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
arXiv Detail & Related papers (2023-10-16T09:35:42Z) - How connectivity structure shapes rich and lazy learning in neural circuits [14.236853424595333]
We investigate how the structure of the initial weights -- in particular their effective rank -- influences the network learning regime.
Our research highlights the pivotal role of initial weight structures in shaping learning regimes.
arXiv Detail & Related papers (2023-10-12T17:08:45Z) - Critical Learning Periods Emerge Even in Deep Linear Networks [102.89011295243334]
Critical learning periods are periods early in development where temporary sensory deficits can have a permanent effect on behavior and learned representations.
Despite the radical differences between biological and artificial networks, critical learning periods have been empirically observed in both systems.
arXiv Detail & Related papers (2023-08-23T16:01:50Z) - Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training.
We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive of final performance of the trained system and their learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Understanding the Role of Training Regimes in Continual Learning [51.32945003239048]
Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially.
We study the effect of dropout, learning rate decay, and batch size, on forming training regimes that widen the tasks' local minima.
arXiv Detail & Related papers (2020-06-12T06:00:27Z)