New Evidence of the Two-Phase Learning Dynamics of Neural Networks
- URL: http://arxiv.org/abs/2505.13900v1
- Date: Tue, 20 May 2025 04:03:52 GMT
- Title: New Evidence of the Two-Phase Learning Dynamics of Neural Networks
- Authors: Zhanpeng Zhou, Yongyi Yang, Mahito Sugiyama, Junchi Yan
- Abstract summary: We introduce an interval-wise perspective that compares network states across a time window. We show that the response of the network to a perturbation exhibits a transition from chaotic to stable. We also find that after this transition point the model's functional trajectory is confined to a narrow cone-shaped subset.
- Score: 59.55028392232715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding how deep neural networks learn remains a fundamental challenge in modern machine learning. A growing body of evidence suggests that training dynamics undergo a distinct phase transition, yet our understanding of this transition is still incomplete. In this paper, we introduce an interval-wise perspective that compares network states across a time window, revealing two new phenomena that illuminate the two-phase nature of deep learning. i) The Chaos Effect. By injecting an imperceptibly small parameter perturbation at various stages, we show that the network's response to the perturbation transitions from chaotic to stable, suggesting an early critical period in which the network is highly sensitive to initial conditions; ii) The Cone Effect. Tracking the evolution of the empirical Neural Tangent Kernel (eNTK), we find that after this transition point the model's functional trajectory is confined to a narrow cone-shaped subset: while the kernel continues to change, it becomes trapped in a tight angular region. Together, these effects provide a structural, dynamical view of how deep networks transition from sensitive exploration to stable refinement during training.
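Code sketch. To make the Chaos Effect probe concrete, below is a minimal PyTorch sketch: train two copies of the same network that differ only by a tiny perturbation injected at step t0, and record how far their parameter trajectories drift apart afterwards. The toy MLP, synthetic data, perturbation scale eps, and step counts are illustrative assumptions, not the paper's experimental setup.

```python
# A minimal sketch of the perturbation probe (Chaos Effect). The toy MLP,
# synthetic data, `eps`, and step counts are illustrative assumptions,
# not the paper's setup.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 10)              # synthetic classification task
y = torch.randint(0, 2, (256,))

def make_model():
    return nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))

def train_steps(model, steps, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

def param_distance(m1, m2):
    return sum((p1 - p2).pow(2).sum()
               for p1, p2 in zip(m1.parameters(), m2.parameters())).sqrt().item()

def divergence_after_perturbation(t0, eps=1e-4, horizon=200):
    """Inject a tiny perturbation at step t0, then measure how far the
    perturbed and unperturbed trajectories drift over `horizon` steps."""
    torch.manual_seed(1)              # identical initialization for both runs
    base = make_model()
    train_steps(base, t0)             # train up to the injection point
    twin = copy.deepcopy(base)
    with torch.no_grad():             # imperceptibly small perturbation
        for p in twin.parameters():
            p.add_(eps * torch.randn_like(p))
    train_steps(base, horizon)
    train_steps(twin, horizon)
    return param_distance(base, twin)

for t0 in (0, 50, 500, 2000):
    print(f"inject at t0={t0:4d}: trajectory distance {divergence_after_perturbation(t0):.5f}")
```

If the Chaos Effect holds, the final distance should drop sharply once the injection step t0 passes the chaotic-to-stable transition. The Cone Effect can be probed in the same spirit by tracking the angle between empirical NTKs at successive checkpoints. The sketch below continues from the code above; reducing the network to its first output logit over a 16-point probe set is a simplification for illustration, not the paper's exact protocol.

```python
# Continuing the sketch above: a rough eNTK angle measurement (Cone Effect).
# The first-logit reduction and 16-point probe set are simplifications for
# illustration, not the paper's exact protocol.
def entk(model, probe):
    """Empirical NTK on a probe set: K[i, j] = <grad f(x_i), grad f(x_j)>,
    with f taken to be the first output logit."""
    grads = []
    for x in probe:
        model.zero_grad()
        model(x.unsqueeze(0))[0, 0].backward()
        grads.append(torch.cat([p.grad.flatten() for p in model.parameters()]))
    G = torch.stack(grads)
    return G @ G.T

def kernel_cosine(K1, K2):
    """Cosine similarity between two kernels viewed as flat vectors;
    values near 1 mean the kernel's direction is barely changing."""
    return ((K1 * K2).sum() / (K1.norm() * K2.norm())).item()

probe = X[:16]
torch.manual_seed(1)
model = make_model()
prev = entk(model, probe)
for step in range(1, 2001):
    train_steps(model, 1)
    if step % 500 == 0:
        cur = entk(model, probe)
        print(f"step {step:4d}: ||K|| = {cur.norm().item():.2f}, "
              f"cos(K_prev, K) = {kernel_cosine(prev, cur):.4f}")
        prev = cur
```

Under the Cone Effect, one would expect the kernel norm to keep changing while successive kernel cosines stay pinned near 1 in the later phase, i.e. the kernel keeps moving but only within a tight angular region.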
Related papers
- The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions [51.68215326304272]
We show that even small perturbations reliably cause otherwise identical training trajectories to diverge, an effect that diminishes rapidly over training time. Our findings provide insights into neural network training stability, with practical implications for fine-tuning, model merging, and diversity of model ensembles.
arXiv Detail & Related papers (2025-06-16T08:35:16Z)
- On the Cone Effect in the Learning Dynamics [57.02319387815831]
We take an empirical perspective to study the learning dynamics of neural networks in real-world settings. Our key findings reveal a two-phase learning process: i) in Phase I, the eNTK evolves significantly, signaling the rich regime, and ii) in Phase II, the eNTK keeps evolving but is constrained in a narrow space.
arXiv Detail & Related papers (2025-03-20T16:38:25Z)
- A ghost mechanism: An analytical model of abrupt learning [6.509233267425589]
We show how even a one-dimensional system can exhibit abrupt learning through ghost points rather than bifurcations. Our model reveals a bifurcation-free mechanism for abrupt learning and illustrates the importance of both deliberate uncertainty and redundancy in stabilizing learning dynamics.
arXiv Detail & Related papers (2025-01-04T20:49:20Z)
- Connecting NTK and NNGP: A Unified Theoretical Framework for Wide Neural Network Learning Dynamics [6.349503549199403]
We provide a comprehensive framework for the learning process of deep wide neural networks. By characterizing the diffusive phase, our work sheds light on representational drift in the brain.
arXiv Detail & Related papers (2023-09-08T18:00:01Z)
- Phase Diagram of Initial Condensation for Two-layer Neural Networks [4.404198015660192]
We present a phase diagram of initial condensation for two-layer neural networks.
Our phase diagram serves to provide a comprehensive understanding of the dynamical regimes of neural networks.
arXiv Detail & Related papers (2023-03-12T03:55:38Z)
- Stochastic Gradient Descent-Induced Drift of Representation in a Two-Layer Neural Network [0.0]
Although drift has been observed both in the brain and in artificial networks, its mechanisms and implications are not fully understood.
Motivated by recent experimental findings of stimulus-dependent drift in the piriform cortex, we use theory and simulations to study this phenomenon in a two-layer linear feedforward network.
arXiv Detail & Related papers (2023-02-06T04:56:05Z)
- A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
- Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training.
We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive for the final performance of the trained system and its learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the "double descent" phenomenon in the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)