Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability
- URL: http://arxiv.org/abs/2507.12837v1
- Date: Thu, 17 Jul 2025 06:52:56 GMT
- Title: Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability
- Authors: Kaiqi Jiang, Jeremy Cohen, Yuanzhi Li
- Abstract summary: This paper examines the dynamics of NTK eigenvectors during Edge of Stability (EoS) in detail. We observe that larger learning rates cause the leading eigenvectors of the final NTK, as well as the full NTK matrix, to have greater alignment with the training target. We then study the underlying mechanism of this phenomenon and provide a theoretical analysis for a two-layer linear network.
- Score: 35.683596966795136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The study of Neural Tangent Kernels (NTKs) in deep learning has drawn increasing attention in recent years. NTKs typically change actively during training and are related to feature learning. In parallel, recent work on Gradient Descent (GD) has found a phenomenon called Edge of Stability (EoS), in which the largest eigenvalue of the NTK oscillates around a value inversely proportional to the step size. However, although follow-up works have explored the underlying mechanism of this eigenvalue behavior in depth, an understanding of how the NTK eigenvectors behave during EoS is still missing. This paper examines the dynamics of NTK eigenvectors during EoS in detail. Across different architectures, we observe that larger learning rates cause the leading eigenvectors of the final NTK, as well as the full NTK matrix, to have greater alignment with the training target. We then study the underlying mechanism of this phenomenon and provide a theoretical analysis for a two-layer linear network. Our study enhances the understanding of GD training dynamics in deep learning.
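To make the quantities above concrete, here is a minimal sketch that trains a small two-layer linear network f(x) = v^T W x with full-batch gradient descent, forms its empirical NTK, and reports the top eigenvalue alongside the 2/eta stability threshold together with two simple target-alignment measures. The network sizes, initialization, step size, and alignment measures are illustrative assumptions, not the authors' experimental setup; the threshold comparison uses the mean-squared-error convention stated in the comments.

```python
# Minimal sketch (illustrative assumptions, not the paper's setup):
# train f(x) = v^T W x by full-batch GD on mean-squared error, then compute the
# empirical NTK K_ij = <grad_theta f(x_i), grad_theta f(x_j)> and its diagnostics.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 32, 10, 64              # samples, input dim, hidden width (assumed)
eta = 0.05                        # step size (assumed)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)        # training target (random, for the demo only)

W = rng.standard_normal((h, d)) / np.sqrt(d)
v = rng.standard_normal(h) / np.sqrt(h)

def empirical_ntk(W, v):
    # Jacobian of f(x_i) = v^T W x_i with respect to (W, v), one row per sample.
    J_W = np.einsum('j,ik->ijk', v, X).reshape(n, -1)   # df_i / dW, flattened
    J_v = X @ W.T                                       # df_i / dv
    J = np.hstack([J_W, J_v])
    return J @ J.T

# Full-batch GD on L = (1/2n) * sum_i (f(x_i) - y_i)^2.
for _ in range(500):
    r = X @ W.T @ v - y                  # residuals f(x_i) - y_i
    grad_W = np.outer(v, r @ X) / n      # dL/dW
    grad_v = W @ (X.T @ r) / n           # dL/dv
    W -= eta * grad_W
    v -= eta * grad_v

K = empirical_ntk(W, v)
eigvals, eigvecs = np.linalg.eigh(K)
lam_max, u1 = eigvals[-1], eigvecs[:, -1]

# Under the mean-loss convention above, the GD stability threshold for the
# linearized dynamics is lam_max / n ~ 2 / eta; EoS refers to hovering near it.
print("lambda_max(K)/n =", lam_max / n, "  vs  2/eta =", 2 / eta)
# Alignment of the target with the leading NTK eigenvector, and a normalized
# kernel-target alignment (common choices, not the paper's exact definitions).
print("eigenvector alignment:", (u1 @ y) ** 2 / (y @ y))
print("kernel-target alignment:", y @ K @ y / (np.linalg.norm(K) * (y @ y)))
```

With the small step size assumed here the run stays well below the threshold; pushing eta up until lambda_max(K)/n approaches 2/eta is the regime where EoS-style oscillations, and the stronger target alignment reported in the abstract, would be expected to appear.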
Related papers
- Reactivation: Empirical NTK Dynamics Under Task Shifts [12.32540447018987]
The Neural Tangent Kernel (NTK) offers a powerful tool to study the functional dynamics of neural networks. In the so-called lazy, or kernel, regime, the NTK remains static during training and the network function is linear in the static neural tangent feature space. The study of NTK dynamics has led to several critical discoveries in recent years on generalization and scaling behaviours.
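For background, the lazy (kernel) regime referred to here is usually stated through the first-order expansion of the network around its initialization, under which training reduces to kernel regression with a fixed NTK; this is standard NTK theory rather than a result of the paper above:

```latex
f(x;\theta_t) \;\approx\; f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta_t - \theta_0),
\qquad
\Theta_0(x,x') \;=\; \nabla_\theta f(x;\theta_0)^\top \nabla_\theta f(x';\theta_0).
```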
arXiv Detail & Related papers (2025-07-21T20:13:02Z)
- Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs [0.0]
We aim to advance the theoretical understanding of cPIKANs by analyzing them using Neural Tangent Kernel (NTK) theory. We first derive the NTK of standard cKANs in a supervised setting, and then extend the analysis to the physics-informed context. Results indicate a tractable behavior for the NTK in the context of cPIKANs, which exposes learning dynamics that standard physics-informed neural networks (PINNs) cannot capture.
arXiv Detail & Related papers (2025-06-09T17:30:13Z)
- Connecting NTK and NNGP: A Unified Theoretical Framework for Wide Neural Network Learning Dynamics [6.349503549199403]
We provide a comprehensive framework for the learning process of deep wide neural networks. By characterizing the diffusive phase, our work sheds light on representational drift in the brain.
arXiv Detail & Related papers (2023-09-08T18:00:01Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness? [0.0]
We study adversarial examples of trained neural networks through analytical tools afforded by recent theory advances connecting neural networks and kernel methods.
We show how NTKs allow adversarial examples to be generated in a "training-free" fashion, and demonstrate that they transfer to fool their finite-width neural net counterparts in the "lazy" regime.
arXiv Detail & Related papers (2022-10-11T16:11:48Z)
- Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, and for both we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
- Weighted Neural Tangent Kernel: A Generalized and Improved Network-Induced Kernel [20.84988773171639]
The Neural Tangent Kernel (NTK) has recently attracted intense study, as it describes the evolution of an over-parameterized Neural Network (NN) trained by gradient descent.
We introduce the Weighted Neural Tangent Kernel (WNTK), a generalized and improved tool, which can capture an over-parameterized NN's training dynamics under different gradients.
With the proposed weight update algorithm, both empirical and analytical WNTKs outperform the corresponding NTKs in numerical experiments.
arXiv Detail & Related papers (2021-03-22T03:16:20Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.