Connecting NTK and NNGP: A Unified Theoretical Framework for Neural
Network Learning Dynamics in the Kernel Regime
- URL: http://arxiv.org/abs/2309.04522v1
- Date: Fri, 8 Sep 2023 18:00:01 GMT
- Title: Connecting NTK and NNGP: A Unified Theoretical Framework for Neural
Network Learning Dynamics in the Kernel Regime
- Authors: Yehonatan Avidan, Qianyi Li, Haim Sompolinsky
- Abstract summary: We provide a comprehensive framework for understanding the learning process of deep neural networks in the infinite width limit.
We identify two learning phases characterized by different time scales: gradient-driven and diffusive learning.
- Score: 7.136205674624813
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Artificial neural networks have revolutionized machine learning in recent
years, but a complete theoretical framework for their learning process is still
lacking. Substantial progress has been made for infinitely wide networks. In
this regime, two disparate theoretical frameworks have been used, in which the
network's output is described using kernels: one framework is based on the
Neural Tangent Kernel (NTK) which assumes linearized gradient descent dynamics,
while the Neural Network Gaussian Process (NNGP) kernel assumes a Bayesian
framework. However, the relation between these two frameworks has remained
elusive. This work unifies these two distinct theories using a Markov proximal
learning model for learning dynamics in an ensemble of randomly initialized
infinitely wide deep networks. We derive an exact analytical expression for the
network input-output function during and after learning, and introduce a new
time-dependent Neural Dynamical Kernel (NDK) from which both NTK and NNGP
kernels can be derived. We identify two learning phases characterized by
different time scales: gradient-driven and diffusive learning. In the initial
gradient-driven learning phase, the dynamics is dominated by deterministic
gradient descent, and is described by the NTK theory. This phase is followed by
the diffusive learning stage, during which the network parameters sample the
solution space, ultimately approaching the equilibrium distribution
corresponding to NNGP. Combined with numerical evaluations on synthetic and
benchmark datasets, we provide novel insights into the different roles of
initialization, regularization, and network depth, as well as phenomena such as
early stopping and representational drift. This work closes the gap between the
NTK and NNGP theories, providing a comprehensive framework for understanding
the learning process of deep neural networks in the infinite width limit.
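To make the two kernel descriptions concrete, the following minimal NumPy sketch (an illustration under stated assumptions, not code from the paper) estimates both kernels for a width-m two-layer ReLU network f(x) = a^T relu(Wx/sqrt(d))/sqrt(m): the NNGP kernel as the output covariance over an ensemble of random initializations, and the empirical NTK as the Gram matrix of parameter gradients at a single initialization. The parameterization, widths, and sample counts are illustrative choices; in the infinite-width limit both estimates approach their analytical counterparts, and in the paper's picture the gradient-driven phase is governed by the NTK while the diffusive phase relaxes toward the NNGP equilibrium.

    # Minimal sketch: the two kernels contrasted in the abstract, for a width-m
    # two-layer ReLU network f(x) = a^T relu(W x / sqrt(d)) / sqrt(m).
    # All sizes and the parameterization are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 5, 4096                           # input dimension, hidden width
    X = rng.standard_normal((3, d))          # a few example inputs

    def init_params():
        W = rng.standard_normal((m, d))      # first-layer weights ~ N(0, 1)
        a = rng.standard_normal(m)           # readout weights ~ N(0, 1)
        return W, a

    def forward(W, a, X):
        h = np.maximum(X @ W.T / np.sqrt(d), 0.0)   # hidden activations, (n, m)
        return h @ a / np.sqrt(m)                   # outputs, (n,)

    # NNGP kernel: Monte Carlo average of f(x) f(x') over random initializations.
    outs = np.stack([forward(*init_params(), X) for _ in range(2000)])
    K_nngp = outs.T @ outs / len(outs)

    # Empirical NTK at one initialization: Theta = J J^T, with J the Jacobian of
    # the output with respect to all parameters (written out analytically here).
    W, a = init_params()
    pre = X @ W.T / np.sqrt(d)
    h = np.maximum(pre, 0.0)
    dh = (pre > 0).astype(float)                              # ReLU derivative
    grad_a = h / np.sqrt(m)                                   # df/da, shape (n, m)
    grad_W = (dh * a / np.sqrt(m))[:, :, None] * X[:, None, :] / np.sqrt(d)  # df/dW, (n, m, d)
    K_ntk = grad_a @ grad_a.T + np.einsum('nij,kij->nk', grad_W, grad_W)

    print("NNGP kernel (Monte Carlo estimate):\n", K_nngp)
    print("Empirical NTK:\n", K_ntk)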
Related papers
- Stochastic Gradient Descent for Two-layer Neural Networks [2.0349026069285423]
This paper presents a study on the convergence rates of the stochastic gradient descent (SGD) algorithm when applied to overparameterized two-layer neural networks.
Our approach combines the Neural Tangent Kernel (NTK) approximation with convergence analysis in the Reproducing Kernel Hilbert Space (RKHS) generated by the NTK.
Our research framework enables us to explore the intricate interplay between kernel methods and optimization processes, shedding light on the dynamics and convergence properties of neural networks (a sketch of this linearized, kernel-regression view of training appears after this list).
arXiv Detail & Related papers (2024-07-10T13:58:57Z) - Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to neural networks of arbitrary width, depth, and topology.
We also present an exact, novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z) - A Unified Kernel for Neural Network Learning [4.0759204898334715]
We present the Unified Neural Kernel (UNK), which characterizes the learning dynamics of neural networks trained with gradient descent.
The UNK maintains the limiting properties of both the NNGP and NTK kernels, exhibiting behavior akin to the NTK with a finite learning step.
We also theoretically characterize the uniform tightness and learning convergence of the UNK kernel.
arXiv Detail & Related papers (2024-03-26T07:55:45Z) - Speed Limits for Deep Learning [67.69149326107103]
Recent advances in thermodynamics allow bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, under plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study of the NTK has been devoted to typical neural network architectures, but it remains incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Neural Piecewise-Constant Delay Differential Equations [17.55759866368141]
In this article, we introduce a new class of continuous-depth neural network, called Neural Piecewise-Constant Delay Differential Equations (PCDDEs).
We show that Neural PCDDEs outperform several existing continuous-depth neural frameworks on one-dimensional piecewise-constant-delay population dynamics and real-world datasets.
arXiv Detail & Related papers (2022-01-04T03:44:15Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Geometry Perspective Of Estimating Learning Capability Of Neural
Networks [0.0]
The paper considers a broad class of neural networks with generalized architectures performing simple least-squares regression with stochastic gradient descent (SGD).
The relationship between the generalization capability with the stability of the neural network has also been discussed.
By correlating the principles of high-energy physics with the learning theory of neural networks, the paper establishes a variant of the Complexity-Action conjecture from an artificial neural network perspective.
arXiv Detail & Related papers (2020-11-03T12:03:19Z) - A Generalized Neural Tangent Kernel Analysis for Two-layer Neural
Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
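As a companion to the linearized (NTK) view recurring in the entries above (and referenced after the first entry), the following minimal NumPy sketch, again an illustration rather than code from any of the cited papers, shows a standard consequence of NTK theory for gradient-flow training on squared loss: the ensemble-mean training-set predictions evolve as f_t = (I - exp(-eta * Theta * t)) y, so each kernel eigenmode relaxes at a rate set by its eigenvalue and the predictions converge to the kernel-regression interpolant. The Gram matrix below is a generic positive-definite stand-in for an actual NTK.

    # Minimal sketch of linearized (NTK) training dynamics on squared loss.
    # The RBF Gram matrix is only a stand-in for an actual NTK Gram matrix;
    # names, sizes, and the learning rate are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 8
    X = rng.standard_normal((n, 3))
    y = np.sin(X[:, 0])                        # toy regression targets

    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Theta = np.exp(-0.5 * sq)                  # stand-in kernel Gram matrix (PSD)

    eta = 1.0
    vals, vecs = np.linalg.eigh(Theta)         # Theta = V diag(vals) V^T

    def f_t(t):
        # Ensemble-mean training-set prediction at time t, starting from zero output:
        # f_t = V (I - exp(-eta * diag(vals) * t)) V^T y
        decay = np.exp(-eta * vals * t)
        return vecs @ ((1.0 - decay) * (vecs.T @ y))

    for t in (0.1, 1.0, 10.0, 1e6):
        print(f"t = {t:>9}: training error {np.linalg.norm(f_t(t) - y):.4f}")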