Understanding the role of depth in the neural tangent kernel for overparameterized neural networks
- URL: http://arxiv.org/abs/2511.07272v1
- Date: Mon, 10 Nov 2025 16:18:04 GMT
- Title: Understanding the role of depth in the neural tangent kernel for overparameterized neural networks
- Authors: William St-Arnaud, Margarida Carvalho, Golnoosh Farnadi
- Abstract summary: We study the sensitivity of large ReLU networks to increasing depths by characterizing the corresponding limiting kernel. We empirically evaluate the order of magnitude in network depth required to observe this convergent behavior.
- Score: 14.953877391067893
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Overparameterized fully-connected neural networks have been shown to behave like kernel models when trained with gradient descent, under mild conditions on the width, the learning rate, and the parameter initialization. In the limit of infinitely large width and small learning rate, the resulting kernel allows the output of the learned model to be expressed as a closed-form solution. This closed-form solution hinges on the invertibility of the limiting kernel, a property that often holds on real-world datasets. In this work, we analyze the sensitivity of large ReLU networks to increasing depths by characterizing the corresponding limiting kernel. Our theoretical results demonstrate that the normalized limiting kernel approaches the matrix of ones. In contrast, they show that the corresponding closed-form solution approaches a fixed limit on the sphere. We empirically evaluate the order of magnitude in network depth required to observe this convergent behavior, and we describe the essential properties that enable the generalization of our results to other kernels.
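To make the closed-form solution mentioned in the abstract concrete, here is a minimal sketch of kernel regression in the form K(x, X) K(X, X)^{-1} y. It rests on several assumptions that are not taken from the paper: the RBF kernel is only a stand-in for a positive-definite kernel (it is not the NTK), the names `rbf_kernel` and `kernel_predict` are hypothetical, and the formula assumes a network whose output is zero at initialization and which is trained to convergence. The last line only checks a basic linear-algebra fact: the matrix of ones has rank 1 and is therefore not invertible, which is why invertibility matters for this formula.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Stand-in positive-definite kernel (NOT the NTK): exp(-gamma * ||a - b||^2)."""
    sq_dists = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_predict(kernel, X_train, y_train, X_test):
    """Closed-form kernel regression: K(x, X) K(X, X)^{-1} y."""
    K = kernel(X_train, X_train)
    # Solving the linear system avoids forming the inverse explicitly;
    # it fails or becomes unstable when K is (nearly) singular.
    alpha = np.linalg.solve(K, y_train)
    return kernel(X_test, X_train) @ alpha


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # place the inputs on the unit sphere
y = rng.normal(size=5)

# With an invertible kernel matrix, the predictor interpolates the training labels.
train_fit = kernel_predict(rbf_kernel, X, y, X)

# The all-ones matrix has rank 1, so it is singular for more than one data point.
ones_rank = np.linalg.matrix_rank(np.ones((5, 5)))
```

Replacing `rbf_kernel` with any function that returns a kernel matrix for two sets of inputs leaves `kernel_predict` unchanged, which is the sense in which the closed form depends only on the kernel.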
Related papers
- Adaptive kernel predictors from feature-learning infinite limits of neural networks [46.50233782149623]
We show that neural networks trained in the rich, feature learning infinite-width regime are also described by kernel machines. We provide explicit expressions for the kernel predictors and prescriptions to numerically calculate them.
arXiv Detail & Related papers (2025-02-11T22:34:49Z)
- Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models [16.07760622196666]
We study the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers.
Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process.
Remarkably, this convergence holds even when the limits of depth and width are interchanged.
arXiv Detail & Related papers (2023-10-16T19:00:43Z)
- Sparsity-depth Tradeoff in Infinitely Wide Deep Neural Networks [22.083873334272027]
We observe that sparser networks outperform the non-sparse networks at shallow depths on a variety of datasets.
We extend the existing theory on the generalization error of kernel-ridge regression.
arXiv Detail & Related papers (2023-05-17T20:09:35Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Neural Tangent Kernel: A Survey [0.0]
A seminal work demonstrated that training a neural network under specific parameterization is equivalent to performing a particular kernel method as width goes to infinity.
This equivalence opened a promising direction for applying the results of the rich literature on kernel methods to neural nets which were much harder to tackle.
The present survey covers key results on kernel convergence as width goes to infinity, finite-width corrections, applications, and a discussion of the limitations of the corresponding method.
arXiv Detail & Related papers (2022-08-29T14:05:54Z)
- Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks [18.27510863075184]
We analyze feature learning in infinite width neural networks trained with gradient flow through a self-consistent dynamical field theory.
We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points.
arXiv Detail & Related papers (2022-05-19T16:10:10Z)
- Neural Networks as Kernel Learners: The Silent Alignment Effect [86.44610122423994]
Neural networks in the lazy training regime converge to kernel machines.
We show that this can indeed happen due to a phenomenon we term silent alignment.
We also demonstrate that non-whitened data can weaken the silent alignment effect.
arXiv Detail & Related papers (2021-10-29T18:22:46Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for neural network optimization analysis, such as mean field theory and neural tangent kernel theory, typically require taking the limit of infinite network width to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
- On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.