Limitations of the NTK for Understanding Generalization in Deep Learning
- URL: http://arxiv.org/abs/2206.10012v1
- Date: Mon, 20 Jun 2022 21:23:28 GMT
- Title: Limitations of the NTK for Understanding Generalization in Deep Learning
- Authors: Nikhil Vyas, Yamini Bansal, Preetum Nakkiran
- Abstract summary: We study NTKs through the lens of scaling laws, and demonstrate that they fall short of explaining important aspects of neural network generalization.
We show that even if the empirical NTK is allowed to be pre-trained on a constant number of samples, the kernel scaling does not catch up to the neural network scaling.
- Score: 13.44676002603497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The "Neural Tangent Kernel" (NTK) (Jacot et al., 2018) and its empirical
variants have been proposed as a proxy to capture certain behaviors of real
neural networks. In this work, we study NTKs through the lens of scaling laws,
and demonstrate that they fall short of explaining important aspects of neural
network generalization. In particular, we demonstrate realistic settings where
finite-width neural networks have significantly better data scaling exponents
as compared to their corresponding empirical and infinite NTKs at
initialization. This reveals a more fundamental difference between the real
networks and NTKs, beyond just a few percentage points of test accuracy.
Further, we show that even if the empirical NTK is allowed to be pre-trained on
a constant number of samples, the kernel scaling does not catch up to the
neural network scaling. Finally, we show that the empirical NTK continues to
evolve throughout most of the training, in contrast with prior work which
suggests that it stabilizes after a few epochs of training. Altogether, our
work establishes concrete limitations of the NTK approach in understanding
generalization of real networks on natural datasets.
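To make the quantities in the abstract concrete, here is a minimal, purely illustrative NumPy sketch (not code from the paper): it computes the empirical NTK Gram matrix K_ij = <grad_theta f(x_i), grad_theta f(x_j)> of a toy one-hidden-layer ReLU network at initialization, and estimates a data-scaling exponent by fitting test_error(n) ~ c * n^(-alpha) on a log-log scale. The network, inputs, and error values are placeholders, not results from the paper.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer ReLU network: f(x) = v^T relu(W x) / sqrt(m).
d, m, n = 10, 256, 50            # input dim, hidden width, number of samples
W = rng.normal(size=(m, d))      # first-layer weights at initialization
v = rng.normal(size=m)           # second-layer weights at initialization
X = rng.normal(size=(n, d))      # placeholder inputs

def param_gradient(x):
    """Gradient of f(x) with respect to all parameters (W, v), flattened."""
    pre = W @ x
    relu = np.maximum(pre, 0.0)
    dW = np.outer(v * (pre > 0), x) / np.sqrt(m)   # d f / d W
    dv = relu / np.sqrt(m)                         # d f / d v
    return np.concatenate([dW.ravel(), dv])

# Empirical NTK Gram matrix at initialization: K_ij = <grad f(x_i), grad f(x_j)>.
J = np.stack([param_gradient(x) for x in X])       # (n, num_params)
K = J @ J.T
print("empirical NTK Gram matrix:", K.shape)

# Data-scaling exponent: fit test_error(n) ~ c * n**(-alpha) on a log-log scale.
# The error values below are made up solely to show the fitting procedure.
ns = np.array([1e3, 2e3, 4e3, 8e3, 16e3])          # training-set sizes
errs = np.array([0.30, 0.24, 0.19, 0.15, 0.12])    # placeholder test errors
slope, _ = np.polyfit(np.log(ns), np.log(errs), 1)
print("estimated scaling exponent alpha:", -slope)
```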
Related papers
- Analyzing the Neural Tangent Kernel of Periodically Activated Coordinate
Networks [30.92757082348805]
We provide a theoretical understanding of periodically activated networks through an analysis of their Neural Tangent Kernel (NTK)
Our findings indicate that periodically activated networks are notably better behaved, from the NTK perspective, than ReLU-activated networks.
arXiv Detail & Related papers (2024-02-07T12:06:52Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that these neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
Higher-order statistics are exploited only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
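As a rough illustration of what "lower-order" versus "higher-order" input statistics mean here, the hedged sketch below (not the paper's experiment; the data and setup are invented) classifies two synthetic Gaussian classes first using only class means (first-order statistics), then using means plus covariances (second-order statistics).
```python
import numpy as np

rng = np.random.default_rng(3)

# Two synthetic classes that differ in both mean (first order) and
# covariance (second order).
n, d = 2000, 5
A = rng.normal(size=(d, d))
X0 = rng.normal(size=(n, d)) + 1.0    # class 0: shifted mean, identity covariance
X1 = rng.normal(size=(n, d)) @ A      # class 1: zero mean, covariance A^T A
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n), np.ones(n)])

def gaussian_score(X_ref, X_query, use_cov):
    """Log-density (up to constants) of a Gaussian fit to X_ref.
    With use_cov=False, the covariance is forced to the identity,
    so only the class mean (first-order statistic) is used."""
    mu = X_ref.mean(0)
    cov = np.cov(X_ref.T) if use_cov else np.eye(X_ref.shape[1])
    diff = X_query - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (np.einsum('ij,jk,ik->i', diff, inv, diff) + logdet)

def accuracy(use_cov):
    s0 = gaussian_score(X0, X, use_cov)
    s1 = gaussian_score(X1, X, use_cov)
    return np.mean((s1 > s0) == (y == 1))

print("means only        :", accuracy(use_cov=False))
print("means + covariance:", accuracy(use_cov=True))
```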
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study [55.12108376616355]
Study of the NTK has been devoted to typical neural network architectures, but it is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
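For reference, a kernel regression predictor of the kind referred to here has the closed form f(x) = K(x, X)(K(X, X) + lambda*I)^{-1} y. The minimal sketch below is not from the paper: it uses a generic RBF kernel as a stand-in for an NTK Gram matrix and invented toy data, purely to show the predictor.
```python
import numpy as np

def kernel_regression_predict(K_train, K_test_train, y_train, ridge=1e-6):
    """Kernel (ridge) regression predictor:
    f(x) = K(x, X) (K(X, X) + ridge * I)^{-1} y."""
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + ridge * np.eye(n), y_train)
    return K_test_train @ alpha

# Toy usage with an RBF kernel standing in for an NTK Gram matrix.
rng = np.random.default_rng(1)
X_train, X_test = rng.normal(size=(40, 5)), rng.normal(size=(10, 5))
y_train = np.sin(X_train[:, 0])

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

preds = kernel_regression_predict(rbf(X_train, X_train),
                                  rbf(X_test, X_train), y_train)
print(preds.shape)  # (10,)
```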
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
- Neural Tangent Kernel Analysis of Deep Narrow Neural Networks [11.623483126242478]
We present the first trainability guarantee of infinitely deep but narrow neural networks.
We then extend the analysis to an infinitely deep convolutional neural network (CNN) and perform brief experiments.
arXiv Detail & Related papers (2022-02-07T07:27:02Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
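The "linearized network" studied in this line of work is the first-order Taylor expansion of the network around its initialization, f_lin(x; theta) = f(x; theta_0) + <grad_theta f(x; theta_0), theta - theta_0>. Below is a hedged NumPy sketch for a toy one-hidden-layer tanh network; the architecture and dimensions are made up for illustration.
```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 8, 64
W0 = rng.normal(size=(m, d))    # first-layer weights at initialization
v0 = rng.normal(size=m)         # second-layer weights at initialization

def f(W, v, x):
    """Toy one-hidden-layer network with tanh activation."""
    return v @ np.tanh(W @ x) / np.sqrt(m)

def grad_params(W, v, x):
    """Gradient of f with respect to (W, v), returned as a pair."""
    pre = W @ x
    act = np.tanh(pre)
    dv = act / np.sqrt(m)
    dW = np.outer(v * (1.0 - act ** 2), x) / np.sqrt(m)
    return dW, dv

def f_linearized(W, v, x):
    """First-order Taylor expansion of f around (W0, v0):
    f_lin(x) = f0(x) + <grad f0(x), (W, v) - (W0, v0)>."""
    dW, dv = grad_params(W0, v0, x)
    return f(W0, v0, x) + np.sum(dW * (W - W0)) + dv @ (v - v0)

# For a small weight perturbation the linearized model tracks the network closely.
x = rng.normal(size=d)
W1 = W0 + 1e-3 * rng.normal(size=(m, d))
v1 = v0 + 1e-3 * rng.normal(size=m)
print(f(W1, v1, x), f_linearized(W1, v1, x))
```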
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel Theory? [2.0711789781518752]
Neural Tangent Kernel (NTK) theory is widely used to study the dynamics of infinitely-wide deep neural networks (DNNs) under gradient descent.
We study empirically when NTK theory is valid in practice for fully-connected ReLU and sigmoid DNNs.
In particular, NTK theory does not explain the behavior of sufficiently deep networks whose gradients explode as they propagate through the network's layers.
arXiv Detail & Related papers (2020-12-08T15:19:45Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)