A Fast, Well-Founded Approximation to the Empirical Neural Tangent
Kernel
- URL: http://arxiv.org/abs/2206.12543v3
- Date: Wed, 7 Jun 2023 15:07:14 GMT
- Title: A Fast, Well-Founded Approximation to the Empirical Neural Tangent
Kernel
- Authors: Mohamad Amin Mohamadi, Wonho Bae, Danica J. Sutherland
- Abstract summary: Empirical neural kernels (eNTKs) can provide a good understanding of a given network's representation.
For networks with O output units, the eNTK on N inputs is of size $NO times NO$, taking $O((NO)2)$ memory and up to $O((NO)3)$.
Most existing applications have used one of a handful of approximations yielding $N times N$ kernel matrices.
We prove that one such approximation, which we call "sum of logits", converges to the true eNT
- Score: 6.372625755672473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Empirical neural tangent kernels (eNTKs) can provide a good understanding of
a given network's representation: they are often far less expensive to compute
and applicable more broadly than infinite width NTKs. For networks with O
output units (e.g. an O-class classifier), however, the eNTK on N inputs is of
size $NO \times NO$, taking $O((NO)^2)$ memory and up to $O((NO)^3)$
computation. Most existing applications have therefore used one of a handful of
approximations yielding $N \times N$ kernel matrices, saving orders of
magnitude of computation, but with limited to no justification. We prove that
one such approximation, which we call "sum of logits", converges to the true
eNTK at initialization for any network with a wide final "readout" layer. Our
experiments demonstrate the quality of this approximation for various uses
across a range of settings.
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Multi-layer random features and the approximation power of neural networks [4.178980693837599]
We prove that a reproducing kernel Hilbert space contains only functions that can be approximated by the architecture.
We show that if eigenvalues of the integral operator of the NNGP decay slower than $k-n-frac23$ where $k$ is an order of an eigenvalue, our theorem guarantees a more succinct neural network approximation than Barron's theorem.
arXiv Detail & Related papers (2024-04-26T14:57:56Z) - Generalization Ability of Wide Neural Networks on $\mathbb{R}$ [8.508360765158326]
We study the generalization ability of the wide two-layer ReLU neural network on $mathbbR$.
We show that: $i)$ when the width $mrightarrowinfty$, the neural network kernel (NNK) uniformly converges to the NTK; $ii)$ the minimax rate of regression over the RKHS associated to $K_1$ is $n-2/3$; $iii)$ if one adopts the early stopping strategy in training a wide neural network, the resulting neural network achieves the minimax rate; $iv
arXiv Detail & Related papers (2023-02-12T15:07:27Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P* sim sqrtN$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Neural Networks can Learn Representations with Gradient Descent [68.95262816363288]
In specific regimes, neural networks trained by gradient descent behave like kernel methods.
In practice, it is known that neural networks strongly outperform their associated kernels.
arXiv Detail & Related papers (2022-06-30T09:24:02Z) - Fast Finite Width Neural Tangent Kernel [47.57136433797996]
The neural network Jacobian has emerged as a central object of study in deep learning.
The finite width NTK is notoriously expensive to compute.
We propose two novel algorithms that change the exponent of the compute and memory requirements of the finite width NTK.
arXiv Detail & Related papers (2022-06-17T12:18:22Z) - Memorization and Optimization in Deep Neural Networks with Minimum
Over-parameterization [14.186776881154127]
The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks.
We show that the NTK is well conditioned in a challenging sub-linear setup.
Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks.
arXiv Detail & Related papers (2022-05-20T14:50:24Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural kernel (NTK)
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Interval Universal Approximation for Neural Networks [47.767793120249095]
We introduce the interval universal approximation (IUA) theorem.
IUA shows that neural networks can approximate any continuous function $f$ as we have known for decades.
We study the computational complexity of constructing neural networks that are amenable to precise interval analysis.
arXiv Detail & Related papers (2020-07-12T20:43:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.