A Fast, Well-Founded Approximation to the Empirical Neural Tangent
Kernel
- URL: http://arxiv.org/abs/2206.12543v3
- Date: Wed, 7 Jun 2023 15:07:14 GMT
- Title: A Fast, Well-Founded Approximation to the Empirical Neural Tangent
Kernel
- Authors: Mohamad Amin Mohamadi, Wonho Bae, Danica J. Sutherland
- Abstract summary: Empirical neural tangent kernels (eNTKs) can provide a good understanding of a given network's representation.
For networks with O output units, the eNTK on N inputs is of size $NO \times NO$, taking $O((NO)^2)$ memory and up to $O((NO)^3)$ computation.
Most existing applications have used one of a handful of approximations yielding $N \times N$ kernel matrices.
We prove that one such approximation, which we call "sum of logits", converges to the true eNTK at initialization for any network with a wide final "readout" layer.
- Score: 6.372625755672473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Empirical neural tangent kernels (eNTKs) can provide a good understanding of
a given network's representation: they are often far less expensive to compute
and applicable more broadly than infinite width NTKs. For networks with O
output units (e.g. an O-class classifier), however, the eNTK on N inputs is of
size $NO \times NO$, taking $O((NO)^2)$ memory and up to $O((NO)^3)$
computation. Most existing applications have therefore used one of a handful of
approximations yielding $N \times N$ kernel matrices, saving orders of
magnitude of computation, but with limited to no justification. We prove that
one such approximation, which we call "sum of logits", converges to the true
eNTK at initialization for any network with a wide final "readout" layer. Our
experiments demonstrate the quality of this approximation for various uses
across a range of settings.
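To make the sizes concrete, here is a minimal sketch in JAX (not the authors' code) that forms both objects for a small randomly initialized MLP: the full eNTK of shape $NO \times NO$, and an $N \times N$ matrix built from gradients of the summed logits, in the spirit of the "sum of logits" approximation. The helper names (`init_mlp`, `apply_mlp`, `entk_and_approx`) and the $1/O$ scaling are illustrative assumptions rather than details taken from the paper.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree


def init_mlp(key, sizes):
    """Random parameters for a small fully connected ReLU network."""
    params = []
    for din, dout in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (din, dout)) / jnp.sqrt(din),
                       jnp.zeros(dout)))
    return params


def apply_mlp(params, x):
    """Forward pass for one input x; returns the O logits."""
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def entk_and_approx(params, xs):
    """Full eNTK (N*O x N*O) and an N x N sum-of-logits style kernel."""
    flat0, unravel = ravel_pytree(params)

    def f(flat, x):  # network as a function of flattened parameters
        return apply_mlp(unravel(flat), x)

    # Per-example Jacobian of all O logits w.r.t. the parameters: (N, O, P).
    jac = jax.vmap(lambda x: jax.jacrev(f)(flat0, x))(xs)
    n, o, p = jac.shape

    # Full eNTK: inner products between every (example, logit) pair.
    full = jac.reshape(n * o, p) @ jac.reshape(n * o, p).T

    # Sum-of-logits approximation: one gradient (of the summed readout) per
    # example, giving an N x N matrix. The 1/O scaling is an assumed choice.
    g = jac.sum(axis=1)  # (N, P)
    return full, (g @ g.T) / o


if __name__ == "__main__":
    key = jax.random.PRNGKey(0)
    xs = jax.random.normal(key, (8, 16))        # N = 8 inputs of dimension 16
    params = init_mlp(key, [16, 256, 256, 10])  # O = 10 output logits
    full, approx = entk_and_approx(params, xs)
    print(full.shape, approx.shape)             # (80, 80) and (8, 8)
```

The shapes show where the savings come from: the $N \times N$ matrix needs roughly a factor of $O^2$ less memory than the full eNTK, and only a single backward pass per example rather than one per logit. The sketch forms the full Jacobian for both quantities purely so they can be compared; in practice one would compute the summed-logit gradients directly with `jax.grad` and never materialize the per-logit Jacobian.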
Related papers
- Approximation of the Proximal Operator of the $\ell_\infty$ Norm Using a Neural Network [1.7265013728931]
We present an approximation of $\textbf{prox}_{\alpha\cdot\|\cdot\|_\infty}(\mathbf{x})$ using a neural network.
A novel aspect of the network is that it is able to accept vectors of varying lengths due to a feature selection process.
We show that the network outperforms a "vanilla neural network" that does not use feature selection.
arXiv Detail & Related papers (2024-08-20T22:12:30Z) - Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Generalization Ability of Wide Neural Networks on $\mathbb{R}$ [8.508360765158326]
We study the generalization ability of the wide two-layer ReLU neural network on $\mathbb{R}$.
We show that: $i)$ when the width $m \rightarrow \infty$, the neural network kernel (NNK) uniformly converges to the NTK; $ii)$ the minimax rate of regression over the RKHS associated to $K_1$ is $n^{-2/3}$; $iii)$ if one adopts the early stopping strategy in training a wide neural network, the resulting neural network achieves the minimax rate; $iv)$ ...
arXiv Detail & Related papers (2023-02-12T15:07:27Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Neural Networks can Learn Representations with Gradient Descent [68.95262816363288]
In specific regimes, neural networks trained by gradient descent behave like kernel methods.
In practice, it is known that neural networks strongly outperform their associated kernels.
arXiv Detail & Related papers (2022-06-30T09:24:02Z) - Fast Finite Width Neural Tangent Kernel [47.57136433797996]
The neural network Jacobian has emerged as a central object of study in deep learning.
The finite width NTK is notoriously expensive to compute.
We propose two novel algorithms that change the exponent of the compute and memory requirements of the finite width NTK.
arXiv Detail & Related papers (2022-06-17T12:18:22Z) - Memorization and Optimization in Deep Neural Networks with Minimum
Over-parameterization [14.186776881154127]
The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks.
We show that the NTK is well conditioned in a challenging sub-linear setup.
Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks.
arXiv Detail & Related papers (2022-05-20T14:50:24Z) - Interval Universal Approximation for Neural Networks [47.767793120249095]
We introduce the interval universal approximation (IUA) theorem.
The IUA theorem shows that neural networks can approximate any continuous function $f$, as has been known for decades.
We study the computational complexity of constructing neural networks that are amenable to precise interval analysis; a minimal interval-propagation sketch follows after this list.
arXiv Detail & Related papers (2020-07-12T20:43:56Z)
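The last entry above concerns interval analysis of neural networks. As a hedged illustration only (standard interval arithmetic, not the paper's construction), the sketch below propagates elementwise lower/upper bounds through a small ReLU MLP, so that every input inside the given box is guaranteed to map into the returned output box.

```python
import jax
import jax.numpy as jnp


def interval_dense(lo, hi, w, b):
    """Sound bounds on x @ w + b when lo <= x <= hi elementwise."""
    w_pos, w_neg = jnp.maximum(w, 0.0), jnp.minimum(w, 0.0)
    return lo @ w_pos + hi @ w_neg + b, hi @ w_pos + lo @ w_neg + b


def interval_forward(params, lo, hi):
    """Propagate an input box through a ReLU MLP.

    ReLU is monotone, so applying it to the lower and upper bounds
    separately keeps the bounds sound.
    """
    for w, b in params[:-1]:
        lo, hi = interval_dense(lo, hi, w, b)
        lo, hi = jax.nn.relu(lo), jax.nn.relu(hi)
    w, b = params[-1]
    return interval_dense(lo, hi, w, b)


if __name__ == "__main__":
    k1, k2 = jax.random.split(jax.random.PRNGKey(0))
    params = [(jax.random.normal(k1, (4, 8)) / 2.0, jnp.zeros(8)),
              (jax.random.normal(k2, (8, 2)) / 2.0, jnp.zeros(2))]
    x = jnp.ones(4)
    lo, hi = interval_forward(params, x - 0.1, x + 0.1)
    print(lo, hi)  # certified bounds on both outputs over the input box
```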
This list is automatically generated from the titles and abstracts of the papers on this site.