Fast Finite Width Neural Tangent Kernel
- URL: http://arxiv.org/abs/2206.08720v1
- Date: Fri, 17 Jun 2022 12:18:22 GMT
- Title: Fast Finite Width Neural Tangent Kernel
- Authors: Roman Novak, Jascha Sohl-Dickstein, Samuel S. Schoenholz
- Abstract summary: The Neural Tangent Kernel (NTK), built from the neural network Jacobian, has emerged as a central object of study in deep learning.
The finite width NTK is notoriously expensive to compute.
We propose two novel algorithms that change the exponent of the compute and memory requirements of the finite width NTK.
- Score: 47.57136433797996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Neural Tangent Kernel (NTK), defined as $\Theta_\theta^f(x_1, x_2) =
\left[\partial f(\theta, x_1)\big/\partial \theta\right] \left[\partial
f(\theta, x_2)\big/\partial \theta\right]^T$ where $\left[\partial f(\theta,
\cdot)\big/\partial \theta\right]$ is a neural network (NN) Jacobian, has
emerged as a central object of study in deep learning. In the infinite width
limit, the NTK can sometimes be computed analytically and is useful for
understanding training and generalization of NN architectures. At finite
widths, the NTK is also used to better initialize NNs, compare the conditioning
across models, perform architecture search, and do meta-learning.
Unfortunately, the finite width NTK is notoriously expensive to compute, which
severely limits its practical utility. We perform the first in-depth analysis
of the compute and memory requirements for NTK computation in finite width
networks. Leveraging the structure of neural networks, we further propose two
novel algorithms that change the exponent of the compute and memory
requirements of the finite width NTK, dramatically improving efficiency. Our
algorithms can be applied in a black box fashion to any differentiable
function, including those implementing neural networks. We open-source our
implementations within the Neural Tangents package (arXiv:1912.02803) at
https://github.com/google/neural-tangents.
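For reference, the defining contraction above is straightforward (if expensive) to evaluate directly in JAX. The sketch below is a naive baseline, not the paper's fast algorithms: it computes $\Theta = J(x_1) J(x_2)^T$ by explicit Jacobian contraction for an assumed toy MLP; in practice one would use the open-sourced routines in Neural Tangents instead.
```python
# Minimal sketch (assumed toy MLP): the NTK evaluated directly from its
# definition, Theta(x1, x2) = J(x1) J(x2)^T with J = d f(params, x) / d params.
# This is the naive reference computation, not the paper's fast algorithms.
import jax
import jax.numpy as jnp

def apply_fn(params, x):
    """Toy two-layer MLP: x -> relu(x W1) W2, outputs shape (N, O)."""
    w1, w2 = params
    return jnp.maximum(x @ w1, 0.0) @ w2

def ntk_direct(params, x1, x2):
    """Contract per-example Jacobians over every parameter tensor."""
    j1 = jax.jacobian(apply_fn)(params, x1)  # leaves: (N1, O, *param.shape)
    j2 = jax.jacobian(apply_fn)(params, x2)  # leaves: (N2, O, *param.shape)
    def contract(a, b):
        a = a.reshape(a.shape[0], a.shape[1], -1)          # (N1, O, P)
        b = b.reshape(b.shape[0], b.shape[1], -1)          # (N2, O, P)
        return jnp.einsum('iap,jbp->ijab', a, b)           # (N1, N2, O, O)
    return sum(contract(a, b) for a, b in
               zip(jax.tree_util.tree_leaves(j1), jax.tree_util.tree_leaves(j2)))

key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
params = (jax.random.normal(k1, (8, 16)), jax.random.normal(k2, (16, 3)))
x1, x2 = jax.random.normal(k3, (5, 8)), jax.random.normal(k4, (7, 8))
print(ntk_direct(params, x1, x2).shape)  # (5, 7, 3, 3)
```
Both the compute and the memory of this direct approach scale poorly with the output dimension and parameter count, which is exactly the cost the paper's two algorithms reduce.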
Related papers
- LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce the popular positive linear satisfiability to neural networks.
We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
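The Sinkhorn building block mentioned above is simple to sketch. The snippet below shows only the classic single-matrix iteration (alternating row and column rescaling toward target marginals); LinSATNet's jointly-encoded multi-set extension is more involved, and all names and sizes here are illustrative assumptions rather than that paper's layer.
```python
# Illustrative classic Sinkhorn normalization (not LinSATNet's multi-set
# extension): alternately rescale rows and columns of a positive matrix so
# its marginals approach the targets r and c. Differentiable end to end.
import jax.numpy as jnp

def sinkhorn(scores, r, c, n_iters=50, eps=1e-8):
    p = jnp.exp(scores)
    for _ in range(n_iters):
        p = p * (r / (p.sum(axis=1) + eps))[:, None]   # match row sums
        p = p * (c / (p.sum(axis=0) + eps))[None, :]   # match column sums
    return p

scores = jnp.array([[1.0, 0.2], [0.3, 2.0]])
r = c = jnp.array([1.0, 1.0])                          # target marginals
p = sinkhorn(scores, r, c)
print(p.sum(axis=0), p.sum(axis=1))                    # both ~[1., 1.]
```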
arXiv Detail & Related papers (2024-07-18T22:05:21Z)
- The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
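As a rough illustration of what an $\ell_\infty$-norm box over-approximation of a reachable set looks like (a generic interval-arithmetic bound, not the paper's embedded-network construction), the sketch below propagates an input box through a single affine-plus-ReLU layer; the weights and radii are assumptions.
```python
# Generic interval-arithmetic sketch of an l_infty box over-approximation of
# one layer's reachable set (not the embedded-network construction above).
import jax.numpy as jnp

def affine_relu_box(lo, hi, w, b):
    """Elementwise bounds on relu(x @ w + b) for all x with lo <= x <= hi."""
    w_pos, w_neg = jnp.maximum(w, 0.0), jnp.minimum(w, 0.0)
    out_lo = lo @ w_pos + hi @ w_neg + b
    out_hi = hi @ w_pos + lo @ w_neg + b
    return jnp.maximum(out_lo, 0.0), jnp.maximum(out_hi, 0.0)

x = jnp.array([0.5, -0.2])            # nominal input
lo, hi = x - 0.1, x + 0.1             # l_infty ball of radius 0.1 as a box
w = jnp.array([[1.0, -2.0], [0.5, 0.3]])
b = jnp.array([0.0, 0.1])
print(affine_relu_box(lo, hi, w, b))  # box containing every reachable output
```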
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- A Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel [6.372625755672473]
Empirical neural kernels (eNTKs) can provide a good understanding of a given network's representation.
For networks with O output units, the eNTK on N inputs is of size $NO \times NO$, taking $O((NO)^2)$ memory and up to $O((NO)^3)$ computation to use.
Most existing applications have used one of a handful of approximations yielding $N times N$ kernel matrices.
We prove that one such approximation, which we call "sum of logits", converges to the true eNTK.
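To make those sizes concrete, the sketch below contrasts the full eNTK (an $NO \times NO$ object) with the $N \times N$ kernel obtained by differentiating the summed logits, which is one natural reading of the "sum of logits" approximation; the toy network and helper names are assumptions, not code from that paper.
```python
# Sketch: full eNTK of shape (N1, O, N2, O) versus an (N1, N2) kernel built
# from the gradient of the summed logits (one reading of "sum of logits").
# The toy MLP and helper are illustrative assumptions.
import jax
import jax.numpy as jnp

def apply_fn(params, x):
    w1, w2 = params
    return jnp.maximum(x @ w1, 0.0) @ w2                  # logits, shape (N, O)

def jac_kernel(f, params, x1, x2, out_dims):
    """Theta = J(x1) J(x2)^T with all parameter axes flattened away."""
    def flat_jac(x):
        leaves = jax.tree_util.tree_leaves(jax.jacobian(f)(params, x))
        return jnp.concatenate(
            [l.reshape(l.shape[:1 + out_dims] + (-1,)) for l in leaves], axis=-1)
    j1, j2 = flat_jac(x1), flat_jac(x2)
    return jnp.tensordot(j1, j2, axes=((j1.ndim - 1,), (j2.ndim - 1,)))

params = (0.1 * jnp.ones((8, 16)), 0.1 * jnp.ones((16, 3)))    # O = 3 logits
x1, x2 = jnp.ones((5, 8)), jnp.ones((7, 8))

print(jac_kernel(apply_fn, params, x1, x2, out_dims=1).shape)  # (5, 3, 7, 3)

sum_logits = lambda p, x: apply_fn(p, x).sum(axis=-1)          # scalar output
print(jac_kernel(sum_logits, params, x1, x2, out_dims=0).shape)  # (5, 7)
```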
arXiv Detail & Related papers (2022-06-25T03:02:35Z)
- Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization [14.186776881154127]
The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks.
We show that the NTK is well conditioned in a challenging sub-linear setup.
Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks.
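Purely as an illustrative counterpart to the eigenvalue statement above (the paper's contribution is a theoretical lower bound, not code), one can inspect the spectrum of a finite NTK Gram matrix directly; the random Jacobian stand-in below is an assumption.
```python
# Illustrative conditioning check: eigen-decompose an N x N NTK Gram matrix.
# A random per-example gradient matrix stands in for a real network Jacobian.
import jax
import jax.numpy as jnp

j = jax.random.normal(jax.random.PRNGKey(0), (32, 500))  # (N, num_params) stand-in
ntk = j @ j.T                                            # empirical NTK Gram, (N, N)
eigs = jnp.linalg.eigvalsh(ntk)                          # ascending eigenvalues
print(eigs[0], eigs[-1], eigs[-1] / eigs[0])             # lambda_min, lambda_max, condition number
```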
arXiv Detail & Related papers (2022-05-20T14:50:24Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of the exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
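The classical building block behind that construction is the random-features view of arc-cosine kernels: ReLU features of Gaussian projections approximate the order-1 arc-cosine kernel in expectation. The sketch below illustrates only this textbook fact, not the paper's sketching algorithm; sizes are assumptions.
```python
# Illustrative random features for the order-1 arc-cosine kernel:
# E_w[relu(w.x) relu(w.y)], w ~ N(0, I), equals
# (|x||y| / (2 pi)) * (sin t + (pi - t) cos t), with t the angle between x, y.
# This is the textbook building block, not the paper's sketching algorithm.
import jax
import jax.numpy as jnp

def arccos1(x, y):
    nx, ny = jnp.linalg.norm(x), jnp.linalg.norm(y)
    t = jnp.arccos(jnp.clip(x @ y / (nx * ny), -1.0, 1.0))
    return nx * ny / (2 * jnp.pi) * (jnp.sin(t) + (jnp.pi - t) * jnp.cos(t))

def relu_features(x, w):
    return jnp.maximum(w @ x, 0.0) / jnp.sqrt(w.shape[0])  # phi(x), m features

x = jnp.array([1.0, 0.5, -0.3])
y = jnp.array([0.2, -0.1, 0.8])
w = jax.random.normal(jax.random.PRNGKey(0), (20000, 3))   # m = 20000 projections
print(arccos1(x, y), relu_features(x, w) @ relu_features(y, w))  # close values
```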
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
- Feature Learning in Infinite-Width Neural Networks [17.309380337367536]
We show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features.
We propose simple modifications to the standard parametrization to allow for feature learning in the limit.
arXiv Detail & Related papers (2020-11-30T03:21:05Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- On the Empirical Neural Tangent Kernel of Standard Finite-Width Convolutional Neural Network Architectures [3.4698840925433765]
It remains an open question how well NTK theory models standard neural network architectures of widths common in practice.
We study this question empirically for two well-known convolutional neural network architectures, namely AlexNet and LeNet.
For wider versions of these networks, where the number of channels and the widths of fully-connected layers are increased, the deviation from the NTK-theory predictions decreases.
arXiv Detail & Related papers (2020-06-24T11:40:36Z)