On the Neural Tangent Kernel of Equilibrium Models
- URL: http://arxiv.org/abs/2310.14062v1
- Date: Sat, 21 Oct 2023 16:47:18 GMT
- Title: On the Neural Tangent Kernel of Equilibrium Models
- Authors: Zhili Feng and J. Zico Kolter
- Abstract summary: This work studies the neural tangent kernel (NTK) of the deep equilibrium (DEQ) model.
We show that, in contrast, a DEQ model still enjoys a deterministic NTK even when its width and depth go to infinity at the same time, under mild conditions.
- Score: 72.29727250679477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work studies the neural tangent kernel (NTK) of the deep equilibrium
(DEQ) model, a practical "infinite-depth" architecture which directly
computes the infinite-depth limit of a weight-tied network via root-finding.
Even though the NTK of a fully-connected neural network can be stochastic if
its width and depth both tend to infinity simultaneously, we show that, in
contrast, a DEQ model still enjoys a deterministic NTK even when its width and
depth go to infinity at the same time, under mild conditions. Moreover, this
deterministic NTK can be found efficiently via root-finding.
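To make the root-finding formulation concrete, the minimal sketch below computes the equilibrium z* = f(z*, x) of a small weight-tied tanh layer by plain fixed-point iteration. The layer form, the sizes, the rescaling that makes the map a contraction, and the naive solver are illustrative assumptions, not the paper's construction; the paper's result concerns the NTK of such models in the infinite-width limit, which it shows can itself be obtained by root-finding.

```python
import numpy as np

# Minimal sketch of a deep equilibrium (DEQ) forward pass: instead of stacking
# many copies of a weight-tied layer f, solve the root-finding problem
# z* = f(z*, x) directly.  The tanh layer, the sizes, and the spectral-norm
# rescaling below are illustrative assumptions, not the paper's setup.

rng = np.random.default_rng(0)
d_in, width = 8, 64
W = rng.standard_normal((width, width))
W *= 0.9 / np.linalg.norm(W, 2)            # make f a contraction in z => unique equilibrium
U = rng.standard_normal((width, d_in)) / np.sqrt(d_in)

def f(z, x):
    """One application of the weight-tied layer."""
    return np.tanh(W @ z + U @ x)

def deq_forward(x, tol=1e-8, max_iter=1000):
    """Equilibrium z* = f(z*, x) via simple fixed-point iteration."""
    z = np.zeros(width)
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z  # may not be fully converged

x = rng.standard_normal(d_in)
z_star = deq_forward(x)
print("fixed-point residual:", np.linalg.norm(f(z_star, x) - z_star))
```

In practice DEQ implementations replace the naive iteration with a faster root-finder (e.g. Broyden or Anderson acceleration) and differentiate through the equilibrium via the implicit function theorem; the kernel-level analogue of this fixed point is what the paper shows converges to a deterministic NTK.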
Related papers
- Wide Neural Networks as Gaussian Processes: Lessons from Deep
Equilibrium Models [16.07760622196666]
We study the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers.
Our analysis reveals that as the width of the DEQ layers approaches infinity, the DEQ converges to a Gaussian process.
Remarkably, this convergence holds even when the limits of depth and width are interchanged.
arXiv Detail & Related papers (2023-10-16T19:00:43Z) - Speed Limits for Deep Learning [67.69149326107103]
Recent advancement in thermodynamics allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence between the NTK of a fully-connected neural network and that of its randomly pruned version (a finite-width empirical-NTK sketch appears after this list).
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth
and Initialization [3.2971341821314777]
We study the NTK of fully-connected ReLU networks with depth comparable to width.
We show that the NTK of deep networks may stay constant during training only in the ordered phase.
arXiv Detail & Related papers (2022-02-01T16:52:16Z) - Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through the addition of a computationally tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
arXiv Detail & Related papers (2020-07-11T22:10:52Z) - Towards an Understanding of Residual Networks Using Neural Tangent
Hierarchy (NTH) [2.50686294157537]
Gradient descent yields zero training loss in polynomial time for deep neural networks despite the non-convex nature of the objective function.
In this paper, we study the dynamics of the NTK for a finite-width Deep Residual Network (ResNet) using the neural tangent hierarchy (NTH).
Our analysis strongly suggests that the particular skip-connection structure of ResNet is the main reason for its triumph.
arXiv Detail & Related papers (2020-07-07T18:08:16Z) - A Generalized Neural Tangent Kernel Analysis for Two-layer Neural
Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z) - On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)
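As a complement to the randomly pruned networks entry above, the sketch below shows the kind of finite-width comparison one might run: the empirical NTK entry K(x, x') = <grad_theta f(x), grad_theta f(x')> of a two-layer ReLU network versus a randomly pruned, rescaled copy. The architecture, the 1/sqrt(m) scaling, the Bernoulli mask, and the 1/sqrt(p) rescaling are assumptions for illustration; the equivalence established in that paper is an infinite-width statement under its own conditions, which this finite-width sketch does not reproduce exactly.

```python
import numpy as np

# Sketch: empirical NTK  K(x, x') = <grad_theta f(x), grad_theta f(x')>
# for a two-layer ReLU network f(x) = v^T relu(W x) / sqrt(m), comparing the
# full network with a randomly pruned copy (Bernoulli mask on W, rescaled by
# 1/sqrt(p)).  All modeling choices here are illustrative assumptions.

rng = np.random.default_rng(1)
d, m = 16, 2048                       # input dimension, hidden width
W = rng.standard_normal((m, d))
v = rng.standard_normal(m)

def ntk_entry(W, v, x1, x2):
    """Empirical NTK entry via analytic gradients of f(x) = v^T relu(Wx)/sqrt(m)."""
    def grads(x):
        pre = W @ x                   # pre-activations
        act = np.maximum(pre, 0.0)    # relu
        dact = (pre > 0).astype(float)
        g_v = act / np.sqrt(m)                                 # df/dv
        g_W = (v * dact)[:, None] * x[None, :] / np.sqrt(m)    # df/dW
        return np.concatenate([g_W.ravel(), g_v])
    return grads(x1) @ grads(x2)

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)

# Randomly prune W: keep each weight with probability p, rescale by 1/sqrt(p).
p = 0.5
mask = rng.random((m, d)) < p
W_pruned = (W * mask) / np.sqrt(p)

print("full   NTK(x1, x2):", ntk_entry(W, v, x1, x2))
print("pruned NTK(x1, x2):", ntk_entry(W_pruned, v, x1, x2))
```

At large widths the two printed values should be close, which is the finite-width intuition behind the equivalence result summarized above.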