Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth
and Initialization
- URL: http://arxiv.org/abs/2202.00553v1
- Date: Tue, 1 Feb 2022 16:52:16 GMT
- Title: Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth
and Initialization
- Authors: Mariia Seleznova, Gitta Kutyniok
- Abstract summary: We study the NTK of fully-connected ReLU networks with depth comparable to width.
We show that the NTK of deep networks may stay constant during training only in the ordered phase.
- Score: 3.2971341821314777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Tangent Kernel (NTK) is widely used to analyze overparametrized neural
networks due to the famous result by (Jacot et al., 2018): in the
infinite-width limit, the NTK is deterministic and constant during training.
However, this result cannot explain the behavior of deep networks, since it
generally does not hold if depth and width tend to infinity simultaneously. In
this paper, we study the NTK of fully-connected ReLU networks with depth
comparable to width. We prove that the NTK properties depend significantly on
the depth-to-width ratio and the distribution of parameters at initialization.
In fact, our results indicate the importance of the three phases in the
hyperparameter space identified in (Poole et al., 2016): ordered, chaotic and
the edge of chaos (EOC). We derive exact expressions for the NTK dispersion in
the infinite-depth-and-width limit in all three phases and conclude that the
NTK variability grows exponentially with depth at the EOC and in the chaotic
phase but not in the ordered phase. We also show that the NTK of deep networks
may stay constant during training only in the ordered phase and discuss how the
structure of the NTK matrix changes during training.
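As a rough illustration of the quantities discussed in the abstract, the sketch below (not the authors' code; the architecture, seeds and the JAX implementation are assumptions made here for illustration) computes the empirical NTK of a finite fully-connected ReLU network, Theta(x_i, x_j) = <grad_theta f(x_i), grad_theta f(x_j)>, at several random initializations and compares its initialization-to-initialization variability for weight scales that, for ReLU, roughly correspond to the ordered (sigma_w^2 < 2), EOC (sigma_w^2 = 2) and chaotic (sigma_w^2 > 2) regimes.
```python
# A minimal sketch (not the authors' code): empirical NTK of a finite
# fully-connected ReLU network at several random initializations.
# Network sizes, seeds and the choice of JAX are assumptions for illustration.
import jax
import jax.numpy as jnp

def init_params(key, widths, sigma_w, sigma_b=0.1):
    """Gaussian init: weight std sigma_w / sqrt(fan_in), bias std sigma_b."""
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        key, wk, bk = jax.random.split(key, 3)
        params.append((sigma_w / jnp.sqrt(n_in) * jax.random.normal(wk, (n_in, n_out)),
                       sigma_b * jax.random.normal(bk, (n_out,))))
    return params

def forward(params, x):
    """Fully-connected ReLU network with a scalar output."""
    h = x
    for W, b in params[:-1]:
        h = jax.nn.relu(h @ W + b)
    W, b = params[-1]
    return (h @ W + b).squeeze(-1)

def empirical_ntk(params, X):
    """NTK(x_i, x_j) = <grad_theta f(x_i), grad_theta f(x_j)>."""
    jac = jax.jacobian(forward)(params, X)              # d f(X) / d theta, leaf by leaf
    flat = jnp.concatenate([j.reshape(X.shape[0], -1)
                            for j in jax.tree_util.tree_leaves(jac)], axis=1)
    return flat @ flat.T

X = jax.random.normal(jax.random.PRNGKey(0), (4, 16))   # 4 inputs of dimension 16
widths = [16] + [64] * 32 + [1]                          # depth comparable to width
for sigma_w in (1.0, jnp.sqrt(2.0), 1.8):                # ordered / EOC / chaotic for ReLU
    ntks = jnp.stack([empirical_ntk(init_params(jax.random.PRNGKey(s), widths, sigma_w), X)
                      for s in range(10)])
    rel_std = jnp.std(ntks[:, 0, 0]) / jnp.mean(ntks[:, 0, 0])
    print(f"sigma_w^2 = {float(sigma_w)**2:.2f}: relative std of NTK(x_1, x_1) = {rel_std:.3f}")
```
If the paper's analysis applies, increasing the depth should inflate this relative spread much faster at the EOC and in the chaotic phase than in the ordered phase.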
Related papers
- On the Neural Tangent Kernel of Equilibrium Models [72.29727250679477]
This work studies the neural tangent kernel (NTK) of the deep equilibrium (DEQ) model.
We show that, in contrast, a DEQ model still enjoys a deterministic NTK even when its width and depth go to infinity simultaneously, under mild conditions.
arXiv Detail & Related papers (2023-10-21T16:47:18Z) - Feature Learning and Generalization in Deep Networks with Orthogonal Weights [1.7956122940209063]
Deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality.
These networks still exhibit fluctuations that grow linearly with the depth of the network.
We show analytically that rectangular networks with tanh activations and weights initialized from the ensemble of orthogonal matrices have preactivation fluctuations that are independent of depth.
arXiv Detail & Related papers (2023-10-11T18:00:02Z) - Efficient NTK using Dimensionality Reduction [5.025654873456756]
We show how to obtain guarantees similar to those obtained by a prior analysis while reducing training and inference resource costs.
More generally, our work suggests how to analyze large-width networks in which dense linear layers are replaced with a low-complexity factorization (a toy sketch of such a factorization follows below).
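For intuition only, here is a minimal sketch of the kind of low-complexity factorization mentioned above; the rank, dimensions and JAX code are assumptions made here for illustration, not the paper's construction.
```python
# A minimal sketch (illustration only, not the paper's algorithm): replacing a
# dense layer's weight matrix W (d_out x d_in) with a rank-r factorization U @ V,
# cutting parameters from d_out*d_in to r*(d_in + d_out).
import jax
import jax.numpy as jnp

d_in, d_out, r = 1024, 1024, 64                              # r << d_in, d_out
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)

W = jax.random.normal(k1, (d_out, d_in)) / jnp.sqrt(d_in)    # dense layer
U = jax.random.normal(k2, (d_out, r)) / jnp.sqrt(r)          # left factor
V = jax.random.normal(k3, (r, d_in)) / jnp.sqrt(d_in)        # right factor

x = jax.random.normal(k4, (d_in,))
dense_out = W @ x                                            # O(d_out * d_in) work
factored_out = U @ (V @ x)                                   # O(r * (d_in + d_out)) work

print("dense params:   ", W.size)
print("factored params:", U.size + V.size)
```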
arXiv Detail & Related papers (2022-10-10T16:11:03Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, in time logarithmic in both (these rates are restated schematically below).
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
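Read schematically (the notation here is assumed for illustration and is not taken from the paper: n the sample size, L the depth, m the layer width, T the number of gradient-descent steps), one plausible reading of the stated rates is:
```latex
% One plausible schematic reading of the stated rates (assumed notation):
% width quadratic in the sample size and linear in the depth,
% convergence time logarithmic in both.
m \;=\; \tilde{\Omega}\!\left(n^{2} L\right),
\qquad
T \;=\; \mathcal{O}\!\left(\log(nL)\right).
```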
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for
Deep ReLU Networks [21.13299067136635]
We provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU networks.
In the finite-width setting, the network architectures we consider are quite general.
arXiv Detail & Related papers (2020-12-21T19:32:17Z) - Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel
Theory? [2.0711789781518752]
Neural Tangent Kernel (NTK) theory is widely used to study the dynamics of infinitely-wide deep neural networks (DNNs) under gradient descent.
We study empirically when NTK theory is valid in practice for fully-connected ReLU and sigmoid DNNs.
In particular, NTK theory does not explain the behavior of networks deep enough that their gradients explode as they propagate through the network's layers.
arXiv Detail & Related papers (2020-12-08T15:19:45Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z) - Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through the addition of a computationally tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
arXiv Detail & Related papers (2020-07-11T22:10:52Z) - On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.