Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for
Deep ReLU Networks
- URL: http://arxiv.org/abs/2012.11654v2
- Date: Wed, 23 Dec 2020 20:50:38 GMT
- Title: Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for
Deep ReLU Networks
- Authors: Quynh Nguyen, Marco Mondelli, Guido Montufar
- Abstract summary: We provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU networks.
In the finite-width setting, the network architectures we consider are quite general.
- Score: 21.13299067136635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent line of work has analyzed the theoretical properties of deep neural
networks via the Neural Tangent Kernel (NTK). In particular, the smallest
eigenvalue of the NTK has been related to memorization capacity, convergence of
gradient descent algorithms and generalization of deep nets. However, existing
results either provide bounds in the two-layer setting or assume that the
spectrum of the NTK is bounded away from 0 for multi-layer networks. In this
paper, we provide tight bounds on the smallest eigenvalue of NTK matrices for
deep ReLU networks, both in the limiting case of infinite widths and for finite
widths. In the finite-width setting, the network architectures we consider are
quite general: we require the existence of a wide layer with roughly order of
$N$ neurons, $N$ being the number of data samples; and the scaling of the
remaining widths is arbitrary (up to logarithmic factors). To obtain our
results, we analyze various quantities of independent interest: we give lower
bounds on the smallest singular value of feature matrices, and upper bounds on
the Lipschitz constant of input-output feature maps.
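The quantity the abstract studies can be illustrated numerically. The following is a minimal sketch, not the authors' method or code: it builds the empirical NTK Gram matrix $K = JJ^\top$ for a small fully-connected ReLU network, where $J$ is the Jacobian of the scalar output with respect to all weights, and reads off its smallest eigenvalue. The function names (`init_weights`, `param_gradient`, `ntk_gram`), the He-style initialization, the layer widths, the absence of biases, and the unit-norm random data are all assumptions made for illustration.

```python
import numpy as np

def init_weights(widths, rng):
    """Gaussian weights with variance 2/fan_in (an assumed He-style scaling)."""
    return [rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))
            for n_in, n_out in zip(widths[:-1], widths[1:])]

def param_gradient(weights, x):
    """Gradient of the scalar network output w.r.t. all weights, flattened."""
    # Forward pass: record each layer's input and ReLU activation pattern.
    inputs, masks = [], []
    h = x
    for W in weights[:-1]:
        inputs.append(h)
        g = W @ h
        masks.append(g > 0)
        h = np.maximum(g, 0.0)
    inputs.append(h)  # input to the last (linear) layer

    # Backward pass: delta is d(output)/d(pre-activation) of the current layer.
    grads = [None] * len(weights)
    delta = np.ones(1)
    for l in range(len(weights) - 1, -1, -1):
        grads[l] = np.outer(delta, inputs[l]).ravel()
        if l > 0:
            delta = (weights[l].T @ delta) * masks[l - 1]
    return np.concatenate(grads)

def ntk_gram(weights, X):
    """Empirical NTK Gram matrix K = J J^T, with one Jacobian row per sample."""
    J = np.stack([param_gradient(weights, x) for x in X])
    return J @ J.T

rng = np.random.default_rng(0)
N, d = 8, 5                                     # assumed toy sizes
X = rng.normal(size=(N, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # assumed unit-norm inputs
weights = init_weights([d, 64, 64, 1], rng)     # hidden widths exceed N

K = ntk_gram(weights, X)
lam_min = np.linalg.eigvalsh(K)[0]              # eigvalsh sorts ascending
print("smallest NTK eigenvalue:", lam_min)
```

Because $K$ is a Gram matrix it is positive semi-definite by construction; the bounds in the paper concern how far its smallest eigenvalue stays from zero as $N$ and the widths vary, which this toy script can only probe empirically.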
Related papers
- Wide Neural Networks as Gaussian Processes: Lessons from Deep
Equilibrium Models [16.07760622196666]
We study the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers.
Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process.
Remarkably, this convergence holds even when the limits of depth and width are interchanged.
arXiv Detail & Related papers (2023-10-16T19:00:43Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Memorization and Optimization in Deep Neural Networks with Minimum
Over-parameterization [14.186776881154127]
The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks.
We show that the NTK is well conditioned in a challenging sub-linear setup.
Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks.
arXiv Detail & Related papers (2022-05-20T14:50:24Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- Towards an Understanding of Residual Networks Using Neural Tangent
Hierarchy (NTH) [2.50686294157537]
Gradient descent yields zero training loss in polynomial time for deep neural networks despite the non-convex nature of the objective function.
In this paper, we study the dynamics of the NTK for finite-width Deep Residual Networks (ResNet) using the neural tangent hierarchy (NTH).
Our analysis strongly suggests that the particular skip-connection structure of ResNet is the main reason for its triumph.
arXiv Detail & Related papers (2020-07-07T18:08:16Z)
- On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)