Related papers: Spectral Analysis of the Neural Tangent Kernel for Deep Residual Networks

URL: http://arxiv.org/abs/2104.03093v1
Date: Wed, 7 Apr 2021 12:35:19 GMT
Title: Spectral Analysis of the Neural Tangent Kernel for Deep Residual Networks
Authors: Yuval Belfer, Amnon Geifman, Meirav Galun, Ronen Basri
Abstract summary: We show that the eigenfunctions of ResNTK are the spherical harmonics and the eigenvalues decay polynomially with frequency $k$ as $k^{-d}$. We show, by drawing on the analogy to the Laplace kernel, that depending on the choice of a hyper-parameter that balances between the skip and residual connections, ResNTK can either become spiky with depth, as with FC-NTK, or maintain a stable shape.
Score: 29.67334658659187
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep residual network architectures have been shown to achieve superior accuracy over classical feed-forward networks, yet their success is still not fully understood. Focusing on massively over-parameterized, fully connected residual networks with ReLU activation through their respective neural tangent kernels (ResNTK), we provide here a spectral analysis of these kernels. Specifically, we show that, much like NTK for fully connected networks (FC-NTK), for input distributed uniformly on the hypersphere $\mathbb{S}^{d-1}$, the eigenfunctions of ResNTK are the spherical harmonics and the eigenvalues decay polynomially with frequency $k$ as $k^{-d}$. These in turn imply that the set of functions in their Reproducing Kernel Hilbert Space are identical to those of FC-NTK, and consequently also to those of the Laplace kernel. We further show, by drawing on the analogy to the Laplace kernel, that depending on the choice of a hyper-parameter that balances between the skip and residual connections ResNTK can either become spiky with depth, as with FC-NTK, or maintain a stable shape.
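The $k^{-d}$ decay stated in the abstract can be checked numerically in the simplest case. The sketch below is an illustration only and does not compute ResNTK itself: it uses the Laplace kernel, which the abstract says shares its Reproducing Kernel Hilbert Space with ResNTK, on the unit circle ($d = 2$), where the spherical harmonics reduce to Fourier modes. The function names, the bandwidth `c`, the grid size `n`, and the fitting range are assumptions made for this example. Under these assumptions the fitted exponent should come out close to 2.

```python
import numpy as np


def laplace_kernel_circle_spectrum(c=1.0, n=4096):
    """Approximate Fourier coefficients of exp(-c * ||x - y||) on the unit circle.

    The kernel depends only on the angle theta between x and y, so its
    eigenfunctions on S^1 are the Fourier modes and its eigenvalues are
    (up to normalization) its Fourier coefficients.
    """
    theta = 2.0 * np.pi * np.arange(n) / n
    # Chordal distance between two unit vectors separated by angle theta.
    dist = 2.0 * np.abs(np.sin(theta / 2.0))
    kernel = np.exp(-c * dist)
    # The sampled kernel is even and periodic, so its DFT is real.
    return np.fft.rfft(kernel).real / n


def fit_decay_exponent(coeffs, k_min=8, k_max=128):
    """Least-squares slope of log(coefficient) against log(frequency)."""
    k = np.arange(k_min, k_max + 1)
    slope, _ = np.polyfit(np.log(k), np.log(coeffs[k]), 1)
    return -slope


coeffs = laplace_kernel_circle_spectrum()
exponent = fit_decay_exponent(coeffs)
print(f"fitted decay exponent for d = 2: {exponent:.3f}")
```

The fit starts at `k_min = 8` to skip the low frequencies, where lower-order terms still bend the log-log curve, and stops well below `n / 2` to keep aliasing from the discrete transform negligible.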

Related papers

On the Neural Tangent Kernel of Equilibrium Models [72.29727250679477]
This work studies the neural tangent kernel (NTK) of the deep equilibrium (DEQ) model. We show that a DEQ model still enjoys a deterministic NTK under mild conditions, even though its width and depth go to infinity at the same time.
arXiv Detail & Related papers (2023-10-21T16:47:18Z)
On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains [10.360517127652185]
We provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain. This class of kernel functions includes, but is not limited to, the neural tangent kernel associated with neural networks with different depths and various activation functions.
arXiv Detail & Related papers (2023-05-04T08:54:40Z)
Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights. We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime. We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK. We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
A Kernel Perspective of Skip Connections in Convolutional Networks [21.458906138864176]
We study the properties of ResNets through their Gaussian Process and Neural Tangent kernels. Our results indicate that with ReLU activation, eigenvalues of these residual kernels decay at a similar rate compared to the same kernels when skip connections are not used. Our analysis further shows that the matrices obtained by these residual kernels yield favorable condition numbers at finite depths.
arXiv Detail & Related papers (2022-11-27T12:25:54Z)
On the Similarity between the Laplace and Neural Tangent Kernels [26.371904197642145]
We show that NTK for fully connected networks is closely related to the standard Laplace kernel. Our results suggest that much insight about neural networks can be obtained from analysis of the well-known Laplace kernel.
arXiv Detail & Related papers (2020-07-03T09:48:23Z)
Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
We show that the averaged gradient descent can achieve the minimax optimal convergence rate. We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
Frequency Bias in Neural Networks for Input of Non-Uniform Density [27.75835200173761]
We use the Neural Tangent Kernel (NTK) model to explore the effect of variable density on training dynamics. Our results show convergence at a point $x \in \mathbb{S}^{d-1}$ occurs in time $O(\kappa^d/p(x))$ where $p(x)$ denotes the local density at $x$.
arXiv Detail & Related papers (2020-03-10T07:20:14Z)
Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks [12.692279981822011]
We derive the covariance functions of multi-layer perceptrons with exponential linear units (ELU) and Gaussian error linear units (GELU). We analyse the fixed-point dynamics of iterated kernels corresponding to a broad range of activation functions. We find that unlike some previously studied neural network kernels, these new kernels exhibit non-trivial fixed-point dynamics.
arXiv Detail & Related papers (2020-02-20T01:25:39Z)
A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior. This implies that the training loss converges linearly up to a certain accuracy. We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets. Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity. In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.