Neural Tangent Kernel Analysis of Deep Narrow Neural Networks
- URL: http://arxiv.org/abs/2202.02981v1
- Date: Mon, 7 Feb 2022 07:27:02 GMT
- Title: Neural Tangent Kernel Analysis of Deep Narrow Neural Networks
- Authors: Jongmin Lee, Joo Young Choi, Ernest K. Ryu, Albert No
- Abstract summary: We present the first trainability guarantee of infinitely deep but narrow neural networks.
We then extend the analysis to an infinitely deep convolutional neural network (CNN) and perform brief experiments.
- Score: 11.623483126242478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The tremendous recent progress in analyzing the training dynamics of
overparameterized neural networks has primarily focused on wide networks and
therefore does not sufficiently address the role of depth in deep learning. In
this work, we present the first trainability guarantee of infinitely deep but
narrow neural networks. We study the infinite-depth limit of a multilayer
perceptron (MLP) with a specific initialization and establish a trainability
guarantee using the NTK theory. We then extend the analysis to an infinitely
deep convolutional neural network (CNN) and perform brief experiments
Related papers
- Peer-to-Peer Learning Dynamics of Wide Neural Networks [10.179711440042123]
We provide an explicit, non-asymptotic characterization of the learning dynamics of wide neural networks trained using popularDGD algorithms.
We validate our analytical results by accurately predicting error and error and for classification tasks.
arXiv Detail & Related papers (2024-09-23T17:57:58Z) - Speed Limits for Deep Learning [67.69149326107103]
Recent advancement in thermodynamics allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and spectral decomposition of the labels -- learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Sparsity-depth Tradeoff in Infinitely Wide Deep Neural Networks [22.083873334272027]
We observe that sparser networks outperform the non-sparse networks at shallow depths on a variety of datasets.
We extend the existing theory on the generalization error of kernel-ridge regression.
arXiv Detail & Related papers (2023-05-17T20:09:35Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel which we call textitbias-generalized NTK
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study on NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp)
In this work, we derive the finite-width-K formulation for a special class of NNs-Hp, i.e., neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Limitations of the NTK for Understanding Generalization in Deep Learning [13.44676002603497]
We study NTKs through the lens of scaling laws, and demonstrate that they fall short of explaining important aspects of neural network generalization.
We show that even if the empirical NTK is allowed to be pre-trained on a constant number of samples, the kernel scaling does not catch up to the neural network scaling.
arXiv Detail & Related papers (2022-06-20T21:23:28Z) - The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical.
Previous studies have revealed that neural networks demonstrate a $textitspectral bias$ towards low-frequency functions, which yields faster learning of low-frequency components during training.
Inspired by such studies, we conduct a spectral analysis of the Tangent Kernel (NTK) of PNNs.
We find that the $Pi$-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the
arXiv Detail & Related papers (2022-02-27T23:12:43Z) - Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel
Theory? [2.0711789781518752]
Neural Kernel (NTK) theory is widely used to study the dynamics of infinitely-wide deep neural networks (DNNs) under gradient descent.
We study empirically when NTK theory is valid in practice for fully-connected ReLU and sigmoid DNNs.
In particular, NTK theory does not explain the behavior of sufficiently deep networks so that their gradients explode as they propagate through the network's layers.
arXiv Detail & Related papers (2020-12-08T15:19:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.