Analyzing the Neural Tangent Kernel of Periodically Activated Coordinate
Networks
- URL: http://arxiv.org/abs/2402.04783v1
- Date: Wed, 7 Feb 2024 12:06:52 GMT
- Title: Analyzing the Neural Tangent Kernel of Periodically Activated Coordinate
Networks
- Authors: Hemanth Saratchandran, Shin-Fang Chng, Simon Lucey
- Abstract summary: We provide a theoretical understanding of periodically activated networks through an analysis of their Neural Tangent Kernel (NTK).
Our findings indicate that periodically activated networks are notably more well-behaved, from the NTK perspective, than ReLU-activated networks.
- Score: 30.92757082348805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, neural networks utilizing periodic activation functions have been
proven to demonstrate superior performance in vision tasks compared to
traditional ReLU-activated networks. However, there is still a limited
understanding of the underlying reasons for this improved performance. In this
paper, we aim to address this gap by providing a theoretical understanding of
periodically activated networks through an analysis of their Neural Tangent
Kernel (NTK). We derive bounds on the minimum eigenvalue of their NTK in the
finite width setting, using a fairly general network architecture which
requires only one wide layer that grows at least linearly with the number of
data samples. Our findings indicate that periodically activated networks are
\textit{notably more well-behaved}, from the NTK perspective, than ReLU
activated networks. Additionally, we give an application to the memorization
capacity of such networks and verify our theoretical predictions empirically.
Our study offers a deeper understanding of the properties of periodically
activated neural networks and their potential in the field of deep learning.
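The central quantity in the paper is the minimum eigenvalue of the finite-width empirical NTK, K = J J^T, where J stacks the parameter gradients of the network output at each training input. The sketch below (not the authors' code) computes this eigenvalue for a one-hidden-layer coordinate network with a sine activation and contrasts it with ReLU; the width m, the SIREN-style frequency scale omega_0 = 30, and the uniform coordinate inputs are illustrative assumptions rather than the paper's exact setup.

```python
# Minimal sketch: minimum eigenvalue of the empirical NTK for a one-hidden-layer
# network f(x) = (1/sqrt(m)) * sum_k v_k * act(w_k . x + b_k), sine vs. ReLU.
# Architecture, width, and omega_0 are illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 64, 2, 512                       # samples, input dim, hidden width
X = rng.uniform(-1.0, 1.0, size=(n, d))    # coordinate inputs in [-1, 1]^d

def ntk_min_eig(act, dact, omega0=1.0):
    """Smallest eigenvalue of the empirical NTK K = J J^T at random initialization."""
    W = rng.normal(size=(m, d)) * omega0   # frequency-scaled first-layer weights
    b = rng.normal(size=m)
    v = rng.normal(size=m)
    pre = X @ W.T + b                      # pre-activations, shape (n, m)
    a, da = act(pre), dact(pre)
    # Per-sample gradients of the scalar output w.r.t. each parameter group.
    J_v = a / np.sqrt(m)                                     # d f / d v_k
    J_b = da * v / np.sqrt(m)                                # d f / d b_k
    J_W = (J_b[:, :, None] * X[:, None, :]).reshape(n, -1)   # d f / d W_{k,j}
    J = np.concatenate([J_v, J_b, J_W], axis=1)
    K = J @ J.T                            # empirical NTK, shape (n, n)
    return float(np.linalg.eigvalsh(K)[0]) # eigvalsh returns ascending eigenvalues

sine_min = ntk_min_eig(np.sin, np.cos, omega0=30.0)  # SIREN-style frequency scale
relu_min = ntk_min_eig(lambda z: np.maximum(z, 0.0),
                       lambda z: (z > 0).astype(float))
print(f"min NTK eigenvalue  sine: {sine_min:.4e}   ReLU: {relu_min:.4e}")
```

A strictly positive minimum eigenvalue of K is also the standard route to the memorization-capacity application: when K is positive definite, the associated kernel regressor (and, near initialization, the network trained by gradient descent) can interpolate all n training targets.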
Related papers
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
Studies of the NTK have been devoted to typical neural network architectures but remain incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Limitations of the NTK for Understanding Generalization in Deep Learning [13.44676002603497]
We study NTKs through the lens of scaling laws, and demonstrate that they fall short of explaining important aspects of neural network generalization.
We show that even if the empirical NTK is allowed to be pre-trained on a constant number of samples, the kernel scaling does not catch up to the neural network scaling.
arXiv Detail & Related papers (2022-06-20T21:23:28Z) - The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical.
Previous studies have revealed that neural networks demonstrate a spectral bias towards low-frequency functions, which yields faster learning of low-frequency components during training.
Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs.
We find that the $\Pi$-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the learning of higher-frequency components.
arXiv Detail & Related papers (2022-02-27T23:12:43Z) - Neural Tangent Kernel Analysis of Deep Narrow Neural Networks [11.623483126242478]
We present the first trainability guarantee of infinitely deep but narrow neural networks.
We then extend the analysis to an infinitely deep convolutional neural network (CNN) and perform brief experiments.
arXiv Detail & Related papers (2022-02-07T07:27:02Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - The Surprising Simplicity of the Early-Time Learning Dynamics of Neural
Networks [43.860358308049044]
In this work, we show that these common perceptions can be completely false in the early phase of learning.
We argue that this surprising simplicity can persist in networks with more layers and with convolutional architectures.
arXiv Detail & Related papers (2020-06-25T17:42:49Z) - Depth Enables Long-Term Memory for Recurrent Neural Networks [0.0]
We introduce a measure of the network's ability to support information flow across time, referred to as the Start-End separation rank.
We prove that deep recurrent networks support Start-End separation ranks which are higher than those supported by their shallow counterparts.
arXiv Detail & Related papers (2020-03-23T10:29:14Z)