Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study
- URL: http://arxiv.org/abs/2209.07736v1
- Date: Fri, 16 Sep 2022 06:36:06 GMT
- Title: Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study
- Authors: Yongtao Wu, Zhenyu Zhu, Fanghui Liu, Grigorios G Chrysos, Volkan
Cevher
- Abstract summary: The study on NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
- Score: 55.12108376616355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural tangent kernel (NTK) is a powerful tool to analyze training dynamics
of neural networks and their generalization bounds. The study on NTK has been
devoted to typical neural network architectures, but is incomplete for neural
networks with Hadamard products (NNs-Hp), e.g., StyleGAN and polynomial neural
networks. In this work, we derive the finite-width NTK formulation for a
special class of NNs-Hp, i.e., polynomial neural networks. We prove their
equivalence to the kernel regression predictor with the associated NTK, which
expands the application scope of NTK. Based on our results, we elucidate the
separation of PNNs over standard neural networks with respect to extrapolation
and spectral bias. Our two key insights are that when compared to standard
neural networks, PNNs are able to fit more complicated functions in the
extrapolation regime and admit a slower eigenvalue decay of the respective NTK.
Besides, our theoretical results can be extended to other types of NNs-Hp,
which expand the scope of our work. Our empirical results validate the
separations in broader classes of NNs-Hp, which provide a good justification
for a deeper understanding of neural architectures.
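As a concrete companion to the abstract, the sketch below computes the empirical finite-width NTK, $K(x, x') = \langle \nabla_\theta f(x), \nabla_\theta f(x') \rangle$, for a toy degree-2 polynomial net built around a single Hadamard product and for a standard two-layer MLP, then compares the eigenvalue decay of the two kernels. The architectures, widths, initialization, and data are illustrative assumptions written in JAX, not the parametrization analyzed in the paper.

```python
# Minimal sketch (illustrative, not the paper's construction): empirical
# finite-width NTK of a toy degree-2 polynomial net with a Hadamard product
# versus a standard two-layer ReLU MLP, with a look at eigenvalue decay.
import jax
import jax.numpy as jnp

d, width, n = 4, 64, 32  # input dim, hidden width, number of samples (assumed)
key_mlp, key_pnn, key_x = jax.random.split(jax.random.PRNGKey(0), 3)

def init(key, shapes):
    keys = jax.random.split(key, len(shapes))
    return [jax.random.normal(k, s) / jnp.sqrt(s[0]) for k, s in zip(keys, shapes)]

# Standard two-layer MLP: f(x) = v^T relu(W x)
mlp_params = init(key_mlp, [(d, width), (width, 1)])
def mlp(params, x):
    W, v = params
    return jnp.squeeze(jax.nn.relu(x @ W) @ v)

# Degree-2 polynomial net with a Hadamard product: f(x) = v^T ((U x) * (W x))
pnn_params = init(key_pnn, [(d, width), (d, width), (width, 1)])
def pnn(params, x):
    U, W, v = params
    return jnp.squeeze(((x @ U) * (x @ W)) @ v)  # '*' is the Hadamard product

def empirical_ntk(f, params, X):
    # K_ij = <grad_theta f(x_i), grad_theta f(x_j)>, with gradients taken at
    # the current (finite-width) parameters.
    grads = jax.vmap(lambda x: jax.grad(f)(params, x))(X)
    flat = jnp.concatenate(
        [g.reshape(X.shape[0], -1) for g in jax.tree_util.tree_leaves(grads)], axis=1)
    return flat @ flat.T

X = jax.random.normal(key_x, (n, d))
for name, f, params in [("MLP", mlp, mlp_params), ("PNN", pnn, pnn_params)]:
    K = empirical_ntk(f, params, X)
    eigs = jnp.sort(jnp.linalg.eigvalsh(K))[::-1]
    print(name, "top-5 NTK eigenvalues:", eigs[:5])
```

The empirical kernel here is only meant to make the finite-width NTK tangible; the paper's equivalence result concerns the kernel regression predictor built from the associated NTK in the appropriate regime.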
Related papers
- Equivariant Neural Tangent Kernels [2.373992571236766]
We give explicit expressions for neural tangent kernels (NTKs) of group convolutional neural networks.
In numerical experiments, we demonstrate superior performance for equivariant NTKs over non-equivariant NTKs on a classification task for medical images.
arXiv Detail & Related papers (2024-06-10T17:43:13Z)
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to neural networks of arbitrary width, depth, and topology.
We also present a novel exact representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z)
- Deep Neural Networks via Complex Network Theory: a Perspective [3.1023851130450684]
Deep Neural Networks (DNNs) can be represented as graphs whose links and vertices iteratively process data and solve tasks sub-optimally. Complex Network Theory (CNT), merging statistical physics with graph theory, provides a method for interpreting neural networks by analysing their weights and neuron structures.
In this work, we extend the existing CNT metrics with measures that sample from the DNNs' training distribution, shifting from a purely topological analysis to one that connects with the interpretability of deep learning.
arXiv Detail & Related papers (2024-04-17T08:42:42Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical.
Previous studies have revealed that neural networks demonstrate a spectral bias towards low-frequency functions, which yields faster learning of low-frequency components during training (a toy illustration appears in the sketch after this list).
Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs.
We find that the $\Pi$-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the learning of higher frequencies.
arXiv Detail & Related papers (2022-02-27T23:12:43Z)
- Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel Theory? [2.0711789781518752]
Neural Tangent Kernel (NTK) theory is widely used to study the dynamics of infinitely-wide deep neural networks (DNNs) under gradient descent.
We study empirically when NTK theory is valid in practice for fully-connected ReLU and sigmoid DNNs.
In particular, NTK theory does not explain the behavior of sufficiently deep networks whose gradients explode as they propagate through the network's layers.
arXiv Detail & Related papers (2020-12-08T15:19:45Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- The Recurrent Neural Tangent Kernel [11.591070761599328]
We introduce and study the Recurrent Neural Tangent Kernel (RNTK), which provides new insights into the behavior of overparametrized RNNs.
Experiments on a synthetic dataset and 56 real-world datasets demonstrate that the RNTK offers significant performance gains over other kernels.
arXiv Detail & Related papers (2020-06-18T02:59:21Z)
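The entry on "The Spectral Bias of Polynomial Neural Networks" above refers to the following sketch. It illustrates spectral bias on a toy 1-D regression task: a small tanh MLP trained by full-batch gradient descent drives down the low-frequency component of the residual well before the high-frequency one. The target function, network sizes, learning rate, and the Fourier-projection diagnostic are all illustrative assumptions, not taken from any of the papers listed here.

```python
# Toy spectral-bias demo (all choices below are illustrative assumptions):
# fit y(x) = sin(2*pi*x) + sin(2*pi*8*x) with a small tanh MLP and track how
# fast the low- and high-frequency parts of the residual shrink.
import jax
import jax.numpy as jnp

x = jnp.linspace(0.0, 1.0, 256)[:, None]
y = jnp.sin(2 * jnp.pi * x[:, 0]) + jnp.sin(2 * jnp.pi * 8 * x[:, 0])

def init(key, sizes):
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, (m, n) in zip(keys, zip(sizes[:-1], sizes[1:]))]

def mlp(params, inputs):
    h = inputs
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return (h @ W + b)[:, 0]

def loss(params):
    return jnp.mean((mlp(params, x) - y) ** 2)

@jax.jit
def step(params, lr=1e-2):
    grads = jax.grad(loss)(params)
    return [(W - lr * gW, b - lr * gb) for (W, b), (gW, gb) in zip(params, grads)]

def freq_residual(params, k):
    # Magnitude of the residual's projection onto sin(2*pi*k*x): a crude
    # per-frequency error measure.
    r = mlp(params, x) - y
    return float(jnp.abs(2 * jnp.mean(r * jnp.sin(2 * jnp.pi * k * x[:, 0]))))

params = init(jax.random.PRNGKey(0), [1, 128, 128, 1])
for t in range(20001):
    params = step(params)
    if t % 5000 == 0:
        print(f"step {t:5d}  low-freq residual {freq_residual(params, 1):.3f}"
              f"  high-freq residual {freq_residual(params, 8):.3f}")
```

Per the spectral-bias literature summarized above, the printed low-frequency residual is expected to shrink much earlier in training than the high-frequency one; the polynomial-net results cited here report that $\Pi$-Net-style parametrizations narrow that gap.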