Neural tangent kernel analysis of shallow $\alpha$-Stable ReLU neural
networks
- URL: http://arxiv.org/abs/2206.08065v1
- Date: Thu, 16 Jun 2022 10:28:03 GMT
- Title: Neural tangent kernel analysis of shallow $\alpha$-Stable ReLU neural
networks
- Authors: Stefano Favaro, Sandra Fortini, Stefano Peluchetti
- Abstract summary: We consider large-width problems for $\alpha$-Stable NNs, which generalize Gaussian NNs.
For shallow $\alpha$-Stable NNs with a ReLU activation function, we show that if the NN's width goes to infinity then a rescaled NN converges weakly to an $\alpha$-Stable process.
Our main contribution is the NTK analysis of shallow $\alpha$-Stable ReLU-NNs, which leads to an equivalence between training a rescaled NN and performing a kernel regression with an $(\alpha/2)$-Stable random kernel.
- Score: 8.000374471991247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a recent literature on large-width properties of Gaussian neural
networks (NNs), i.e. NNs whose weights are distributed according to Gaussian
distributions. Two popular problems are: i) the study of the large-width
behaviour of NNs, which provided a characterization of the infinitely wide
limit of a rescaled NN in terms of a Gaussian process; ii) the study of the
large-width training dynamics of NNs, which set forth an equivalence between
training the rescaled NN and performing a kernel regression with a
deterministic kernel referred to as the neural tangent kernel (NTK). In this
paper, we consider these problems for $\alpha$-Stable NNs, which generalize
Gaussian NNs by assuming that the NN's weights are distributed as
$\alpha$-Stable distributions with $\alpha\in(0,2]$, i.e. distributions with
heavy tails. For shallow $\alpha$-Stable NNs with a ReLU activation function,
we show that if the NN's width goes to infinity then a rescaled NN converges
weakly to an $\alpha$-Stable process, i.e. a stochastic process with
$\alpha$-Stable finite-dimensional distributions. As a novelty with respect to
the Gaussian setting, in the $\alpha$-Stable setting the choice of the
activation function affects the scaling of the NN, that is: to achieve the
infinitely wide $\alpha$-Stable process, the ReLU function requires an
additional logarithmic scaling with respect to sub-linear functions. Then, our
main contribution is the NTK analysis of shallow $\alpha$-Stable ReLU-NNs,
which leads to an equivalence between training a rescaled NN and performing a
kernel regression with an $(\alpha/2)$-Stable random kernel. The randomness of
such a kernel is a further novelty with respect to the Gaussian setting, that
is: in the $\alpha$-Stable setting the randomness of the NN at initialization
does not vanish in the NTK analysis, thus inducing a distribution for the
kernel of the underlying kernel regression.
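As a hands-on companion to the abstract, the sketch below (not code from the paper) draws a shallow ReLU network at initialization with i.i.d. symmetric $\alpha$-Stable weights via scipy.stats.levy_stable and applies a baseline $n^{-1/\alpha}$ rescaling; the additional logarithmic correction that the paper shows is needed for ReLU is only noted in a comment, since its exact form is given in the paper. The function name, the width, and the choice $\alpha=1.5$ are illustrative assumptions.

```python
# Minimal simulation sketch (illustrative, not the paper's code).
import numpy as np
from scipy.stats import levy_stable

def shallow_stable_relu(x, n, alpha, rng):
    """One draw at initialization of a width-n shallow ReLU NN with
    i.i.d. symmetric alpha-Stable weights and biases."""
    d = x.shape[0]
    w1 = levy_stable.rvs(alpha, 0.0, size=(n, d), random_state=rng)  # inner weights
    b1 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)       # inner biases
    w2 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)       # outer weights
    hidden = np.maximum(w1 @ x + b1, 0.0)        # ReLU activation
    # Baseline rescaling; per the abstract, ReLU needs an additional
    # logarithmic factor (exact form in the paper) to reach the limit.
    return n ** (-1.0 / alpha) * (w2 @ hidden)

rng = np.random.default_rng(0)
x = np.ones(3)
outputs = np.array([shallow_stable_relu(x, n=1000, alpha=1.5, rng=rng)
                    for _ in range(200)])
# Heavy tails of the limiting alpha-Stable law show up in the upper quantiles.
print(np.quantile(np.abs(outputs), [0.5, 0.9, 0.99]))
```

Setting alpha=2.0 recovers Gaussian weights, which gives a quick sanity check against the familiar Gaussian-process large-width limit.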
Related papers
- Kernel vs. Kernel: Exploring How the Data Structure Affects Neural Collapse [9.975341265604577]
"Neural Collapse" is the decrease in the within class variability of the network's deepest features, dubbed as NC1.
We provide a kernel-based analysis that does not suffer from this limitation.
We show that the NTK does not represent more collapsed features than the NNGP for prototypical data models.
arXiv Detail & Related papers (2024-06-04T08:33:56Z)
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to neural networks of arbitrary width, depth and topology.
We also present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z) - Neural Networks for Singular Perturbations [0.0]
We prove expressivity rate bounds for solution sets of a model class of singularly perturbed, elliptic two-point boundary value problems.
We establish expression rate bounds in Sobolev norms in terms of the NN size.
arXiv Detail & Related papers (2024-01-12T16:02:18Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study on NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp)
In this work, we derive the finite-width-K formulation for a special class of NNs-Hp, i.e., neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Deep Stable neural networks: large-width asymptotics and convergence rates [3.0108936184913295]
We show that as the width goes to infinity jointly over the NN's layers, a suitably rescaled deep Stable NN converges weakly to a Stable stochastic process.
Because of the non-triangular NN's structure, this is a non-standard problem, to which we propose a novel and self-contained inductive approach.
arXiv Detail & Related papers (2021-08-02T12:18:00Z)
- Neural Optimization Kernel: Towards Robust Deep Learning [13.147925376013129]
Recent studies show a connection between neural networks (NN) and kernel methods.
This paper proposes a novel kernel family named Neural Optimization Kernel (NOK).
We show that over-parameterized deep NNs (NOK) can increase the expressive power to reduce empirical risk and reduce the generalization bound at the same time.
arXiv Detail & Related papers (2021-06-11T00:34:55Z)
- Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z)
- Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)