Neural tangent kernel analysis of shallow $\alpha$-Stable ReLU neural
networks
- URL: http://arxiv.org/abs/2206.08065v1
- Date: Thu, 16 Jun 2022 10:28:03 GMT
- Title: Neural tangent kernel analysis of shallow $\alpha$-Stable ReLU neural
networks
- Authors: Stefano Favaro, Sandra Fortini, Stefano Peluchetti
- Abstract summary: We consider large-width problems for $\alpha$-Stable NNs, which generalize Gaussian NNs.
For shallow $\alpha$-Stable NNs with a ReLU activation function, we show that if the NN's width goes to infinity then a rescaled NN converges weakly to an $\alpha$-Stable process.
Our main contribution is the NTK analysis of shallow $\alpha$-Stable ReLU-NNs, which leads to an equivalence between training a rescaled NN and performing a kernel regression with an $(\alpha/2)$-Stable random kernel.
- Score: 8.000374471991247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a recent literature on large-width properties of Gaussian neural
networks (NNs), i.e. NNs whose weights are distributed according to Gaussian
distributions. Two popular problems are: i) the study of the large-width
behaviour of NNs, which provided a characterization of the infinitely wide
limit of a rescaled NN in terms of a Gaussian process; ii) the study of the
large-width training dynamics of NNs, which set forth an equivalence between
training the rescaled NN and performing a kernel regression with a
deterministic kernel referred to as the neural tangent kernel (NTK). In this
paper, we consider these problems for $\alpha$-Stable NNs, which generalize
Gaussian NNs by assuming that the NN's weights are distributed as
$\alpha$-Stable distributions with $\alpha\in(0,2]$, i.e. distributions with
heavy tails. For shallow $\alpha$-Stable NNs with a ReLU activation function,
we show that if the NN's width goes to infinity then a rescaled NN converges
weakly to an $\alpha$-Stable process, i.e. a stochastic process with
$\alpha$-Stable finite-dimensional distributions. As a novelty with respect to
the Gaussian setting, in the $\alpha$-Stable setting the choice of the
activation function affects the scaling of the NN, that is: to achieve the
infinitely wide $\alpha$-Stable process, the ReLU function requires an
additional logarithmic scaling with respect to sub-linear functions. Then, our
main contribution is the NTK analysis of shallow $\alpha$-Stable ReLU-NNs,
which leads to an equivalence between training a rescaled NN and performing a
kernel regression with an $(\alpha/2)$-Stable random kernel. The randomness of
such a kernel is a further novelty with respect to the Gaussian setting, that
is: in the $\alpha$-Stable setting the randomness of the NN at initialization
does not vanish in the NTK analysis, thus inducing a distribution for the
kernel of the underlying kernel regression.
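As a hands-on companion to the abstract, the sketch below (not code from the paper) draws a shallow ReLU network at initialization with i.i.d. symmetric $\alpha$-Stable weights via scipy.stats.levy_stable and applies a baseline $n^{-1/\alpha}$ rescaling; the additional logarithmic correction that the paper shows is needed for ReLU is only noted in a comment, since its exact form is given in the paper. The function name, the width, and the choice $\alpha=1.5$ are illustrative assumptions.

```python
# Minimal simulation sketch (illustrative, not the paper's code).
import numpy as np
from scipy.stats import levy_stable

def shallow_stable_relu(x, n, alpha, rng):
    """One draw at initialization of a width-n shallow ReLU NN with
    i.i.d. symmetric alpha-Stable weights and biases."""
    d = x.shape[0]
    w1 = levy_stable.rvs(alpha, 0.0, size=(n, d), random_state=rng)  # inner weights
    b1 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)       # inner biases
    w2 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)       # outer weights
    hidden = np.maximum(w1 @ x + b1, 0.0)        # ReLU activation
    # Baseline rescaling; per the abstract, ReLU needs an additional
    # logarithmic factor (exact form in the paper) to reach the limit.
    return n ** (-1.0 / alpha) * (w2 @ hidden)

rng = np.random.default_rng(0)
x = np.ones(3)
outputs = np.array([shallow_stable_relu(x, n=1000, alpha=1.5, rng=rng)
                    for _ in range(200)])
# Heavy tails of the limiting alpha-Stable law show up in the upper quantiles.
print(np.quantile(np.abs(outputs), [0.5, 0.9, 0.99]))
```

Setting alpha=2.0 recovers Gaussian weights, which gives a quick sanity check against the familiar Gaussian-process large-width limit.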
Related papers
- Kernel vs. Kernel: Exploring How the Data Structure Affects Neural Collapse [9.975341265604577]
"Neural Collapse" is the decrease in the within class variability of the network's deepest features, dubbed as NC1.
We provide a kernel-based analysis that does not suffer from this limitation.
We show that the NTK does not represent more collapsed features than the NNGP for prototypical data models.
arXiv Detail & Related papers (2024-06-04T08:33:56Z)
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to neural networks of arbitrary width, depth and topology.
We also present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z) - Neural Networks for Singular Perturbations [0.0]
We prove expressivity rate bounds for solution sets of a model class of singularly perturbed, elliptic two-point boundary value problems.
We establish expression rate bounds in Sobolev norms in terms of the NN size.
arXiv Detail & Related papers (2024-01-12T16:02:18Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study on NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp)
In this work, we derive the finite-width-K formulation for a special class of NNs-Hp, i.e., neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Deep Stable neural networks: large-width asymptotics and convergence rates [3.0108936184913295]
We show that as the width goes to infinity jointly over the NN's layers, a suitably rescaled deep Stable NN converges weakly to a Stable stochastic process.
Because of the non-triangular NN's structure, this is a non-standard problem, to which we propose a novel and self-contained inductive approach.
arXiv Detail & Related papers (2021-08-02T12:18:00Z)
- Neural Optimization Kernel: Towards Robust Deep Learning [13.147925376013129]
Recent studies show a connection between neural networks (NN) and kernel methods.
This paper proposes a novel kernel family named Neural Optimization Kernel (NOK).
We show that over-parameterized deep NNs (NOK) can increase the expressive power to reduce empirical risk and reduce the generalization bound at the same time.
arXiv Detail & Related papers (2021-06-11T00:34:55Z)
- Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z)
- Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)