Deep Stable neural networks: large-width asymptotics and convergence rates
- URL: http://arxiv.org/abs/2108.02316v1
- Date: Mon, 2 Aug 2021 12:18:00 GMT
- Title: Deep Stable neural networks: large-width asymptotics and convergence rates
- Authors: Stefano Favaro, Sandra Fortini, Stefano Peluchetti
- Abstract summary: We show that as the width goes to infinity jointly over the NN's layers, a suitably rescaled deep Stable NN converges weakly to a Stable SP.
Because of the NN's non-triangular structure, this is a non-standard problem, for which we propose a novel and self-contained inductive approach.
- Score: 3.0108936184913295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In modern deep learning, there is a recent and growing literature on the
interplay between large-width asymptotics for deep Gaussian neural networks
(NNs), i.e. deep NNs with Gaussian-distributed weights, and classes of Gaussian
stochastic processes (SPs). Such an interplay has proved to be critical in
several contexts of practical interest, e.g. Bayesian inference under Gaussian
SP priors, kernel regression for infinitely wide deep NNs trained via gradient
descent, and information propagation within infinitely wide NNs. Motivated by
empirical analysis, showing the potential of replacing Gaussian distributions
with Stable distributions for the NN's weights, in this paper we investigate
large-width asymptotics for (fully connected) feed-forward deep Stable NNs,
i.e. deep NNs with Stable-distributed weights. First, we show that as the width
goes to infinity jointly over the NN's layers, a suitably rescaled deep Stable
NN converges weakly to a Stable SP whose distribution is characterized
recursively through the NN's layers. Because of the NN's non-triangular
structure, this is a non-standard asymptotic problem, for which we propose a
novel and self-contained inductive approach, which may be of independent
interest. Then, we establish sup-norm convergence rates of a deep Stable NN to
a Stable SP, quantifying the critical difference between the settings of
"joint growth" and "sequential growth" of the width over the NN's layers. Our
work extends recent results on infinitely wide limits for deep Gaussian NNs to
the more general deep Stable NNs, providing the first result on convergence
rates for infinitely wide deep NNs.
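As a concrete illustration of the setting, the snippet below is a minimal sketch (not code from the paper) of a finite-width, fully connected deep Stable NN: i.i.d. symmetric alpha-stable weights and the n^(-1/alpha) layer rescaling that replaces the Gaussian 1/sqrt(n) scaling. The function name, the tanh activation, the parameter choices, and the use of scipy's levy_stable sampler are all illustrative assumptions.

```python
# Minimal sketch (assumptions labeled): a finite-width, fully connected deep
# Stable NN with i.i.d. symmetric alpha-stable weights. Not the paper's code.
import numpy as np
from scipy.stats import levy_stable


def deep_stable_nn(x, widths, alpha=1.8, sigma=1.0, seed=0):
    """Forward pass of a toy deep Stable NN on a single input vector x.

    Weights are i.i.d. symmetric alpha-stable (beta=0, scale=sigma); each
    layer's pre-activation is rescaled by n_in**(-1/alpha), the Stable
    analogue of the Gaussian 1/sqrt(n_in) scaling (alpha=2 recovers the
    Gaussian case up to a constant).
    """
    rng = np.random.default_rng(seed)
    h = np.asarray(x, dtype=float)
    for layer, n_out in enumerate(widths):
        n_in = h.shape[0]
        # i.i.d. symmetric alpha-stable weight matrix for this layer
        w = levy_stable.rvs(alpha, 0.0, scale=sigma,
                            size=(n_out, n_in), random_state=rng)
        # activation of the previous layer (identity on the input layer)
        phi = np.tanh(h) if layer > 0 else h
        # Stable scaling: divide by n_in**(1/alpha) instead of sqrt(n_in)
        h = (w @ phi) / n_in ** (1.0 / alpha)
    return h


# Toy usage: x is a single 5-dimensional input; the hidden widths grow
# jointly, informally mimicking the "joint growth" regime studied in the paper.
x = np.linspace(-1.0, 1.0, 5)
for n in (64, 256, 1024):
    print(n, deep_stable_nn(x, widths=(n, n, 1), alpha=1.8, seed=42))
```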
Related papers
- Generalization Guarantees of Gradient Descent for Multi-Layer Neural
Networks [55.86300309474023]
We conduct a comprehensive stability and generalization analysis of gradient descent (GD) for multi-layer NNs.
We derive the excess risk rate of $O(1/\sqrt{n})$ for GD algorithms in both two-layer and three-layer NNs.
arXiv Detail & Related papers (2023-05-26T12:51:38Z) - Infinitely wide limits for deep Stable neural networks: sub-linear,
linear and super-linear activation functions [5.2475574493706025]
We investigate large-width properties of deep Stable NNs with Stable-distributed parameters.
We show that the scaling of Stable NNs and the stability of their infinitely wide limits may depend on the choice of the activation function.
arXiv Detail & Related papers (2023-04-08T13:45:52Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that these neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study of the NTK has been devoted to typical neural network architectures, but it is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Infinite-channel deep stable convolutional neural networks [2.7561479348365734]
In this paper, we consider the problem of removing assumption A1 in the general context of deep feed-forward convolutional NNs.
We show that the infinite-channel limit of a deep feed-forward convolutional NN, under suitable scaling, is a process with stable finite-dimensional distributions.
arXiv Detail & Related papers (2021-02-07T08:12:46Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z) - Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
arXiv Detail & Related papers (2020-07-11T22:10:52Z) - Infinite attention: NNGP and NTK for deep attention networks [38.55012122588628]
We identify an equivalence between wide neural networks (NNs) and Gaussian processes (GPs).
We show that unlike single-head attention, which induces non-Gaussian behaviour, multi-head attention architectures behave as GPs as the number of heads tends to infinity.
We introduce new features to the Neural Tangents library allowing applications of NNGP/NTK models, with and without attention, to variable-length sequences.
arXiv Detail & Related papers (2020-06-18T13:57:01Z) - Stable behaviour of infinitely wide deep neural networks [8.000374471991247]
We consider fully connected feed-forward deep neural networks (NNs) where weights and biases are independent and identically distributed.
We show that the infinitely wide limit of the NN, under suitable scaling of the weights, is a process whose finite-dimensional distributions are stable distributions.
arXiv Detail & Related papers (2020-03-01T04:07:30Z) - On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)