Stable behaviour of infinitely wide deep neural networks
- URL: http://arxiv.org/abs/2003.00394v1
- Date: Sun, 1 Mar 2020 04:07:30 GMT
- Title: Stable behaviour of infinitely wide deep neural networks
- Authors: Stefano Favaro, Sandra Fortini, Stefano Peluchetti
- Abstract summary: We consider fully connected feed-forward deep neural networks (NNs) where weights and biases are independent and identically distributed as symmetric centered stable distributions.
We show that the infinitely wide limit of the NN, under a suitable scaling of the weights, is a stochastic process whose finite-dimensional distributions are multivariate stable distributions.
- Score: 8.000374471991247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider fully connected feed-forward deep neural networks (NNs) where weights and biases are independent and identically distributed as symmetric centered stable distributions. We then show that the infinitely wide limit of the NN, under a suitable scaling of the weights, is a stochastic process whose finite-dimensional distributions are multivariate stable distributions. The limiting process is referred to as the stable process, and it generalizes the class of Gaussian processes recently obtained as infinitely wide limits of NNs (Matthews et al., 2018b). Parameters of the stable process can be computed via an explicit recursion over the layers of the network. Our result contributes to the theory of fully connected feed-forward deep NNs, and it paves the way for extending recent lines of research that rely on Gaussian infinitely wide limits.
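The scaling behind this result can be illustrated numerically. Below is a minimal simulation sketch, not the authors' code: it samples a one-hidden-layer network whose weights and biases are i.i.d. symmetric alpha-stable, rescales the hidden-to-output weights by n^(-1/alpha), and tracks the spread of one output unit as the width n grows. The choices alpha = 1.5, the tanh activation, the input point and the widths are illustrative assumptions.

```python
# Minimal simulation sketch (illustrative assumptions, not the paper's code):
# a one-hidden-layer fully connected NN with i.i.d. symmetric alpha-stable
# weights and biases, with the hidden-to-output weights rescaled by
# n**(-1/alpha) so that the output has a non-degenerate large-width limit.
import numpy as np
from scipy.stats import levy_stable

alpha = 1.5                       # stability index (beta = 0: symmetric)
x = np.array([1.0, -0.5, 2.0])    # a fixed input point, d = 3
rng = np.random.default_rng(0)

def output_samples(n, n_samples=500):
    """Draw n_samples realisations of one output unit for hidden width n."""
    out = np.empty(n_samples)
    for s in range(n_samples):
        # first layer: i.i.d. symmetric alpha-stable weights and biases
        W1 = levy_stable.rvs(alpha, 0.0, size=(n, x.size), random_state=rng)
        b1 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
        h = np.tanh(W1 @ x + b1)
        # second layer: stable weights rescaled by n**(-1/alpha)
        w2 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng) * n ** (-1.0 / alpha)
        b2 = levy_stable.rvs(alpha, 0.0, random_state=rng)
        out[s] = w2 @ h + b2
    return out

for n in (10, 100, 1000):
    samples = output_samples(n)
    q75, q25 = np.percentile(samples, [75, 25])
    # the interquartile range settles as n grows, while the sample standard
    # deviation is dominated by extreme draws (alpha < 2 means infinite variance)
    print(f"width {n:5d}  IQR = {q75 - q25:.3f}  sample std = {samples.std():.3f}")
```

Under the paper's result, the output converges in distribution, as n grows, to a symmetric stable law whose parameters follow from the layer-wise recursion; the interquartile range printed above is just a heavy-tail-robust proxy for that convergence.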
Related papers
- On the Neural Tangent Kernel of Equilibrium Models [72.29727250679477]
This work studies the neural tangent kernel (NTK) of the deep equilibrium (DEQ) model.
We show that, in contrast, a DEQ model still enjoys a deterministic NTK even when its width and depth go to infinity simultaneously, under mild conditions.
arXiv Detail & Related papers (2023-10-21T16:47:18Z) - Quantitative CLTs in Deep Neural Networks [12.845031126178593]
We study the distribution of a fully connected neural network with random Gaussian weights and biases.
We obtain quantitative bounds on normal approximations that are valid at large but finite width $n$ and at any fixed network depth.
Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature.
arXiv Detail & Related papers (2023-07-12T11:35:37Z) - Infinitely wide limits for deep Stable neural networks: sub-linear, linear and super-linear activation functions [5.2475574493706025]
We investigate large-width properties of deep Stable NNs with Stable-distributed parameters.
We show that the scaling of Stable NNs and the stability of their infinitely wide limits may depend on the choice of the activation function.
arXiv Detail & Related papers (2023-04-08T13:45:52Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version (see the numerical sketch after this list).
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Deep Stable neural networks: large-width asymptotics and convergence rates [3.0108936184913295]
We show that, as the width goes to infinity jointly over the NN's layers, a suitably rescaled deep Stable NN converges weakly to a Stable stochastic process.
Because of the NN's non-triangular structure, this is a non-standard problem, for which we propose a novel and self-contained inductive approach.
arXiv Detail & Related papers (2021-08-02T12:18:00Z) - Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z) - Infinite-channel deep stable convolutional neural networks [2.7561479348365734]
In this paper, we consider the problem of removing assumption A1 (Gaussian-distributed parameters) in the general context of deep feed-forward convolutional NNs.
We show that the infinite-channel limit of a deep feed-forward convolutional NN, under suitable scaling, is a stochastic process whose finite-dimensional distributions are stable distributions.
arXiv Detail & Related papers (2021-02-07T08:12:46Z) - Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through the addition of a computationally-tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
arXiv Detail & Related papers (2020-07-11T22:10:52Z) - Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for analysing neural network optimization, such as mean-field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z) - On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)
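As a companion to the randomly pruned NTK entry above, the following is a minimal numerical sketch, not that paper's construction: it compares the empirical NTK of a one-hidden-layer network with that of a copy whose output-layer weights are pruned independently with probability 1 - keep_prob and whose surviving weights are rescaled by 1/sqrt(keep_prob). The architecture, inputs, widths, pruning rate and rescaling are assumptions made for the sketch.

```python
# Minimal sketch (illustrative assumptions): empirical NTK of a one-hidden-layer
# network f(x) = v . tanh(W x + b) / sqrt(n) versus a copy whose output-layer
# weights v are randomly pruned and rescaled by 1/sqrt(keep_prob).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[1.0, -0.5, 2.0],
              [0.3,  1.2, -1.0]])   # two input points, d = 3
keep_prob = 0.5                     # probability that an output weight survives

def empirical_ntk(X, W, b, v, mask=None, keep_prob=1.0):
    """NTK(x_i, x_j) = sum over trainable parameters of grad_i * grad_j for
    f(x) = (v * mask / sqrt(keep_prob)) . tanh(W x + b) / sqrt(n)."""
    n = W.shape[0]
    if mask is None:
        mask = np.ones_like(v)
    v_eff = v * mask / np.sqrt(keep_prob)
    grads = []
    for x in X:
        h = np.tanh(W @ x + b)
        dh = 1.0 - h ** 2
        g_v = mask * h / (np.sqrt(n) * np.sqrt(keep_prob))  # surviving v's only
        g_b = v_eff * dh / np.sqrt(n)
        g_W = (v_eff * dh)[:, None] * x[None, :] / np.sqrt(n)
        grads.append(np.concatenate([g_v, g_b, g_W.ravel()]))
    G = np.stack(grads)
    return G @ G.T          # 2 x 2 kernel matrix on the two inputs

for n in (100, 1000, 10000):
    W = rng.standard_normal((n, X.shape[1]))
    b = rng.standard_normal(n)
    v = rng.standard_normal(n)
    mask = (rng.random(n) < keep_prob).astype(float)
    ntk_full = empirical_ntk(X, W, b, v)
    ntk_pruned = empirical_ntk(X, W, b, v, mask=mask, keep_prob=keep_prob)
    gap = np.abs(ntk_full - ntk_pruned).max()
    print(f"width {n:6d}  max|NTK_full - NTK_pruned| = {gap:.4f}")
```

The printed gap shrinks as the width grows, which is the finite-width picture behind an NTK equivalence between a network and its randomly pruned version.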
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.