Infinite-channel deep stable convolutional neural networks
- URL: http://arxiv.org/abs/2102.03739v1
- Date: Sun, 7 Feb 2021 08:12:46 GMT
- Title: Infinite-channel deep stable convolutional neural networks
- Authors: Daniele Bracale, Stefano Favaro, Sandra Fortini, Stefano Peluchetti
- Abstract summary: In this paper, we consider the problem of removing assumption A1 (finite variance of the parameters) in the general context of deep feed-forward convolutional NNs.
We show that the infinite-channel limit of a deep feed-forward convolutional NN, under suitable scaling, is a stochastic process with multivariate stable finite-dimensional distributions.
- Score: 2.7561479348365734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The interplay between infinite-width neural networks (NNs) and classes of
Gaussian processes (GPs) has been well known since the seminal work of Neal (1996).
While numerous theoretical refinements have been proposed in recent years,
the interplay between NNs and GPs relies on two critical distributional
assumptions on the NN's parameters: A1) finite variance; A2) independent and
identical distribution (iid). In this paper, we consider the problem of
removing A1 in the general context of deep feed-forward convolutional NNs. In
particular, we assume iid parameters distributed according to a stable
distribution, and we show that the infinite-channel limit of a deep feed-forward
convolutional NN, under suitable scaling, is a stochastic process with
multivariate stable finite-dimensional distributions. Such a limiting
distribution is then characterized through an explicit backward recursion for
its parameters over the layers. Our contribution extends results of Favaro et
al. (2020) to convolutional architectures, and it paves the way to expand
exciting recent lines of research that rely on classes of GP limits.
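The limiting behaviour stated in the abstract is easy to probe numerically. The sketch below is a minimal illustration under assumed settings (a single 1-D convolution, Gaussian inputs, SciPy's levy_stable sampler), not the authors' code: iid symmetric alpha-stable filter weights are summed over n channels with the n^(-1/alpha) scaling, and the resulting pre-activation is compared with a symmetric alpha-stable law whose scale is computed from the receptive field.

```python
# Minimal numerical sketch (illustrative only, not the authors' code): one
# 1-D convolution whose n input channels carry iid symmetric alpha-stable
# weights, with the channel sum scaled by n**(-1/alpha) -- the scaling under
# which the infinite-channel limit has stable finite-dimensional laws.
# For symmetric stable weights the scaled sum is alpha-stable with a scale
# set by the alpha-norm of the receptive field averaged over channels, the
# kind of per-layer quantity a backward recursion over layers would track.
import numpy as np
from scipy.stats import levy_stable, ks_2samp

rng = np.random.default_rng(0)
alpha, n, k = 1.7, 1000, 3             # stability index, channels, kernel size
x = rng.standard_normal((n, 32))       # one fixed input: n channels x 32 sites
patch = x[:, 10:10 + k]                # receptive field of one output location

# Monte Carlo over iid symmetric alpha-stable filters
n_mc = 1000
w = levy_stable.rvs(alpha, 0.0, size=(n_mc, n, k), random_state=rng)
s = n ** (-1.0 / alpha) * np.einsum("mck,ck->m", w, patch)

# Scale of the limiting stable law, conditional on the fixed input
sigma = np.mean(np.sum(np.abs(patch) ** alpha, axis=1)) ** (1.0 / alpha)
reference = levy_stable.rvs(alpha, 0.0, scale=sigma, size=n_mc, random_state=rng)

# The two samples should be statistically indistinguishable
print(ks_2samp(s, reference))
```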
Related papers
- Deep Kernel Posterior Learning under Infinite Variance Prior Weights [1.5960546024967326]
We show that a Bayesian deep neural network converges to a process with $\alpha$-stable marginals in each layer that has a conditionally Gaussian representation.
We also provide useful generalizations of the results of Loría & Bhadra (2024) on shallow networks to multi-layer networks.
The computational and statistical benefits over competing approaches stand out in simulations and in demonstrations on benchmark data sets.
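For context, the conditionally Gaussian representation mentioned above is, for symmetric alpha-stable laws, the classical Gaussian scale-mixture identity. The sketch below is a textbook construction under assumed parameter choices (SciPy's S1 parameterization), not the paper's code: a standard symmetric alpha-stable draw is reproduced as sqrt(A) times a Gaussian, with A a positive (alpha/2)-stable mixing variable.

```python
# Gaussian scale-mixture identity behind "conditionally Gaussian"
# representations of symmetric alpha-stable laws (textbook result, e.g.
# Samorodnitsky & Taqqu, Prop. 1.3.1); a sketch, not the paper's code.
import numpy as np
from scipy.stats import levy_stable, ks_2samp

levy_stable.parameterization = "S1"   # classical parameterization assumed below

rng = np.random.default_rng(1)
alpha, n = 1.5, 20000

# Positive (alpha/2)-stable mixing variable A (totally skewed, beta = 1)
a_scale = np.cos(np.pi * alpha / 4.0) ** (2.0 / alpha)
A = levy_stable.rvs(alpha / 2.0, 1.0, scale=a_scale, size=n, random_state=rng)

# sqrt(A) times an independent N(0, 2) variable is standard symmetric
# alpha-stable, i.e. conditionally Gaussian given the mixing variable A.
Z = rng.normal(0.0, np.sqrt(2.0), size=n)
x_mixture = np.sqrt(A) * Z

# Direct sampling for comparison; the two samples should agree in law.
x_direct = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
print(ks_2samp(x_mixture, x_direct))
```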
arXiv Detail & Related papers (2024-10-02T07:13:17Z) - Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
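For orientation, a finite-particle, time-discretized form of MFLD fits in a few lines. The sketch below uses a toy mean-field two-layer network with assumed hyperparameters and is not taken from the cited paper; it only illustrates the distribution-dependent drift and the entropic noise term.

```python
# Finite-particle, Euler-discretized mean-field Langevin dynamics on a toy
# mean-field two-layer network (a sketch under assumed settings, not code
# from the cited paper).  The drift on each particle is the gradient of the
# first variation of the loss functional, so it depends on the empirical
# distribution of all particles through the prediction; the Gaussian noise
# of size sqrt(2 * lam * step) implements the entropic regularization.
import numpy as np

rng = np.random.default_rng(2)
d, n_particles, lam, step, n_steps = 2, 512, 1e-3, 0.05, 2000

# Toy regression data
X = rng.standard_normal((64, d))
y = np.tanh(X @ np.array([1.5, -1.0]))

W = rng.standard_normal((n_particles, d))       # particles = neuron weights

def predict(W, X):
    # mean-field readout: average of tanh features over particles
    return np.tanh(X @ W.T).mean(axis=1)

for _ in range(n_steps):
    resid = predict(W, X) - y                        # depends on all particles
    sech2 = 1.0 - np.tanh(X @ W.T) ** 2              # (n_data, n_particles)
    drift = (resid[:, None] * sech2).T @ X / len(y)  # grad of first variation
    W += -step * drift + np.sqrt(2.0 * lam * step) * rng.standard_normal(W.shape)

print("final mse:", np.mean((predict(W, X) - y) ** 2))
```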
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - Generalization Guarantees of Gradient Descent for Multi-Layer Neural
Networks [55.86300309474023]
We conduct a comprehensive stability and generalization analysis of gradient descent (GD) for multi-layer NNs.
We derive the excess risk rate of $O(1/\sqrt{n})$ for GD algorithms in both two-layer and three-layer NNs.
arXiv Detail & Related papers (2023-05-26T12:51:38Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
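As a purely illustrative companion (the paper itself builds an exact power-series representation and a reproducing kernel Banach space argument), the sketch below compares a tiny network with the first-order term of its expansion in the weights around initialization; all sizes and names are assumed.

```python
# First-order illustration only: compare f(w0 + dw) with the leading term
# f(w0) + J(w0) @ dw of its expansion in the weights for a tiny
# one-hidden-layer network.  The cited paper works with an exact power
# series; this sketch (all sizes assumed) only shows the first-order term.
import numpy as np

rng = np.random.default_rng(3)
d, h = 4, 16
x = rng.standard_normal(d)

W0 = rng.standard_normal((h, d)) / np.sqrt(d)   # initial weights
v0 = rng.standard_normal(h) / np.sqrt(h)

def f(W, v):
    return v @ np.tanh(W @ x)

def jacobian(W, v):
    # gradient of f with respect to (W, v), flattened into one vector
    z = W @ x
    gW = np.outer(v * (1.0 - np.tanh(z) ** 2), x)   # df/dW
    gv = np.tanh(z)                                  # df/dv
    return np.concatenate([gW.ravel(), gv])

J = jacobian(W0, v0)
for radius in (1e-3, 1e-2, 1e-1):
    dtheta = radius * rng.standard_normal(h * d + h)
    dW, dv = dtheta[: h * d].reshape(h, d), dtheta[h * d:]
    exact = f(W0 + dW, v0 + dv)
    linear = f(W0, v0) + J @ dtheta
    print(f"radius={radius:.0e}  |exact - first-order| = {abs(exact - linear):.2e}")
```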
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Interrelation of equivariant Gaussian processes and convolutional neural
networks [77.34726150561087]
Currently there exists a rather promising new trend in machine learning (ML) based on the relationship between neural networks (NNs) and Gaussian processes (GPs).
In this work we establish a relationship between the many-channel limit for CNNs equivariant with respect to the two-dimensional Euclidean group with vector-valued neuron activations and the corresponding independently introduced equivariant Gaussian processes (GPs).
arXiv Detail & Related papers (2022-09-17T17:02:35Z) - Deep Stable neural networks: large-width asymptotics and convergence
rates [3.0108936184913295]
We show that as the width goes to infinity jointly over the NN's layers, a suitably rescaled deep Stable NN converges weakly to a Stable SP.
Because of the NN's non-triangular structure, this is a non-standard problem, for which we propose a novel and self-contained inductive approach.
arXiv Detail & Related papers (2021-08-02T12:18:00Z) - Double-descent curves in neural networks: a new perspective using
Gaussian processes [9.153116600213641]
Double-descent curves in neural networks describe the phenomenon that the generalisation error first descends with increasing parameters, then grows around an interpolation threshold, and descends again in the overparameterised regime.
We use techniques from random matrix theory to characterize the spectral distribution of the empirical feature covariance matrix as a width-dependent perturbation of the spectrum of the neural network Gaussian process kernel.
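The width dependence can already be seen in a toy random-feature model. The sketch below (illustrative assumptions: ReLU random features on Gaussian inputs, not the paper's random-matrix analysis) prints a few eigenvalues of the empirical feature Gram matrix at increasing widths; they settle towards their infinite-width kernel counterparts as the width grows.

```python
# Toy look at how the empirical feature Gram matrix (which shares its
# nonzero spectrum with the empirical feature covariance) moves with width.
# Illustrative only; the cited paper carries this out with random matrix
# theory relative to the NNGP kernel spectrum.
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 30                       # data points, input dimension
X = rng.standard_normal((n, d)) / np.sqrt(d)

def feature_gram_spectrum(width):
    W = rng.standard_normal((d, width))
    Phi = np.maximum(X @ W, 0.0) / np.sqrt(width)   # ReLU random features
    G = Phi @ Phi.T                                  # n x n empirical Gram matrix
    return np.sort(np.linalg.eigvalsh(G))[::-1]

for width in (50, 200, 800, 3200):
    ev = feature_gram_spectrum(width)
    print(f"width={width:5d}  top eigenvalue={ev[0]:.3f}  "
          f"eigenvalue #20={ev[19]:.4f}")
```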
arXiv Detail & Related papers (2021-02-14T20:31:49Z) - Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through the addition of a computationally tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
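The modification described above can be sketched structurally as follows (shapes, data and member count are assumed and the training loop is omitted; this is not the paper's implementation): each ensemble member predicts with the sum of a trainable network and a fixed random network whose parameters are never updated.

```python
# Schematic ensemble member = trainable network + fixed random ("untrainable")
# function, in the spirit of the modification described above.  Shapes and
# member count are assumed; training of the trainable part is omitted.
import numpy as np

rng = np.random.default_rng(5)
d, h = 1, 64

def init_params():
    return {"W": rng.standard_normal((h, d)), "b": rng.standard_normal(h),
            "v": rng.standard_normal(h) / np.sqrt(h)}

def mlp(params, X):
    return np.maximum(X @ params["W"].T + params["b"], 0.0) @ params["v"]

class EnsembleMember:
    def __init__(self):
        self.trainable = init_params()        # would be updated by training
        self.frozen = init_params()           # random, never updated

    def __call__(self, X):
        # prediction = trainable part + fixed random function
        return mlp(self.trainable, X) + mlp(self.frozen, X)

members = [EnsembleMember() for _ in range(10)]
X = np.linspace(-3, 3, 50)[:, None]
preds = np.stack([m(X) for m in members])     # (members, points)
print("mean predictive std across members:", preds.std(axis=0).mean().round(3))
```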
arXiv Detail & Related papers (2020-07-11T22:10:52Z) - Infinite attention: NNGP and NTK for deep attention networks [38.55012122588628]
We identify an equivalence between wide neural networks (NNs) and Gaussian processes (GPs).
We show that unlike single-head attention, which induces non-Gaussian behaviour, multi-head attention architectures behave as GPs as the number of heads tends to infinity.
We introduce new features to the Neural Tangents library allowing applications of NNGP/NTK models, with and without attention, to variable-length sequences.
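At an intuitive level, the many-head Gaussian behaviour is a central-limit effect across heads. The toy sketch below (schematic assumptions; unrelated to the Neural Tangents implementation that ships with the paper) combines independent random single-head outputs through a 1/sqrt(H)-scaled output projection and tracks how the excess kurtosis shrinks towards the Gaussian value of zero as the number of heads grows.

```python
# Toy central-limit illustration of the many-head behaviour described above
# (schematic assumptions; the cited paper derives the exact NNGP/NTK limits).
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(6)
d, seq = 8, 5
X = rng.standard_normal((seq, d))            # one fixed token sequence

def single_head_output(rng):
    # one attention head with random projections, read out at token 0
    Q, K, V = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    scores = (X @ Q) @ (X @ K).T / np.sqrt(d)
    A = np.exp(scores); A /= A.sum(axis=1, keepdims=True)   # softmax
    return (A @ (X @ V))[0, 0]

def multi_head_sample(H, n_mc=2000):
    out = np.empty(n_mc)
    for m in range(n_mc):
        heads = np.array([single_head_output(rng) for _ in range(H)])
        w = rng.standard_normal(H)           # output projection weights
        out[m] = heads @ w / np.sqrt(H)      # 1/sqrt(H) scaling
    return out

for H in (1, 4, 16):
    s = multi_head_sample(H)
    print(f"heads={H:2d}  excess kurtosis={kurtosis(s):+.2f}  (0 for a Gaussian)")
```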
arXiv Detail & Related papers (2020-06-18T13:57:01Z) - Stable behaviour of infinitely wide deep neural networks [8.000374471991247]
We consider fully connected feed-forward deep neural networks (NNs) where weights and biases are independent and identically distributed.
We show that the infinite-width limit of the NN, under suitable scaling on the weights, is a process whose finite-dimensional distributions are stable distributions.
arXiv Detail & Related papers (2020-03-01T04:07:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.