Infinitely wide limits for deep Stable neural networks: sub-linear,
linear and super-linear activation functions
- URL: http://arxiv.org/abs/2304.04008v1
- Date: Sat, 8 Apr 2023 13:45:52 GMT
- Title: Infinitely wide limits for deep Stable neural networks: sub-linear,
linear and super-linear activation functions
- Authors: Alberto Bordino, Stefano Favaro, Sandra Fortini
- Abstract summary: We investigate large-width properties of deep Stable NNs with Stable-distributed parameters.
We show that the scaling of Stable NNs and the stability of their infinitely wide limits may depend on the choice of the activation function.
- Score: 5.2475574493706025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a growing literature on the study of large-width properties of deep
Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed
parameters or weights, and Gaussian stochastic processes. Motivated by some
empirical and theoretical studies showing the potential of replacing Gaussian
distributions with Stable distributions, namely distributions with heavy tails,
in this paper we investigate large-width properties of deep Stable NNs, i.e.
deep NNs with Stable-distributed parameters. For sub-linear activation
functions, a recent work has characterized the infinitely wide limit of a
suitably rescaled deep Stable NN in terms of a Stable stochastic process, both
under the assumption of a "joint growth" and under the assumption of a
"sequential growth" of the width over the NN's layers. Here, assuming a
"sequential growth" of the width, we extend such a characterization to a
general class of activation functions, which includes sub-linear,
asymptotically linear and super-linear functions. As a novelty with respect to
previous works, our results rely on a generalized central limit
theorem for heavy-tailed distributions, which allows for a unified
treatment of infinitely wide limits for deep Stable NNs. Our study shows that
the scaling of Stable NNs and the stability of their infinitely wide limits may
depend on the choice of the activation function, bringing out a critical
difference with respect to the Gaussian setting.
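To make the role of the scaling concrete, here is a minimal simulation sketch, not code from the paper: it samples a one-hidden-layer NN with i.i.d. symmetric alpha-Stable weights, applies the n^{-1/alpha} width rescaling, and compares the tails of the rescaled output under a sub-linear (tanh) and a linear (identity) activation. The stability index, width, input and activations below are illustrative choices only.

```python
# Minimal simulation sketch (illustrative, not the paper's code): a one-hidden-layer
# NN with i.i.d. symmetric alpha-Stable weights and the n^{-1/alpha} width rescaling.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)
alpha = 1.8                       # stability index of the weight distribution (illustrative)
n = 1_000                         # hidden-layer width
x = np.array([1.0, -0.5, 2.0])    # a fixed input

def stable(size):
    """Symmetric alpha-Stable samples (skewness beta = 0)."""
    return levy_stable.rvs(alpha, 0.0, size=size, random_state=rng)

def rescaled_output(activation, n_samples=200):
    """Monte Carlo samples of the rescaled network output at input x."""
    out = np.empty(n_samples)
    for s in range(n_samples):
        W1 = stable((n, x.size))                   # input-to-hidden weights
        w2 = stable(n)                             # hidden-to-output weights
        h = activation(W1 @ x)                     # hidden units
        out[s] = n ** (-1.0 / alpha) * (w2 @ h)    # n^{-1/alpha} rescaling
    return out

sub_linear = rescaled_output(np.tanh)      # bounded, hence sub-linear, activation
linear = rescaled_output(lambda z: z)      # identity (linear) activation

# The rescaled outputs remain heavy-tailed, and their tail behaviour differs across
# activations, consistent with the abstract's point that the scaling and the
# stability of the infinitely wide limit may depend on the activation function.
for name, sample in [("tanh", sub_linear), ("identity", linear)]:
    print(f"{name:8s} |output| 99% quantile: {np.quantile(np.abs(sample), 0.99):.2f}")
```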
Related papers
- Uniform Convergence of Deep Neural Networks with Lipschitz Continuous Activation Functions and Variable Widths [3.0069322256338906]
We consider deep neural networks with a Lipschitz continuous activation function and with weight matrices of variable widths.
In particular, as convolutional neural networks are special deep neural networks with weight matrices of increasing widths, we put forward conditions on the mask sequence.
The Lipschitz continuity assumption on the activation functions allows us to include in our theory most of the activation functions commonly used in applications.
arXiv Detail & Related papers (2023-06-02T17:07:12Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Due to the complex non-linear characteristics of samples, the objective of those activation functions is to project samples from their original feature space to a linearly separable feature space.
This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z)
- Deep Stable neural networks: large-width asymptotics and convergence rates [3.0108936184913295]
We show that, as the width goes to infinity jointly over the NN's layers, a suitably rescaled deep Stable NN converges weakly to a Stable SP.
Because of the NN's non-triangular structure, this is a non-standard problem, for which we propose a novel and self-contained inductive approach.
arXiv Detail & Related papers (2021-08-02T12:18:00Z)
- Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian stochastic processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z)
- Infinite-channel deep stable convolutional neural networks [2.7561479348365734]
In this paper, we consider the problem of removing A1 in the general context of deep feed-forward convolutional NNs.
We show that the infinite-channel limit of a deep feed-forward convolutional NN, under suitable scaling, is a stochastic process with Stable finite-dimensional distributions.
arXiv Detail & Related papers (2021-02-07T08:12:46Z)
- Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
arXiv Detail & Related papers (2020-07-11T22:10:52Z)
- Stable behaviour of infinitely wide deep neural networks [8.000374471991247]
We consider fully connected feed-forward deep neural networks (NNs) where weights and biases are independent and identically distributed.
We show that the infinitely wide limit of the NN, under suitable scaling of the weights, is a stochastic process whose finite-dimensional distributions are Stable distributions (a schematic layer recursion for this scaling is sketched after this list).
arXiv Detail & Related papers (2020-03-01T04:07:30Z)
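As referenced in the "Stable behaviour of infinitely wide deep neural networks" entry above, the schematic layer recursion below illustrates the kind of width rescaling under which a deep NN with i.i.d. alpha-Stable weights and biases admits a Stable infinite-width limit. The notation is illustrative and not taken verbatim from any of the listed papers; for sub-linear activations the n_l^{-1/alpha} factor plays the role that n_l^{-1/2} plays in the Gaussian case, while the main paper above shows that other activation classes may call for a different scaling and stability index.

```latex
% Schematic layer recursion (illustrative notation) for a depth-L NN with
% i.i.d. symmetric alpha-Stable weights and biases and activation phi.
\begin{align*}
  f^{(1)}_i(x) &= b^{(1)}_i + \sum_{j=1}^{d} w^{(1)}_{ij} x_j, \\
  f^{(\ell+1)}_i(x) &= b^{(\ell+1)}_i
    + \frac{1}{n_\ell^{1/\alpha}} \sum_{j=1}^{n_\ell}
      w^{(\ell+1)}_{ij}\, \phi\bigl(f^{(\ell)}_j(x)\bigr),
    \qquad \ell = 1, \dots, L-1, \\
  w^{(\ell)}_{ij} &\overset{\text{iid}}{\sim} \mathrm{St}(\alpha, \sigma_w), \qquad
  b^{(\ell)}_i \overset{\text{iid}}{\sim} \mathrm{St}(\alpha, \sigma_b).
\end{align*}
% Setting alpha = 2 recovers the Gaussian case with its familiar 1/sqrt(n_ell)
% rescaling; the Stable limits above concern alpha in (0, 2).
```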
This list is automatically generated from the titles and abstracts of the papers in this site.