Large-width functional asymptotics for deep Gaussian neural networks
- URL: http://arxiv.org/abs/2102.10307v1
- Date: Sat, 20 Feb 2021 10:14:37 GMT
- Title: Large-width functional asymptotics for deep Gaussian neural networks
- Authors: Daniele Bracale, Stefano Favaro, Sandra Fortini, Stefano Peluchetti
- Abstract summary: We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
- Score: 2.7561479348365734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider fully connected feed-forward deep neural networks
where weights and biases are independent and identically distributed according
to Gaussian distributions. Extending previous results (Matthews et al.,
2018a;b; Yang, 2019) we adopt a function-space perspective, i.e. we look at
neural networks as infinite-dimensional random elements on the input space
$\mathbb{R}^I$. Under suitable assumptions on the activation function we show
that: i) a network defines a continuous Gaussian process on the input space
$\mathbb{R}^I$; ii) a network with re-scaled weights converges weakly to a
continuous Gaussian process in the large-width limit; iii) the limiting
Gaussian process has almost surely locally $\gamma$-H\"older continuous paths,
for $0 < \gamma <1$. Our results contribute to recent theoretical studies on
the interplay between infinitely wide deep neural networks and Gaussian
processes by establishing weak convergence in function-space with respect to a
stronger metric.
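As an informal numerical illustration of the large-width Gaussian behaviour described in the abstract (a sketch, not the paper's construction), the snippet below compares the output covariance of a deep, fully connected network with i.i.d. Gaussian weights re-scaled by $1/\sqrt{\text{width}}$ against the standard NNGP covariance recursion. The depth, width, activation and prior scales are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_w, sigma_b = 1.5, 0.1      # assumed prior scales for weights and biases
phi = np.tanh                    # a smooth, bounded activation
x1 = np.array([0.3, -1.0])       # two inputs in R^I with I = 2 (illustrative)
x2 = np.array([1.2, 0.5])

def network_outputs(width, depth, n_samples):
    """Sample the scalar output of a random network with `depth` hidden layers at x1 and x2."""
    X = np.stack([x1, x2], axis=1)               # columns are the two inputs, shape (I, 2)
    outs = np.empty((n_samples, 2))
    for s in range(n_samples):
        h, fan_in = X, X.shape[0]
        for _ in range(depth):
            W = rng.normal(0.0, sigma_w / np.sqrt(fan_in), size=(width, fan_in))
            b = rng.normal(0.0, sigma_b, size=(width, 1))
            h, fan_in = phi(W @ h + b), width
        w_out = rng.normal(0.0, sigma_w / np.sqrt(fan_in), size=(1, fan_in))
        outs[s] = (w_out @ h + rng.normal(0.0, sigma_b)).ravel()
    return outs

def nngp_covariance(depth, n_mc=200_000):
    """Monte-Carlo version of the standard NNGP covariance recursion for (x1, x2)."""
    K = sigma_b**2 + sigma_w**2 * np.array([[x1 @ x1, x1 @ x2],
                                            [x2 @ x1, x2 @ x2]]) / len(x1)
    for _ in range(depth):
        z = rng.multivariate_normal(np.zeros(2), K, size=n_mc)
        K = sigma_b**2 + sigma_w**2 * (phi(z).T @ phi(z)) / n_mc
    return K

samples = network_outputs(width=512, depth=3, n_samples=1000)
print("empirical output covariance:\n", np.cov(samples, rowvar=False))
print("NNGP covariance recursion:\n", nngp_covariance(depth=3))
```

At widths in the hundreds, the empirical covariance should agree with the recursion up to Monte-Carlo error, consistent with the Gaussian-process limit.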
Related papers
- Wide Deep Neural Networks with Gaussian Weights are Very Close to
Gaussian Processes [1.0878040851638]
We show that the distance between the network output and the corresponding Gaussian approximation scales inversely with the width of the network, exhibiting faster convergence than the naive rate suggested by the central limit theorem.
We also apply our bounds to obtain theoretical approximations for the exact posterior distribution of the network, when the likelihood is a bounded Lipschitz function of the network output evaluated on a (finite) training set.
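A rough way to see this width dependence empirically (a sketch under assumed prior scales, not the paper's bound): sample the scalar output of a one-hidden-layer ReLU network at a fixed input and compute the 1-D Wasserstein distance to its Gaussian limit for several widths.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
x = np.array([1.0, -0.5, 2.0])          # a fixed input, I = 3 (illustrative)
sigma_w = 1.0                           # assumed weight scale; biases omitted for simplicity

# Limiting Gaussian output variance for one ReLU hidden layer:
# q1 = sigma_w^2 * ||x||^2 / I is the pre-activation variance, and
# E[relu(z)^2] = q1 / 2 for centred Gaussian z, so Var = sigma_w^2 * q1 / 2.
q1 = sigma_w**2 * (x @ x) / len(x)
limit_std = np.sqrt(sigma_w**2 * q1 / 2.0)

n_samples = 10_000
for width in [32, 128, 512, 2048]:
    outs = np.empty(n_samples)
    for s in range(n_samples):
        W = rng.normal(0.0, sigma_w / np.sqrt(len(x)), size=(width, len(x)))
        v = rng.normal(0.0, sigma_w / np.sqrt(width), size=width)
        outs[s] = v @ np.maximum(W @ x, 0.0)
    gauss = rng.normal(0.0, limit_std, size=n_samples)
    # The distance shrinks with width until it saturates at the Monte-Carlo
    # resolution of roughly 1/sqrt(n_samples).
    print(width, wasserstein_distance(outs, gauss))
```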
arXiv Detail & Related papers (2023-12-18T22:29:40Z) - Quantitative CLTs in Deep Neural Networks [12.845031126178593]
We study the distribution of a fully connected neural network with random Gaussian weights and biases.
We obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth.
Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature.
arXiv Detail & Related papers (2023-07-12T11:35:37Z) - Wide neural networks: From non-gaussian random fields at initialization
to the NTK geometry of training [0.0]
Recent developments in applications of artificial neural networks with over $n=10^{14}$ parameters make it extremely important to study the large $n$ behaviour of such networks.
Most works studying wide neural networks have focused on the infinite-width $n \to +\infty$ limit of such networks.
In this work we will study their behavior for large, but finite $n$.
arXiv Detail & Related papers (2023-04-06T21:34:13Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
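The general idea of an $\ell_\infty$-norm box over-approximation can be illustrated with plain interval arithmetic on a small feed-forward ReLU network. The sketch below is generic interval bound propagation, not the paper's embedded implicit-network construction, and the weights and input box are illustrative.

```python
import numpy as np

def affine_bounds(W, b, lo, hi):
    """Propagate an elementwise box [lo, hi] through x -> W x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)   # illustrative random weights
W2, b2 = rng.normal(size=(2, 16)), rng.normal(size=2)

x0, eps = np.array([0.1, -0.3, 0.7, 0.0]), 0.05          # nominal input and l_inf radius
lo, hi = x0 - eps, x0 + eps

lo, hi = affine_bounds(W1, b1, lo, hi)
lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)         # ReLU is monotone, so bounds pass through
lo, hi = affine_bounds(W2, b2, lo, hi)
print("output box lower:", lo)
print("output box upper:", hi)
```

Every true output for inputs within the $\ell_\infty$ ball is guaranteed to lie inside the printed box, though the box is generally conservative.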
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
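In the same spirit (an illustration, not the paper's construction or constants): a wide random ReLU layer with Gaussian weights and uniform biases, followed by only a trained linear readout, separates a dataset that is not linearly separable in input space.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_circles(n):
    """Inner circle = class -1, outer circle = class +1 (not linearly separable)."""
    theta = rng.uniform(0, 2 * np.pi, n)
    r = np.where(rng.random(n) < 0.5, 1.0, 2.0)
    y = np.where(r < 1.5, -1.0, 1.0)
    X = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1) + 0.05 * rng.normal(size=(n, 2))
    return X, y

X_train, y_train = two_circles(500)
X_test, y_test = two_circles(500)

width = 2000
W = rng.normal(size=(2, width))                 # random Gaussian first-layer weights (frozen)
b = rng.uniform(-2.0, 2.0, size=width)          # uniformly distributed biases (frozen)

def features(X):
    return np.maximum(X @ W + b, 0.0)           # random ReLU features

# Only the linear readout is fit (ridge-regularised least squares on +/-1 labels).
Phi = features(X_train)
beta = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(width), Phi.T @ y_train)
acc = np.mean(np.sign(features(X_test) @ beta) == y_test)
print("test accuracy:", acc)
```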
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Deep neural network approximation of analytic functions [91.3755431537592]
An entropy bound is given for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - Infinitely Wide Tensor Networks as Gaussian Process [1.7894377200944511]
In this paper, we show the equivalence of infinitely wide tensor networks and Gaussian processes.
We implement the Gaussian process corresponding to the infinite-width limit of tensor networks and plot sample paths of these models.
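A generic sketch of that last step, drawing and plotting Gaussian-process sample paths from a kernel matrix, is given below; the RBF kernel is only a placeholder for the tensor-network kernel derived in the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

def rbf_kernel(xs, lengthscale=0.5, variance=1.0):
    d = xs[:, None] - xs[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

xs = np.linspace(-3.0, 3.0, 400)
K = rbf_kernel(xs) + 1e-8 * np.eye(len(xs))          # jitter for numerical stability
L = np.linalg.cholesky(K)

rng = np.random.default_rng(0)
paths = L @ rng.normal(size=(len(xs), 5))            # five independent sample paths

plt.plot(xs, paths)
plt.xlabel("x"); plt.ylabel("f(x)")
plt.title("GP sample paths (placeholder RBF kernel)")
plt.show()
```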
arXiv Detail & Related papers (2021-01-07T02:29:15Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z) - Stable behaviour of infinitely wide deep neural networks [8.000374471991247]
We consider fully connected feed-forward deep neural networks (NNs) where weights and biases are independent and identically distributed.
We show that the infinite-width limit of the NN, under suitable scaling on the weights, is a process whose finite-dimensional distributions are stable distributions.
arXiv Detail & Related papers (2020-03-01T04:07:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.