Quantitative Gaussian Approximation of Randomly Initialized Deep Neural
Networks
- URL: http://arxiv.org/abs/2203.07379v2
- Date: Fri, 22 Sep 2023 08:28:35 GMT
- Authors: Andrea Basteri, Dario Trevisan
- Abstract summary: We bound the quadratic Wasserstein distance between the network's output distribution and a suitable Gaussian process.
Our explicit inequalities indicate how the hidden and output layer sizes affect the Gaussian behaviour of the network.
- Score: 1.0878040851638
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Given any deep fully connected neural network, initialized with random
Gaussian parameters, we bound from above the quadratic Wasserstein distance
between its output distribution and a suitable Gaussian process. Our explicit
inequalities indicate how the hidden and output layers sizes affect the
Gaussian behaviour of the network and quantitatively recover the distributional
convergence results in the wide limit, i.e., if all the hidden layers sizes
become large.
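The wide-limit behaviour described in the abstract can be illustrated numerically. The following is a minimal Monte Carlo sketch, not the paper's method or its bound: it assumes a one-hidden-layer ReLU network without biases, a fixed two-dimensional input, and weight variances 1/d (input layer) and 1/n (output layer), all of which are choices made here for illustration. The function names `sample_output`, `w2_to_gaussian`, and `estimate_w2` are hypothetical. Under these assumptions the output variance equals 1/2 at every width, so the sorted samples can be compared directly with the quantiles of N(0, 1/2) to estimate the one-dimensional quadratic Wasserstein distance.

```python
import math
import random
from statistics import NormalDist


def sample_output(width, x, rng):
    """One draw of f(x) = n^{-1/2} * sum_j v_j * relu(w_j . x) with fresh Gaussian weights."""
    d = len(x)
    total = 0.0
    for _ in range(width):
        # Hidden pre-activation: each weight coordinate is N(0, 1/d), no bias (assumption).
        z = sum(rng.gauss(0.0, 1.0 / math.sqrt(d)) * xi for xi in x)
        # Output weight v_j ~ N(0, 1); the 1/sqrt(width) factor is applied at the end.
        total += rng.gauss(0.0, 1.0) * max(z, 0.0)
    return total / math.sqrt(width)


def w2_to_gaussian(samples, sigma):
    """Estimate the 1D quadratic Wasserstein distance between samples and N(0, sigma^2)."""
    n = len(samples)
    target = NormalDist(0.0, sigma)
    ordered = sorted(samples)
    # In one dimension the optimal coupling matches sorted samples to target quantiles.
    sq = sum((s - target.inv_cdf((i + 0.5) / n)) ** 2 for i, s in enumerate(ordered))
    return math.sqrt(sq / n)


def estimate_w2(width, n_samples=2000, seed=0):
    """Sample the network output at a fixed input and compare it with its Gaussian limit."""
    rng = random.Random(seed)
    x = (1.0, 1.0)
    # Var(w . x) = |x|^2 / d = 1 and E[relu(z)^2] = 1/2 for z ~ N(0, 1),
    # so the output variance is 1/2 at every width.
    sigma = math.sqrt(0.5)
    samples = [sample_output(width, x, rng) for _ in range(n_samples)]
    return w2_to_gaussian(samples, sigma)


for width in (1, 4, 16, 64):
    print(width, round(estimate_w2(width), 3))
```

The estimate includes sampling noise from the finite number of draws, so it flattens out at large widths rather than tending to zero; increasing `n_samples` lowers that floor at the cost of run time.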
Related papers
- Proportional infinite-width infinite-depth limit for deep linear neural networks [0.16385815610837165]
We study the distributional properties of linear neural networks with random parameters in the context of large networks, where the number of layers diverges in proportion to the number of neurons per layer.
We explore the joint proportional limit in which both depth and width diverge but maintain a constant ratio, yielding a non-Gaussian distribution that retains correlations between outputs.
arXiv Detail & Related papers (2024-11-22T11:25:52Z)
- Wide Deep Neural Networks with Gaussian Weights are Very Close to Gaussian Processes [1.0878040851638]
We show that the distance between the network output and the corresponding Gaussian approximation scales inversely with the width of the network, exhibiting faster convergence than the naive rate suggested by the central limit theorem.
We also apply our bounds to obtain theoretical approximations for the exact posterior distribution of the network, when the likelihood is a bounded Lipschitz function of the network output evaluated on a (finite) training set.
arXiv Detail & Related papers (2023-12-18T22:29:40Z)
- Quantitative CLTs in Deep Neural Networks [12.845031126178593]
We study the distribution of a fully connected neural network with random Gaussian weights and biases.
We obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth.
Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature.
arXiv Detail & Related papers (2023-07-12T11:35:37Z)
- Bayesian inference with finitely wide neural networks [0.4568777157687961]
We propose a non-Gaussian distribution in differential form to model a finite set of outputs from a random neural network.
We derive the non-Gaussian posterior distribution in a Bayesian regression task.
arXiv Detail & Related papers (2023-03-06T03:25:30Z)
- Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Super-resolution GANs of randomly-seeded fields [68.8204255655161]
We propose a novel super-resolution generative adversarial network (GAN) framework to estimate field quantities from random sparse sensors.
The algorithm exploits random sampling to provide incomplete views of the high-resolution underlying distributions.
The proposed technique is tested on synthetic databases of fluid flow simulations, ocean surface temperature distribution measurements, and particle image velocimetry data.
arXiv Detail & Related papers (2022-02-23T18:57:53Z)
- Robust Estimation for Nonparametric Families via Generative Adversarial Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z)
- Random Neural Networks in the Infinite Width Limit as Gaussian Processes [16.75218291152252]
This article gives a new proof that fully connected neural networks with random weights and biases converge to Gaussian processes in the regime where the input dimension, output dimension, and depth are kept fixed.
Unlike prior work, convergence is shown assuming only moment conditions for the distribution of weights and for quite general non-linearities.
arXiv Detail & Related papers (2021-07-04T07:00:20Z)
- On the Convex Behavior of Deep Neural Networks in Relation to the Layers' Width [99.24399270311069]
We observe that for wider networks, minimizing the loss with gradient descent maneuvers through surfaces of positive curvature at the start and end of training, and of close to zero curvature in between.
In other words, it seems that during crucial parts of the training process, the Hessian in wide networks is dominated by the component G.
arXiv Detail & Related papers (2020-01-14T16:30:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.