Wide Deep Neural Networks with Gaussian Weights are Very Close to
Gaussian Processes
- URL: http://arxiv.org/abs/2312.11737v1
- Date: Mon, 18 Dec 2023 22:29:40 GMT
- Title: Wide Deep Neural Networks with Gaussian Weights are Very Close to
Gaussian Processes
- Authors: Dario Trevisan
- Abstract summary: We show that the distance between the network output and the corresponding Gaussian approximation scales inversely with the width of the network, exhibiting faster convergence than the naive heuristic suggested by the central limit theorem.
We also apply our bounds to obtain theoretical approximations for the exact posterior distribution of the network, when the likelihood is a bounded Lipschitz function of the network output evaluated on a (finite) training set.
- Score: 1.0878040851638
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We establish novel rates for the Gaussian approximation of random deep neural
networks with Gaussian parameters (weights and biases) and Lipschitz activation
functions, in the wide limit. Our bounds apply for the joint output of a
network evaluated on any finite input set, provided a certain non-degeneracy
condition of the infinite-width covariances holds. We demonstrate that the
distance between the network output and the corresponding Gaussian
approximation scales inversely with the width of the network, exhibiting faster
convergence than the naive heuristic suggested by the central limit theorem. We
also apply our bounds to obtain theoretical approximations for the exact
Bayesian posterior distribution of the network, when the likelihood is a
bounded Lipschitz function of the network output evaluated on a (finite)
training set. This includes popular cases such as the Gaussian likelihood, i.e.
exponential of minus the mean squared error.
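The wide-limit Gaussian behaviour described in the abstract can be probed numerically. The following is a minimal sketch, not the paper's method: it assumes a bias-free ReLU network with weights drawn from N(0, 1/fan_in), and the helper names `sample_outputs`, `limit_variance` and `excess_kurtosis` are hypothetical. It samples the scalar output of many independent random networks at one fixed input and compares it with the infinite-width Gaussian variance, using excess kurtosis as a crude stand-in for the distance studied in the paper.

```python
import numpy as np


def sample_outputs(x, width, depth, n_samples, rng):
    """Draw scalar outputs f(x) from n_samples independent random ReLU networks.

    Assumed setup (not taken from the paper): no biases, `depth` hidden layers
    of equal `width`, and weights drawn i.i.d. from N(0, 1 / fan_in).
    """
    h = np.tile(np.asarray(x, dtype=float), (n_samples, 1))  # (n_samples, d)
    for _ in range(depth):
        fan_in = h.shape[1]
        # One independent weight matrix per sampled network.
        w = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(n_samples, fan_in, width))
        h = np.maximum(np.einsum("si,sio->so", h, w), 0.0)  # linear map, then ReLU
    # Linear read-out layer with the same variance scaling.
    w_out = rng.normal(0.0, 1.0 / np.sqrt(h.shape[1]), size=h.shape)
    return np.sum(h * w_out, axis=1)


def limit_variance(x, depth):
    """Output variance of the limiting Gaussian under the assumptions above.

    The first pre-activation has variance |x|^2 / d, and each ReLU layer
    halves the second moment because E[relu(z)^2] = Var(z) / 2.
    """
    x = np.asarray(x, dtype=float)
    return float(x @ x) / x.size / 2.0 ** depth


def excess_kurtosis(y):
    """Excess kurtosis of a zero-mean sample; it is 0 for an exact Gaussian."""
    return float(np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.array([1.0, -2.0, 0.5])
    for width in (4, 32):
        y = sample_outputs(x, width=width, depth=2, n_samples=4000, rng=rng)
        print(width, np.var(y), limit_variance(x, 2), excess_kurtosis(y))
```

Under this particular scaling the output variance matches the limiting value at every width, so any departure from Gaussianity shows up only in higher moments. One would expect the excess kurtosis to shrink as the width grows, but it is only a rough diagnostic and not the distance for which the paper proves its rates.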
Related papers
- Covering Numbers for Deep ReLU Networks with Applications to Function Approximation and Nonparametric Regression [4.297070083645049]
We develop tight (up to a multiplicative constant) lower and upper bounds on the covering numbers of fully-connected networks.
Thanks to the tightness of the bounds, a fundamental understanding of the impact of sparsity, quantization, bounded vs. unbounded weights, and network output truncation can be developed.
arXiv Detail & Related papers (2024-10-08T21:23:14Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Quantitative Gaussian Approximation of Randomly Initialized Deep Neural
Networks [1.0878040851638]
Our explicit inequalities indicate how the hidden and output layer sizes affect the Gaussian behaviour of the network.
arXiv Detail & Related papers (2022-03-14T14:20:19Z) - Robust Estimation for Nonparametric Families via Generative Adversarial
Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Infinitely Wide Tensor Networks as Gaussian Process [1.7894377200944511]
In this paper, we show the equivalence of infinitely wide tensor networks and Gaussian processes.
We implement the Gaussian Process corresponding to the infinite limit tensor networks and plot the sample paths of these models.
arXiv Detail & Related papers (2021-01-07T02:29:15Z) - The Ridgelet Prior: A Covariance Function Approach to Prior
Specification for Bayesian Neural Networks [4.307812758854161]
We construct a prior distribution for the parameters of a network that approximates the posited Gaussian process in the output space of the network.
This establishes the property that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular.
arXiv Detail & Related papers (2020-10-16T16:39:45Z) - Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
arXiv Detail & Related papers (2020-07-11T22:10:52Z)