Random Neural Networks in the Infinite Width Limit as Gaussian Processes
- URL: http://arxiv.org/abs/2107.01562v1
- Date: Sun, 4 Jul 2021 07:00:20 GMT
- Title: Random Neural Networks in the Infinite Width Limit as Gaussian Processes
- Authors: Boris Hanin
- Abstract summary: This article gives a new proof that fully connected neural networks with random weights and biases converge to Gaussian processes in the regime where the input dimension, output dimension, and depth are kept fixed while the hidden layer widths tend to infinity.
Unlike prior work, convergence is shown assuming only moment conditions for the distribution of weights and for quite general non-linearities.
- Score: 16.75218291152252
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article gives a new proof that fully connected neural networks with
random weights and biases converge to Gaussian processes in the regime where
the input dimension, output dimension, and depth are kept fixed, while the
hidden layer widths tend to infinity. Unlike prior work, convergence is shown
assuming only moment conditions for the distribution of weights and for quite
general non-linearities.
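As an empirical companion to this statement (an illustration, not code from the paper), the NumPy sketch below draws many independent fully connected ReLU networks whose weights and biases are i.i.d. uniform, hence non-Gaussian but with all moments finite, evaluates each at one fixed input, and checks that the third and fourth moments of the outputs look approximately Gaussian. The width, depth, input, and the uniform law are arbitrary choices made for this demo.

```python
# Sanity check of the Gaussian-process limit with non-Gaussian weights
# (illustration only): sample many random ReLU networks, evaluate each at a
# fixed input, and inspect skewness and excess kurtosis of the outputs.
import numpy as np

rng = np.random.default_rng(0)

def random_relu_net(x, width=200, depth=3, rng=rng):
    """One random fully connected ReLU network with uniform weights and biases,
    using the usual 1/sqrt(fan_in) scaling so the wide limit is non-degenerate."""
    a = np.sqrt(3.0)          # uniform(-a, a) has mean 0 and variance 1
    h = x
    fan_in = x.shape[0]
    for _ in range(depth):
        W = rng.uniform(-a, a, size=(width, fan_in))
        b = rng.uniform(-a, a, size=width)
        h = np.maximum(W @ h / np.sqrt(fan_in) + b, 0.0)
        fan_in = width
    w_out = rng.uniform(-a, a, size=width)
    return w_out @ h / np.sqrt(width)

x = np.ones(10)                                        # fixed network input
samples = np.array([random_relu_net(x) for _ in range(2000)])

# For a Gaussian limit, skewness ~ 0 and excess kurtosis ~ 0.
z = (samples - samples.mean()) / samples.std()
print("skewness       :", np.mean(z**3))
print("excess kurtosis:", np.mean(z**4) - 3.0)
```

Replacing the uniform draws with Gaussian ones of the same variance gives essentially the same statistics, which is the content of the moment-condition result: the limiting Gaussian process depends only on the variances of the weights and biases and on the non-linearity, not on the particular weight distribution.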
Related papers
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNets in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - On the Neural Tangent Kernel of Equilibrium Models [72.29727250679477]
This work studies the neural tangent kernel (NTK) of the deep equilibrium (DEQ) model.
We show that, in contrast, a DEQ model still enjoys a deterministic NTK even when its width and depth go to infinity at the same time, under mild conditions.
arXiv Detail & Related papers (2023-10-21T16:47:18Z) - Quantitative CLTs in Deep Neural Networks [12.845031126178593]
We study the distribution of a fully connected neural network with random Gaussian weights and biases.
We obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth.
Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature.
arXiv Detail & Related papers (2023-07-12T11:35:37Z) - Posterior Inference on Shallow Infinitely Wide Bayesian Neural Networks under Weights with Unbounded Variance [1.5960546024967326]
It is known from Neal's classical result that the infinite-width scaling limit of a Bayesian neural network with one hidden layer is a Gaussian process when the network weights have bounded prior variance.
Neal's result has been extended to networks with multiple hidden layers and to convolutional neural networks, also with Gaussian process scaling limits.
Our contribution is an interpretable and computationally efficient procedure that uses a conditionally Gaussian representation to bring the full Gaussian process machinery to bear on posterior inference and uncertainty quantification in the non-Gaussian regime.
arXiv Detail & Related papers (2023-05-18T02:55:00Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Quantitative Gaussian Approximation of Randomly Initialized Deep Neural Networks [1.0878040851638]
Our explicit inequalities indicate how the hidden and output layer sizes affect the Gaussian behaviour of the network.
arXiv Detail & Related papers (2022-03-14T14:20:19Z) - Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensemble training: a computationally tractable, randomised, and untrainable function is added to each ensemble member (a minimal sketch of this style of construction appears after this list).
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
arXiv Detail & Related papers (2020-07-11T22:10:52Z) - Stable behaviour of infinitely wide deep neural networks [8.000374471991247]
We consider fully connected feed-forward deep neural networks (NNs) where weights and biases are independent and identically distributed.
We show that the infinite-width limit of the NN, under suitable scaling of the weights, is a process whose finite-dimensional distributions are stable distributions.
arXiv Detail & Related papers (2020-03-01T04:07:30Z) - On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)
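For the Bayesian deep ensembles entry above, the following is a minimal sketch of the general pattern it alludes to: each ensemble member predicts the sum of a trainable network and a frozen, randomly initialised one. This shows only the generic "add a fixed random function to each member" idea; the specific function in that paper is derived from the NTK and differs in detail, and all names and sizes below are illustrative.

```python
# Generic sketch of an ensemble member with an untrainable random function
# added to its output (illustrative; not the paper's exact NTK-derived
# construction). NumPy only; no training loop is shown.
import numpy as np

rng = np.random.default_rng(1)

def init_mlp(in_dim, width, rng):
    """Parameters of a one-hidden-layer ReLU network with 1/sqrt(fan_in) scaling."""
    return {
        "W1": rng.normal(size=(width, in_dim)),
        "b1": rng.normal(size=width),
        "w2": rng.normal(size=width),
    }

def mlp(params, x):
    h = np.maximum(params["W1"] @ x / np.sqrt(x.shape[0]) + params["b1"], 0.0)
    return params["w2"] @ h / np.sqrt(h.shape[0])

class EnsembleMember:
    """Trainable network plus a frozen random function added to its output."""
    def __init__(self, in_dim, width, rng):
        self.trainable = init_mlp(in_dim, width, rng)   # would be updated by training
        self.frozen = init_mlp(in_dim, width, rng)      # never updated

    def predict(self, x):
        return mlp(self.trainable, x) + mlp(self.frozen, x)

members = [EnsembleMember(in_dim=5, width=128, rng=rng) for _ in range(10)]
x_test = rng.normal(size=5)
preds = [m.predict(x_test) for m in members]
print("ensemble mean:", np.mean(preds), " ensemble std:", np.std(preds))
```

The spread of the member predictions at a test point is what such ensembles use as an uncertainty estimate; the frozen random component keeps the members from collapsing onto the same function after training.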
This list is automatically generated from the titles and abstracts of the papers on this site.