Non-asymptotic approximations of neural networks by Gaussian processes
- URL: http://arxiv.org/abs/2102.08668v1
- Date: Wed, 17 Feb 2021 10:19:26 GMT
- Title: Non-asymptotic approximations of neural networks by Gaussian processes
- Authors: Ronen Eldan and Dan Mikulincer and Tselil Schramm
- Abstract summary: We study the extent to which wide neural networks may be approximated by Gaussian processes when initialized with random weights.
As the width of a network goes to infinity, its law converges to that of a Gaussian process.
- Score: 7.56714041729893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the extent to which wide neural networks may be approximated by
Gaussian processes when initialized with random weights. It is a
well-established fact that as the width of a network goes to infinity, its law
converges to that of a Gaussian process. We make this quantitative by
establishing explicit convergence rates for the central limit theorem in an
infinite-dimensional functional space, metrized with a natural transportation
distance. We identify two regimes of interest: when the activation function is
polynomial, its degree determines the rate of convergence, while for
non-polynomial activations, the rate is governed by the smoothness of the
function.
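For concreteness, a standard formulation behind these statements (the abstract does not pin down an architecture, so the $1/\sqrt{n}$ normalization and Gaussian weight law below are conventional assumptions rather than the paper's exact setup) is the width-$n$ one-hidden-layer network

$$f_n(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} a_i\,\sigma(\langle w_i, x\rangle), \qquad a_i \sim \mathcal{N}(0,1),\quad w_i \sim \mathcal{N}(0, I_d)\ \text{i.i.d.},$$

which converges in law, as $n \to \infty$, to a centered Gaussian process with covariance kernel

$$K(x, x') = \mathbb{E}_{w \sim \mathcal{N}(0, I_d)}\big[\sigma(\langle w, x\rangle)\,\sigma(\langle w, x'\rangle)\big].$$

The paper's contribution is the non-asymptotic version of this statement: an explicit bound on the transportation distance between the law of $f_n$ and its Gaussian limit as a function of $n$ and the activation $\sigma$.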
Related papers
- Random ReLU Neural Networks as Non-Gaussian Processes [20.607307985674428]
We show that random neural networks with rectified linear unit activation functions are well-defined non-Gaussian processes.
As a by-product, we demonstrate that these networks are solutions to differential equations driven by impulsive white noise.
arXiv Detail & Related papers (2024-05-16T16:28:11Z)
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Wide Deep Neural Networks with Gaussian Weights are Very Close to Gaussian Processes [1.0878040851638]
We show that the distance between the network output and the corresponding Gaussian approximation scales inversely with the width of the network, exhibiting faster convergence than the naive rate suggested by the central limit theorem.
We also apply our bounds to obtain theoretical approximations for the exact posterior distribution of the network, when the likelihood is a bounded Lipschitz function of the network output evaluated on a (finite) training set.
arXiv Detail & Related papers (2023-12-18T22:29:40Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- D4FT: A Deep Learning Approach to Kohn-Sham Density Functional Theory [79.50644650795012]
We propose a deep learning approach to solve Kohn-Sham Density Functional Theory (KS-DFT).
We prove that such an approach has the same expressivity as the SCF method, yet reduces the computational complexity.
In addition, we show that our approach enables us to explore more complex neural-based wave functions.
arXiv Detail & Related papers (2023-03-01T10:38:10Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Rate of Convergence of Polynomial Networks to Gaussian Processes [0.0]
We examine one-hidden-layer neural networks with random weights.
For networks with a polynomial activation, we demonstrate that the rate of this convergence in the 2-Wasserstein metric is $O(n^{-\frac{1}{2}})$, where $n$ is the number of hidden neurons.
We improve the known convergence rate for other activations: to a power law in $n$ for ReLU, and to an inverse square root up to logarithmic factors for erf (see the simulation sketch after this list).
arXiv Detail & Related papers (2021-11-04T21:58:21Z)
- Going Beyond Linear RL: Sample Efficient Neural Function Approximation [76.57464214864756]
We study function approximation with two-layer neural networks.
Our results significantly improve upon what can be attained with linear (or eluder dimension) methods.
arXiv Detail & Related papers (2021-07-14T03:03:56Z)
- Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z)
- Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z)
- Infinitely Wide Tensor Networks as Gaussian Process [1.7894377200944511]
In this paper, we show the equivalence of infinitely wide tensor networks and Gaussian processes.
We implement the Gaussian Process corresponding to the infinite limit tensor networks and plot the sample paths of these models.
arXiv Detail & Related papers (2021-01-07T02:29:15Z)
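As a rough empirical companion to the convergence claims above, here is a minimal Monte Carlo sketch (not a construction from any of the listed papers): for a one-hidden-layer ReLU network with the conventional $1/\sqrt{n}$ scaling, the output at a single fixed input is a normalized sum of i.i.d. terms, so its law should approach the Gaussian limit $\mathcal{N}(0, K(x, x))$ as the width $n$ grows. The architecture, scaling, and activation here are illustrative assumptions; the papers above treat richer settings (multiple inputs, functional distances, general activations).

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
m = 20_000  # independent networks sampled per width

def sample_outputs(n: int, m: int) -> np.ndarray:
    """Outputs f(x) = n^{-1/2} * sum_i a_i * relu(<w_i, x>) of m independent
    width-n networks at a fixed unit-norm input x. For ||x|| = 1 the
    preactivations <w_i, x> are exactly i.i.d. N(0, 1), so we sample them
    directly instead of drawing the weight vectors w_i."""
    g = rng.standard_normal((m, n))  # preactivations <w_i, x>
    a = rng.standard_normal((m, n))  # output-layer weights
    return (a * np.maximum(g, 0.0)).sum(axis=1) / np.sqrt(n)

# Gaussian limit at a single input: N(0, K(x, x)) with
# K(x, x) = E[relu(<w, x>)^2] = ||x||^2 / 2 = 1/2 for a unit-norm x.
gauss = rng.standard_normal(m) * np.sqrt(0.5)

for n in (10, 100, 1_000):
    w1 = wasserstein_distance(sample_outputs(n, m), gauss)
    print(f"width {n:>5}: empirical W1 to Gaussian limit ~ {w1:.4f}")
# Note: the estimate cannot drop below the Monte Carlo resolution (~ m^{-1/2}),
# so the decay in n is only visible while the distance sits above that floor.
```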
This list is automatically generated from the titles and abstracts of the papers on this site.