Rate of Convergence of Polynomial Networks to Gaussian Processes
- URL: http://arxiv.org/abs/2111.03175v1
- Date: Thu, 4 Nov 2021 21:58:21 GMT
- Title: Rate of Convergence of Polynomial Networks to Gaussian Processes
- Authors: Adam Klukowski
- Abstract summary: We examine one-hidden-layer neural networks with random weights.
For networks with a polynomial activation, we demonstrate that the rate of this convergence in the 2-Wasserstein metric is $O(n^{-\frac{1}{2}})$, where $n$ is the number of hidden neurons.
We improve the known convergence rate for other activations: to a power law in $n$ for ReLU, and to inverse-square-root up to logarithmic factors for erf.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We examine one-hidden-layer neural networks with random weights. It is
well-known that in the limit of infinitely many neurons they simplify to
Gaussian processes. For networks with a polynomial activation, we demonstrate
that the rate of this convergence in 2-Wasserstein metric is
$O(n^{-\frac{1}{2}})$, where $n$ is the number of hidden neurons. We suspect
this rate is asymptotically sharp. We improve the known convergence rate for
other activations, to power-law in $n$ for ReLU and inverse-square-root up to
logarithmic factors for erf. We explore the interplay between spherical
harmonics, Stein kernels and optimal transport in the non-isotropic setting.
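The finite-width effect behind the abstract's claim can be illustrated numerically. The sketch below is our own illustration, not the paper's construction: it samples one-hidden-layer networks $f(x) = n^{-1/2}\sum_i b_i\,\sigma(w_i\cdot x)$ with iid standard Gaussian weights and a cubic activation, then tracks a crude proxy for distance to the Gaussian limit (the excess kurtosis of $f(x)$, which is zero for a Gaussian and decays like $1/n$ here), while the covariance already matches the limiting GP kernel at every width. All function names are ours.

```python
import numpy as np

def sample_poly_net(n, x, num_nets=5000, rng=None):
    """Sample f(x) = n^{-1/2} * sum_i b_i * (w_i . x)^3 across many
    independent one-hidden-layer networks with iid N(0,1) weights.
    Since w ~ N(0, I) implies w . x ~ N(0, |x|^2), the pre-activations
    are drawn directly instead of materialising the weight matrices."""
    rng = np.random.default_rng(0) if rng is None else rng
    pre = np.linalg.norm(x) * rng.standard_normal((num_nets, n))
    b = rng.standard_normal((num_nets, n))        # output-layer weights
    return (b * pre**3).sum(axis=1) / np.sqrt(n)  # 1/sqrt(n) scaling

def excess_kurtosis(samples):
    """Zero for a Gaussian; a crude proxy for distance to the GP limit."""
    z = samples - samples.mean()
    return (z**4).mean() / (z**2).mean() ** 2 - 3.0

x = np.array([1.0, 0.0, 0.0])  # a fixed unit-norm input
for n in (10, 100, 1000):
    f = sample_poly_net(n, x)
    # For a unit input the variance sits at its limiting value
    # E[(w.x)^6] = 15 for every n; only higher cumulants shrink.
    print(f"n={n:5d}  var={f.var():6.2f}  excess kurtosis={excess_kurtosis(f):8.3f}")
```

Note the kurtosis proxy decays like $1/n$, whereas the paper's result is the stronger statement that the full 2-Wasserstein distance to the GP is $O(n^{-1/2})$.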
Related papers
- Random ReLU Neural Networks as Non-Gaussian Processes [20.607307985674428]
We show that random neural networks with rectified linear unit activation functions are well-defined non-Gaussian processes.
As a by-product, we demonstrate that these networks are solutions to differential equations driven by impulsive white noise.
arXiv Detail & Related papers (2024-05-16T16:28:11Z)
- Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron [49.45105570960104]
We prove the global convergence of randomly initialized gradient descent with an $O\left(T^{-3}\right)$ rate.
These two bounds jointly give an exact characterization of the convergence rate.
We show this potential function converges slowly, which implies the slow convergence rate of the loss function.
arXiv Detail & Related papers (2023-02-20T15:33:26Z)
- Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis [121.9821494461427]
We show how to significantly reduce the number of neurons required for two-layer ReLU networks.
We also prove new lower bounds that improve upon prior work, and that under certain assumptions, are best possible.
arXiv Detail & Related papers (2022-06-26T06:51:31Z)
- Correlation Functions in Random Fully Connected Neural Networks at Finite Width [17.51364577113718]
This article considers fully connected neural networks with Gaussian random weights and biases and $L$ hidden layers.
For bounded non-linearities we give sharp recursion estimates in powers of $1/n$ for the joint correlation functions of the network output and its derivatives.
We find in both cases that the depth-to-width ratio $L/n$ plays the role of an effective network depth, controlling both the scale of fluctuations at individual neurons and the size of inter-neuron correlations.
arXiv Detail & Related papers (2022-04-03T11:57:18Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve the near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z)
- Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z)
- Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z)
- Non-asymptotic approximations of neural networks by Gaussian processes [7.56714041729893]
We study the extent to which wide neural networks with random weights may be approximated by Gaussian processes.
As the width of a network goes to infinity, its law converges to that of a Gaussian process.
arXiv Detail & Related papers (2021-02-17T10:19:26Z)
- Neural Thompson Sampling [94.82847209157494]
We propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation.
At the core of our algorithm is a novel posterior distribution of the reward, where its mean is the neural network approximator, and its variance is built upon the neural tangent features of the corresponding neural network.
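The mechanism described above can be sketched with a generic Bayesian linear bandit standing in for the neural model: the posterior mean plays the role of the network's reward estimate, and a feature-based variance (here from one-hot arm features, a toy stand-in for the paper's neural tangent features) drives exploration. This is our own illustrative sketch, not the NeuralTS algorithm itself, and all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
num_arms, dim, rounds = 4, 4, 500
Phi = np.eye(num_arms)                       # one-hot arm features (toy stand-in
                                             # for neural tangent features)
theta_true = np.array([0.2, 0.4, 0.6, 1.0])  # unknown mean reward per arm
sigma2 = 0.25                                # reward noise variance

A = np.eye(dim)        # posterior precision of theta (prior = N(0, I))
bvec = np.zeros(dim)   # accumulated features * reward / sigma2
counts = np.zeros(num_arms, dtype=int)

for t in range(rounds):
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ bvec  # posterior mean: the "reward approximator"
    means = Phi @ theta_hat
    # Posterior variance of each arm's mean reward: phi^T A^{-1} phi.
    variances = np.einsum("af,fg,ag->a", Phi, A_inv, Phi)
    # Thompson step: sample a plausible reward per arm, play the argmax.
    arm = int(np.argmax(rng.normal(means, np.sqrt(variances))))
    reward = Phi[arm] @ theta_true + np.sqrt(sigma2) * rng.standard_normal()
    counts[arm] += 1
    A += np.outer(Phi[arm], Phi[arm]) / sigma2
    bvec += Phi[arm] * reward / sigma2

print("pull counts per arm:", counts)  # the best arm (index 3) should dominate
```

Sampling from the posterior, rather than acting on the mean alone, is what balances exploration and exploitation: arms with few pulls retain large variance and so occasionally produce the largest sampled reward.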
arXiv Detail & Related papers (2020-10-02T07:44:09Z)
- Variational Monte Carlo calculations of $\mathbf{A\leq 4}$ nuclei with an artificial neural-network correlator ansatz [62.997667081978825]
We introduce a neural-network quantum state ansatz to model the ground-state wave function of light nuclei.
We compute the binding energies and point-nucleon densities of $A\leq 4$ nuclei as emerging from a leading-order pionless effective field theory Hamiltonian.
arXiv Detail & Related papers (2020-07-28T14:52:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.