Correlation Functions in Random Fully Connected Neural Networks at
Finite Width
- URL: http://arxiv.org/abs/2204.01058v1
- Date: Sun, 3 Apr 2022 11:57:18 GMT
- Title: Correlation Functions in Random Fully Connected Neural Networks at
Finite Width
- Authors: Boris Hanin
- Abstract summary: This article considers fully connected neural networks with Gaussian random weights and biases and $L$ hidden layers.
For polynomially bounded non-linearities we give sharp estimates in powers of $1/n$, together with exact layerwise recursions, for the joint correlation functions of the network output and its derivatives.
For both $\mathrm{ReLU}$ and $\tanh$ we find that the depth-to-width ratio $L/n$ plays the role of an effective network depth, controlling both the scale of fluctuations at individual neurons and the size of inter-neuron correlations.
- Score: 17.51364577113718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article considers fully connected neural networks with Gaussian random
weights and biases and $L$ hidden layers, each of width proportional to a large
parameter $n$. For polynomially bounded non-linearities we give sharp estimates
in powers of $1/n$ for the joint correlation functions of the network output
and its derivatives. Moreover, we obtain exact layerwise recursions for these
correlation functions and solve a number of special cases for classes of
non-linearities including $\mathrm{ReLU}$ and $\tanh$. We find in both cases
that the depth-to-width ratio $L/n$ plays the role of an effective network
depth, controlling both the scale of fluctuations at individual neurons and the
size of inter-neuron correlations. We use this to study a somewhat simplified
version of the so-called exploding and vanishing gradient problem, proving that
this particular variant occurs if and only if $L/n$ is large. Several of the
key ideas in this article were first developed at a physics level of rigor in a
recent monograph with Roberts and Yaida.
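For orientation, at leading order in $1/n$ (the infinite-width limit) the layerwise recursion for the diagonal kernel reduces to the standard form below; the paper's contribution is the sharp $1/n$ corrections to such recursions. The notation here ($C_b$, $C_W$ for the bias and weight variances, $\sigma$ for the non-linearity, $n_0$ for the input dimension) is a sketch convention and need not match the paper's:
\[
  K^{(\ell+1)}(x,x) \;=\; C_b + C_W\,\mathbb{E}_{z \sim \mathcal{N}(0,\,K^{(\ell)}(x,x))}\!\left[\sigma(z)^2\right],
  \qquad
  K^{(1)}(x,x) \;=\; C_b + \frac{C_W}{n_0}\,\lVert x\rVert^2 .
\]
The following minimal numerical sketch (not code from the paper) probes the depth-to-width claim empirically: it draws many random ReLU networks at He-style initialization and measures the normalized variance of the squared output at a fixed input, a quantity that is width-independent at infinite width but is expected to grow with the effective depth $L/n$. All parameter choices below are illustrative assumptions.

import numpy as np

def network_output(x, L, n, rng):
    """One draw of a depth-L, width-n random ReLU network applied to x.

    Weights are Gaussian with variance 2/fan_in and biases are zero --
    illustrative choices, not necessarily the paper's exact setup.
    """
    h = x
    for _ in range(L):
        W = rng.normal(0.0, np.sqrt(2.0 / h.shape[0]), size=(n, h.shape[0]))
        h = np.maximum(W @ h, 0.0)
    w_out = rng.normal(0.0, np.sqrt(1.0 / n), size=n)
    return float(w_out @ h)

def kernel_fluctuation(L, n, trials=2000, d_in=16, seed=0):
    """Normalized variance of the squared output over independent weight draws."""
    rng = np.random.default_rng(seed)
    x = np.ones(d_in) / np.sqrt(d_in)          # fixed unit-norm input
    z2 = np.array([network_output(x, L, n, rng) ** 2 for _ in range(trials)])
    return z2.var() / z2.mean() ** 2

if __name__ == "__main__":
    # Same depth, increasing width: as L/n shrinks, the fluctuations should too.
    for n in (16, 64, 256):
        print(f"L=16, n={n:4d}  normalized var of z^2 = {kernel_fluctuation(16, n):.2f}")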
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z)
- Random features and polynomial rules [0.0]
We present an analysis of the generalization performance of random features models for generic supervised learning problems with Gaussian data.
We find good agreement far from the limits where $D \to \infty$ and at least one of $P/D^K$ and $N/D^L$ remains finite (a generic random-features sketch appears after this list).
arXiv Detail & Related papers (2024-02-15T18:09:41Z)
- The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can make two well-separated classes linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity (a rough numerical illustration appears after this list).
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Theory of Deep Convolutional Neural Networks III: Approximating Radial Functions [7.943024117353317]
We consider a family of deep neural networks consisting of two groups of convolutional layers, a down operator, and a fully connected layer.
The network structure depends on two structural parameters which determine the numbers of convolutional layers and the width of the fully connected layer.
arXiv Detail & Related papers (2021-07-02T08:22:12Z)
- The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve the near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev-seminorm is lower-bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z)
- Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z)
- Deep Networks and the Multiple Manifold Problem [15.144495799445824]
We study the multiple manifold problem, a binary classification task modeled on applications in machine vision, in which a deep fully-connected neural network is trained to separate two low-dimensional submanifolds of the unit sphere.
We prove for a simple manifold configuration that when the network depth $L$ is large relative to certain geometric and statistical properties of the data, the network width $n$ grows as a sufficiently large polynomial in $L$.
Our analysis demonstrates concrete benefits of depth and width in the context of a practically-motivated model problem.
arXiv Detail & Related papers (2020-08-25T19:20:00Z)
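As referenced in the "Random features and polynomial rules" entry above, the following is a generic random-features regression sketch (a frozen random first layer with a ridge-trained linear readout). It is only meant to fix the model class; the polynomial scalings $P/D^K$, $N/D^L$ analysed in that paper are not reproduced here, and all parameter values are illustrative assumptions.

import numpy as np

def random_features(X, W):
    """Nonlinear random features sigma(X W / sqrt(D)), with sigma = tanh."""
    D = X.shape[1]
    return np.tanh(X @ W / np.sqrt(D))

def fit_ridge(Phi, y, lam=1e-3):
    """Closed-form ridge regression on the frozen feature matrix Phi."""
    N = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)

rng = np.random.default_rng(0)
P, D, N = 400, 20, 300                          # samples, input dim, number of features
X = rng.normal(size=(P, D))                     # Gaussian data, as in the entry above
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=P)  # illustrative target
W = rng.normal(size=(D, N))                     # frozen random first-layer weights

theta = fit_ridge(random_features(X, W), y)     # only the linear readout is trained
X_test = rng.normal(size=(200, D))
mse = np.mean((random_features(X_test, W) @ theta - np.sin(X_test[:, 0])) ** 2)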
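For the "Separation Capacity of Random Neural Networks" entry, the sketch below gives a rough illustration (not the paper's construction or proof) of the stated phenomenon: two classes that are not linearly separable in the input plane become linearly separable after one wide random ReLU layer with standard Gaussian weights and uniformly distributed biases. Note that with far more random features than data points, separability here also follows from simple dimension counting; the paper's contribution is quantifying the required width and margin via the mutual complexity of the data.

import numpy as np

rng = np.random.default_rng(1)

def ring(radius, m):
    """m points on a circle of the given radius (two rings are not linearly separable)."""
    t = rng.uniform(0.0, 2.0 * np.pi, size=m)
    return radius * np.stack([np.cos(t), np.sin(t)], axis=1)

def random_relu_layer(X, width=2000):
    """phi(x) = ReLU(W x + b): Gaussian weights, uniformly distributed biases."""
    W = rng.normal(size=(X.shape[1], width))
    b = rng.uniform(-1.0, 1.0, size=width)
    return np.maximum(X @ W + b, 0.0) / np.sqrt(width)

def linear_fit_accuracy(Phi, y):
    """Least-squares linear classifier; accuracy 1.0 certifies linear separability."""
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return float(np.mean(np.sign(Phi @ w) == y))

m = 200
X = np.vstack([ring(1.0, m), ring(1.5, m)])    # two well-separated classes
y = np.concatenate([-np.ones(m), np.ones(m)])

print("raw 2D inputs:        ", linear_fit_accuracy(X, y))                     # well below 1.0
print("random ReLU features: ", linear_fit_accuracy(random_relu_layer(X), y))  # 1.0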
This list is automatically generated from the titles and abstracts of the papers in this site.