A Dynamical Central Limit Theorem for Shallow Neural Networks
- URL: http://arxiv.org/abs/2008.09623v3
- Date: Sat, 26 Mar 2022 10:37:42 GMT
- Title: A Dynamical Central Limit Theorem for Shallow Neural Networks
- Authors: Zhengdao Chen, Grant M. Rotskoff, Joan Bruna, Eric Vanden-Eijnden
- Abstract summary: We prove that the fluctuations around the mean limit remain bounded in mean square throughout training.
If the mean-field dynamics converges to a measure that interpolates the training data, we prove that the deviation eventually vanishes in the CLT scaling.
- Score: 48.66103132697071
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent theoretical works have characterized the dynamics of wide shallow
neural networks trained via gradient descent in an asymptotic mean-field limit
when the width tends towards infinity. At initialization, the random sampling
of the parameters leads to deviations from the mean-field limit dictated by the
classical Central Limit Theorem (CLT). However, since gradient descent induces
correlations among the parameters, it is of interest to analyze how these
fluctuations evolve. Here, we use a dynamical CLT to prove that the asymptotic
fluctuations around the mean limit remain bounded in mean square throughout
training. The upper bound is given by a Monte-Carlo resampling error, with a
variance that depends on the 2-norm of the underlying measure, which also
controls the generalization error. This motivates the use of this 2-norm as a
regularization term during training. Furthermore, if the mean-field dynamics
converges to a measure that interpolates the training data, we prove that the
asymptotic deviation eventually vanishes in the CLT scaling. We also complement
these results with numerical experiments.
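As a rough illustration of this setting (a minimal sketch, not the authors' code), the snippet below trains shallow tanh networks of increasing width n in the mean-field scaling f_n(x) = (1/n) * sum_i a_i * tanh(w_i . x) and compares each to a much wider network used as a stand-in for the mean-field limit. The synthetic target, widths, learning rate, and step count are arbitrary assumptions; the paper's CLT scaling suggests the mean-square deviation from the true mean-field limit shrinks like 1/n, whereas this crude comparison against a finite-width reference only illustrates the setup, not a careful verification.
```python
# Minimal sketch (illustrative assumptions throughout): mean-field-scaled
# shallow tanh networks of several widths, trained by plain gradient descent
# on a synthetic regression task, compared to a much wider reference network
# standing in for the mean-field limit.
import numpy as np

rng = np.random.default_rng(0)
d = 5                                     # input dimension (arbitrary)
X = rng.standard_normal((100, d))         # training inputs
y = np.sin(X[:, 0])                       # simple synthetic target

def init(n):
    # i.i.d. initialization of n neurons: output weights a, input weights w
    return rng.standard_normal(n), rng.standard_normal((n, d))

def predict(a, w, X):
    # mean-field scaling: average (not sum) over neurons
    return np.tanh(X @ w.T) @ a / len(a)

def train(a, w, X, y, lr=0.25, steps=2000):
    n = len(a)
    for _ in range(steps):
        act = np.tanh(X @ w.T)            # (N, n) activations
        err = act @ a / n - y             # (N,) residuals
        # gradients of the empirical squared loss in the mean-field scaling
        ga = act.T @ err / (len(y) * n)
        gw = ((err[:, None] * (1.0 - act ** 2)) * a).T @ X / (len(y) * n)
        a -= lr * n * ga                  # rate scaled by n so the empirical
        w -= lr * n * gw                  # measure evolves at an O(1) speed
    return a, w

# A very wide network serves as a stand-in for the mean-field limit.
a_ref, w_ref = train(*init(2048), X, y)
f_ref = predict(a_ref, w_ref, X)

for n in (64, 256, 1024):
    a_n, w_n = train(*init(n), X, y)
    dev = np.mean((predict(a_n, w_n, X) - f_ref) ** 2)
    print(f"width {n:5d}: mean-square deviation from wide reference = {dev:.3e}")
```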
Related papers
- Approximation Results for Gradient Descent trained Neural Networks [0.0]
The networks are fully connected, of constant depth and increasing width.
The error, measured in a continuous kernel norm, yields an approximation rate under the natural smoothness assumptions required for smooth target functions.
arXiv Detail & Related papers (2023-09-09T18:47:55Z) - Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation (a minimal finite-particle sketch of MFLD is given after this list).
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - Learning Discretized Neural Networks under Ricci Flow [51.36292559262042]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations.
DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
arXiv Detail & Related papers (2023-02-07T10:51:53Z) - A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer
Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z) - High-dimensional limit theorems for SGD: Effective dynamics and critical
scaling [6.950316788263433]
We prove limit theorems for the trajectories of summary statistics of stochastic gradient descent (SGD).
We show a critical scaling regime for the step-size, below which the effective ballistic dynamics matches gradient flow for the population loss.
Around the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate.
arXiv Detail & Related papers (2022-06-08T17:42:18Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean-field Langevin dynamics is presented.
The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations,
and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z) - Sharp Lower Bounds on the Approximation Rate of Shallow Neural Networks [0.0]
We prove sharp lower bounds on the approximation rates for shallow neural networks.
These lower bounds apply to both sigmoidal activation functions with bounded variation and to activation functions which are a power of the ReLU.
arXiv Detail & Related papers (2021-06-28T22:01:42Z) - Global Convergence of Second-order Dynamics in Two-layer Neural Networks [10.415177082023389]
Recent results have shown that for two-layer fully connected neural networks, gradient flow converges to a global optimum in the infinite width limit.
A natural question is whether this result extends to second-order dynamics; we show that the answer is positive for the heavy ball method.
While our results hold in the mean-field limit, numerical simulations indicate that global convergence may already occur for reasonably small networks.
arXiv Detail & Related papers (2020-07-14T07:01:57Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the asymptotic and non-asymptotic properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with a fixed step size as the number of iterations goes to infinity (a minimal sketch of such averaging is given after this list).
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
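For the mean-field Langevin dynamics (MFLD) entry above, here is a minimal finite-particle sketch (not taken from any of the papers listed): noisy gradient descent on the particles (a_i, w_i) of a mean-field two-layer network, where the drift is distribution-dependent because it involves the network output itself, and the injected Gaussian noise corresponds to an entropic regularization of assumed strength lam. The width, step size, noise level, and target are illustrative choices only.
```python
# Minimal finite-particle MFLD sketch (illustrative assumptions throughout):
# particles (a_i, w_i) parameterize f(x) = (1/n) * sum_i a_i * tanh(w_i * x).
# Each step applies the distribution-dependent drift (the gradient of the
# quadratic loss, which involves f itself) plus Gaussian noise of variance
# 2 * eta * lam, corresponding to entropic regularization of strength lam.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-2.0, 2.0, 100)          # 1D training inputs
y = np.sin(2.0 * x)                      # synthetic target

n, eta, lam, steps = 500, 0.05, 1e-4, 5000
a, w = rng.standard_normal(n), rng.standard_normal(n)

for _ in range(steps):
    act = np.tanh(np.outer(x, w))        # (N, n) activations
    err = act @ a / n - y                # residual of the mean-field network
    # per-particle drift: gradient of the loss functional's first variation
    ga = act.T @ err / len(x)
    gw = ((err[:, None] * (1.0 - act ** 2)) * a).T @ x / len(x)
    a += -eta * ga + np.sqrt(2.0 * eta * lam) * rng.standard_normal(n)
    w += -eta * gw + np.sqrt(2.0 * eta * lam) * rng.standard_normal(n)

print("final training MSE:", np.mean((np.tanh(np.outer(x, w)) @ a / n - y) ** 2))
```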
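Similarly, the Polyak-Ruppert averaging in the last entry can be illustrated by linear stochastic approximation run with a fixed step size (again a sketch, not the paper's setup; the problem instance, noise level, and constants are arbitrary). The running average of the iterates is typically much less noisy than the last iterate, which is the regime in which the CLT for averaged iterates is stated.
```python
# Minimal sketch of linear stochastic approximation with a fixed step size and
# Polyak-Ruppert averaging; the system, noise level, and constants are
# arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
d = 10
M = np.eye(d) + 0.1 * rng.standard_normal((d, d))
A = M @ M.T                              # symmetric positive-definite system matrix
theta_star = rng.standard_normal(d)      # target solution of A @ theta = b
b = A @ theta_star

eta, steps = 0.05, 20000
theta = np.zeros(d)                      # last iterate
theta_bar = np.zeros(d)                  # Polyak-Ruppert running average
for k in range(1, steps + 1):
    # noisy observation of the linear residual A @ theta - b
    grad = (A + 0.1 * rng.standard_normal((d, d))) @ theta - (b + 0.1 * rng.standard_normal(d))
    theta = theta - eta * grad           # fixed step size
    theta_bar += (theta - theta_bar) / k # running mean of the iterates

print("error of last iterate:    ", np.linalg.norm(theta - theta_star))
print("error of averaged iterate:", np.linalg.norm(theta_bar - theta_star))
```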