Conservative SPDEs as fluctuating mean field limits of stochastic
gradient descent
- URL: http://arxiv.org/abs/2207.05705v1
- Date: Tue, 12 Jul 2022 17:27:18 GMT
- Title: Conservative SPDEs as fluctuating mean field limits of stochastic
gradient descent
- Authors: Benjamin Gess, Rishabh S. Gvalani, Vitalii Konarovskyi
- Abstract summary: It is shown that the inclusion of fluctuations in the limiting SPDE improves the rate of convergence, and retains information about the fluctuations of descent in the continuum limit.
- Score: 1.2031796234206138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The convergence of stochastic interacting particle systems in the mean-field
limit to solutions to conservative stochastic partial differential equations is
shown, with optimal rate of convergence. As a second main result, a
quantitative central limit theorem for such SPDEs is derived, again with
optimal rate of convergence.
The results apply in particular to the convergence in the mean-field scaling
of stochastic gradient descent dynamics in overparametrized, shallow neural
networks to solutions to SPDEs. It is shown that the inclusion of fluctuations
in the limiting SPDE improves the rate of convergence, and retains information
about the fluctuations of stochastic gradient descent in the continuum limit.
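The mean-field scaling in the abstract can be illustrated with a toy experiment: N neurons act as interacting particles, the network output is their empirical average, and single-sample SGD updates every particle. Everything below (the sine target, tanh activation, step size) is an illustrative assumption, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Toy 1-d regression target (illustrative choice).
    return np.sin(x)

N = 200                      # number of neurons (particles)
a = rng.normal(size=N)       # outer weights
b = rng.normal(size=N)       # inner weights

def model(x, a, b):
    # Mean-field scaling: the output is the empirical average over neurons.
    return np.mean(a * np.tanh(b * x))

lr = 0.5
losses = []
for step in range(2000):
    x = rng.uniform(-2, 2)           # one-sample stochastic gradient
    err = model(x, a, b) - target(x)
    # Per-particle gradients of 0.5*err^2; the 1/N from the mean-field
    # scaling is absorbed into the rate, so each particle gets an O(1) update.
    ga = err * np.tanh(b * x)
    gb = err * a * (1 - np.tanh(b * x) ** 2) * x
    a -= lr * ga
    b -= lr * gb
    losses.append(0.5 * err ** 2)

print(np.mean(losses[:100]), np.mean(losses[-100:]))
```

The randomness of the single-sample updates is exactly the fluctuation source that survives, at order 1/sqrt(N), in the limiting conservative SPDE.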
Related papers
- Convergence and concentration properties of constant step-size SGD
through Markov chains [0.0]
We consider the optimization of a smooth and strongly convex objective using stochastic gradient descent (SGD) with a constant step size.
We show that, for unbiased gradient estimates with mildly controlled variance, the iterates converge to an invariant distribution in total variation distance.
All our results are non-asymptotic and their consequences are discussed through a few applications.
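The Markov-chain viewpoint can be sketched on a one-dimensional strongly convex quadratic: with a constant step size, SGD converges not to the minimizer but to a stationary distribution around it. All constants below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Strongly convex objective f(x) = 0.5 * mu * x^2 with unbiased
# noisy gradients g(x) = mu * x + noise.
mu, lr, sigma = 1.0, 0.1, 1.0

x = 5.0
tail = []
for t in range(20000):
    g = mu * x + sigma * rng.normal()
    x -= lr * g
    if t >= 10000:               # discard burn-in
        tail.append(x)

tail = np.array(tail)
# The chain x_{t+1} = (1 - lr*mu) x_t - lr*noise is AR(1); its
# stationary variance is lr * sigma^2 / (2*mu - lr*mu^2) ~ 0.053 here.
print(tail.mean(), tail.var())
```

Shrinking the step size shrinks the stationary variance proportionally, which is the usual bias-variance trade-off for constant-step SGD.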
arXiv Detail & Related papers (2023-06-20T12:36:28Z)
- Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
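A finite-particle, time-discretized sketch of such dynamics, with each particle feeling a confining gradient, a distribution-dependent drift, and Gaussian noise (the energy functional and all constants are hypothetical choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)

# Euler-Maruyama discretization of mean-field Langevin dynamics for the
# (hypothetical) energy F(mu) = E[x^2/2] + 0.5*(E[x])^2 with entropic
# regularization lam: the first variation is x^2/2 + mean*x, so each
# particle drifts along -(x + mean) and receives noise of size
# sqrt(2*lam*dt).
N, dt, lam, steps = 500, 0.01, 0.1, 3000
x = rng.normal(loc=3.0, scale=1.0, size=N)

for _ in range(steps):
    drift = -x - x.mean()        # -grad of the first variation at each particle
    x += dt * drift + np.sqrt(2 * lam * dt) * rng.normal(size=N)

print(x.mean(), x.var())
```

The three error sources the framework above accounts for are all visible here: finite N, finite dt, and (if the gradient were subsampled) the stochastic-gradient error.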
arXiv Detail & Related papers (2023-06-12T16:28:11Z)
- Exponential convergence rates for momentum stochastic gradient descent in the overparametrized setting [0.6445605125467574]
We prove bounds on the rate of convergence for the momentum stochastic gradient descent scheme (MSGD).
We analyze the optimal choice of the friction parameter and show that the MSGD process almost surely converges to a local minimum.
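A minimal heavy-ball (momentum SGD) sketch on a one-dimensional quadratic, with `gamma` playing the role of the friction parameter; all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Heavy-ball / momentum SGD on f(x) = 0.5*x^2 with noisy gradients:
# a discretization of x'' + gamma*x' + grad f(x) = noise.
lr, gamma, sigma = 0.01, 1.0, 0.1
x, v = 5.0, 0.0

for _ in range(5000):
    g = x + sigma * rng.normal()     # unbiased stochastic gradient
    v = v - lr * (gamma * v + g)     # friction damps the velocity
    x = x + lr * v

print(abs(x))
```

Too little friction leaves the iterate oscillating; too much reduces the scheme to slow overdamped descent, which is why the choice of `gamma` matters.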
arXiv Detail & Related papers (2023-02-07T15:59:08Z)
- Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean field Langevin dynamics is presented.
A Gibbs-type measure $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z)
- Convergence of policy gradient for entropy regularized MDPs with neural
network approximation in the mean-field regime [0.0]
We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, entropy-regularized Markov decision processes (MDPs).
Our results rely on a careful analysis of the non-linear Fokker--Planck--Kolmogorov equation.
arXiv Detail & Related papers (2022-01-18T20:17:16Z)
- On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, together with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
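The effect of averaging can be sketched on the classic bilinear game min_x max_y xy, where the last SEG iterate hovers at a noise floor while the iterate average approaches the Nash equilibrium (0, 0). Step size and noise level are illustrative, and the scheduled restarting from the abstract is omitted:

```python
import numpy as np

rng = np.random.default_rng(4)

# Stochastic extragradient (SEG) on min_x max_y x*y, whose unique Nash
# equilibrium is (0, 0). The game's vector field is F(x, y) = (y, -x);
# each oracle call carries additive Gaussian noise.
eta, sigma = 0.1, 0.5
x, y = 2.0, -1.0
avg = np.zeros(2)

T = 20000
for t in range(1, T + 1):
    # Extrapolation step with one noisy oracle call ...
    gx, gy = y + sigma * rng.normal(), -x + sigma * rng.normal()
    xh, yh = x - eta * gx, y - eta * gy
    # ... then the update step with a fresh noisy oracle call.
    gx, gy = yh + sigma * rng.normal(), -xh + sigma * rng.normal()
    x, y = x - eta * gx, y - eta * gy
    avg += (np.array([x, y]) - avg) / t   # running iterate average

print(np.linalg.norm(avg), np.hypot(x, y))
```

Restarting the average on a schedule, as in the paper, discards the stale transient and is what accelerates the rate beyond plain averaging.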
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
- A Dynamical Central Limit Theorem for Shallow Neural Networks [48.66103132697071]
We prove that the fluctuations around the mean limit remain bounded in mean square throughout training.
If the mean-field dynamics converges to a measure that interpolates the training data, we prove that the deviation eventually vanishes in the CLT scaling.
arXiv Detail & Related papers (2020-08-21T18:00:50Z)
- Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
- The Convergence Indicator: Improved and completely characterized
parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to determine whether the particles will finally converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
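For context, a plain PSO run on the sphere function shows the kind of swarm collapse such an indicator predicts; the parameter values (w, c1, c2) below are common illustrative defaults, not the bounds derived in the paper:

```python
import numpy as np

rng = np.random.default_rng(5)

# Standard PSO on the sphere function f(x) = ||x||^2. Whether the swarm
# collapses to a single point depends on the triple (w, c1, c2); the
# values below lie well inside the classically stable region.
w, c1, c2 = 0.7, 1.5, 1.5
n, d = 30, 2
x = rng.uniform(-5, 5, size=(n, d))
v = np.zeros((n, d))
pbest = x.copy()
pbest_f = (x ** 2).sum(axis=1)
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(500):
    r1, r2 = rng.random((n, d)), rng.random((n, d))
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x = x + v
    f = (x ** 2).sum(axis=1)
    better = f < pbest_f
    pbest[better], pbest_f[better] = x[better], f[better]
    gbest = pbest[pbest_f.argmin()].copy()

# Maximum distance of any particle from the swarm centroid.
spread = np.linalg.norm(x - x.mean(axis=0), axis=1).max()
print(spread, np.linalg.norm(gbest))
```

Pushing (w, c1, c2) outside the stable region makes `spread` blow up instead, which is exactly the dichotomy the convergence indicator characterizes.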
arXiv Detail & Related papers (2020-06-06T19:08:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.