Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction
- URL: http://arxiv.org/abs/2306.07221v1
- Date: Mon, 12 Jun 2023 16:28:11 GMT
- Title: Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction
- Authors: Taiji Suzuki and Denny Wu and Atsushi Nitanda
- Abstract summary: The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
- Score: 49.66486092259376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the
Langevin dynamics that incorporates a distribution-dependent drift, and it
naturally arises from the optimization of two-layer neural networks via (noisy)
gradient descent. Recent works have shown that MFLD globally minimizes an
entropy-regularized convex functional in the space of measures. However, all
prior analyses assumed the infinite-particle or continuous-time limit, and
cannot handle stochastic gradient updates. We provide a general framework to
prove a uniform-in-time propagation of chaos for MFLD that takes into account
the errors due to finite-particle approximation, time-discretization, and
stochastic gradient approximation. To demonstrate the wide applicability of
this framework, we establish quantitative convergence rate guarantees to the
regularized global optimal solution under (i) a wide range of learning problems
such as neural networks in the mean-field regime and MMD minimization, and (ii)
different gradient estimators including SGD and SVRG. Despite the generality of
our results, we achieve an improved convergence rate in both the SGD and SVRG
settings when specialized to the standard Langevin dynamics.
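As an illustration only (the function names and the toy potential below are our own, not from the paper), the finite-particle, time-discretized MFLD described in the abstract amounts to a noisy Euler-Maruyama update on a system of interacting particles, where the drift depends on the empirical distribution of all particles:

```python
import numpy as np

def mfld_step(particles, grad_fn, step_size, temperature, rng):
    """One Euler-Maruyama step of the finite-particle MFLD.

    particles: (N, d) array; each row is one particle (e.g. one neuron's weights
               in a mean-field two-layer network).
    grad_fn:   gradient of the per-particle potential; it may depend on the
               empirical measure of all particles (the mean-field coupling).
    """
    grads = grad_fn(particles)                    # drift from the current empirical measure
    noise = rng.standard_normal(particles.shape)  # injected Gaussian noise
    return particles - step_size * grads + np.sqrt(2.0 * step_size * temperature) * noise

# Hypothetical toy potential: a quadratic confinement plus a weak mean-field
# attraction toward the particle average. This is a sketch, not the paper's objective.
def toy_grad(x):
    return x + 0.1 * (x - x.mean(axis=0, keepdims=True))

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 2))
for _ in range(500):
    x = mfld_step(x, toy_grad, step_size=0.05, temperature=0.01, rng=rng)
# At a small temperature the particles should settle near the origin.
```

Replacing `grad_fn` with a minibatch or variance-reduced (SVRG-style) gradient estimator yields the stochastic-gradient variants whose errors the paper's framework accounts for.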
Related papers
- Improved Particle Approximation Error for Mean Field Neural Networks [9.817855108627452]
Mean-field Langevin dynamics (MFLD) minimizes an entropy-regularized nonlinear convex functional defined over the space of probability distributions.
Recent works have demonstrated the uniform-in-time propagation of chaos for MFLD.
We improve the dependence on logarithmic Sobolev inequality (LSI) constants in their particle approximation errors.
arXiv Detail & Related papers (2024-05-24T17:59:06Z) - Symmetric Mean-field Langevin Dynamics for Distributional Minimax
Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.
We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
arXiv Detail & Related papers (2023-12-02T13:01:29Z) - Accelerating Convergence in Global Non-Convex Optimization with
Reversible Diffusion [0.0]
Langevin dynamics has been used extensively in global non-convex optimization.
Our proposed method is used to investigate the trade-off between speed and discretization error.
arXiv Detail & Related papers (2023-05-19T07:49:40Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean-field Langevin dynamics is presented.
The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z) - Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods.
arXiv Detail & Related papers (2020-06-16T15:07:37Z) - Dynamical mean-field theory for stochastic gradient descent in Gaussian
mixture classification [25.898873960635534]
We analyze in closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture.
We define a prototype process which can be extended to a continuous-time gradient flow.
In the full-batch limit, we recover the standard gradient flow.
arXiv Detail & Related papers (2020-06-10T22:49:41Z) - Non-Convex Optimization via Non-Reversible Stochastic Gradient Langevin
Dynamics [27.097121544378528]
Stochastic Gradient Langevin Dynamics (SGLD) is a powerful algorithm for optimizing non-convex objectives.
NSGLD is based on discretization of the non-reversible diffusion.
arXiv Detail & Related papers (2020-04-06T17:11:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.