Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling
- URL: http://arxiv.org/abs/2010.09597v2
- Date: Tue, 23 Feb 2021 07:15:34 GMT
- Title: Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling
- Authors: Difan Zou and Pan Xu and Quanquan Gu
- Abstract summary: We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
- Score: 110.88857917726276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide a new convergence analysis of stochastic gradient Langevin
dynamics (SGLD) for sampling from a class of distributions that can be
non-log-concave. At the core of our approach is a novel conductance analysis of
SGLD using an auxiliary time-reversible Markov Chain. Under certain conditions
on the target distribution, we prove that $\tilde O(d^4\epsilon^{-2})$
stochastic gradient evaluations suffice to guarantee $\epsilon$-sampling error
in terms of the total variation distance, where $d$ is the problem dimension.
This improves existing results on the convergence rate of SGLD (Raginsky et
al., 2017; Xu et al., 2018). We further show that provided an additional
Hessian Lipschitz condition on the log-density function, SGLD is guaranteed to
achieve $\epsilon$-sampling error within $\tilde O(d^{15/4}\epsilon^{-3/2})$
stochastic gradient evaluations. Our proof technique provides a new way to
study the convergence of Langevin-based algorithms and sheds some light on the
design of fast stochastic gradient-based sampling algorithms.
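To make the algorithm under analysis concrete, the following is a minimal SGLD sketch in Python/NumPy: each iteration takes a step along a noisy estimate of the negative log-density gradient and injects Gaussian noise of scale sqrt(2*step). The double-well potential, the additive-noise gradient oracle, and all step sizes below are illustrative assumptions, not the paper's setup.

```python
# Minimal SGLD sketch (illustrative potential and stochastic gradient oracle).
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    """Exact gradient of the non-log-concave potential f(x) = (||x||^2 - 1)^2."""
    return 4.0 * (x @ x - 1.0) * x

def stochastic_grad(x, noise_std=0.5):
    """Toy stochastic gradient oracle: exact gradient plus zero-mean noise
    (stands in for a minibatch gradient in the finite-sum setting)."""
    return grad_f(x) + noise_std * rng.standard_normal(x.shape)

def sgld(x0, step_size=1e-3, n_iters=20_000):
    """SGLD update: x_{k+1} = x_k - eta * g_k + sqrt(2 * eta) * xi_k, xi_k ~ N(0, I)."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        g = stochastic_grad(x)
        x = x - step_size * g + np.sqrt(2.0 * step_size) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

samples = sgld(x0=[0.0, 0.0])
print("empirical mean of ||x|| after burn-in:", np.linalg.norm(samples[5_000:], axis=1).mean())
```

With a fixed step size, SGLD samples from a biased approximation of the target; the rates quoted in the abstract count how many stochastic gradient evaluations are needed to bring the total variation error below $\epsilon$.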
Related papers
- Faster Sampling via Stochastic Gradient Proximal Sampler [28.422547264326468]
Stochastic Proximal Samplers (SPS) for sampling from non-log-concave distributions are studied.
We show that the convergence to the target distribution can be guaranteed as long as the algorithm trajectory is bounded.
We provide two implementable variants based on stochastic gradient Langevin dynamics (SGLD) and the Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA.
arXiv Detail & Related papers (2024-05-27T00:53:18Z)
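To illustrate the structure described in the entry above, here is a hedged sketch of a stochastic proximal sampler whose restricted Gaussian oracle is approximated by a few inner SGLD steps (the SPS-SGLD idea). The toy potential, step sizes, and inner-loop length are assumptions; the cited paper's exact algorithm may differ.

```python
# Hedged sketch of a stochastic-gradient proximal sampler (SPS-SGLD style):
# alternate (i) y ~ N(x, eta I) and (ii) approximately sample
# x ~ exp(-f(x) - ||x - y||^2 / (2 eta)) using a few inner SGLD steps.
import numpy as np

rng = np.random.default_rng(1)

def grad_f(x):
    # Gradient of the toy non-log-concave potential f(x) = (||x||^2 - 1)^2.
    return 4.0 * (x @ x - 1.0) * x

def noisy_grad(x, noise_std=0.5):
    # Stand-in for a minibatch stochastic gradient of f.
    return grad_f(x) + noise_std * rng.standard_normal(x.shape)

def proximal_sampler_sgld(x0, eta=0.1, inner_steps=20, inner_h=1e-3, n_outer=2000):
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_outer):
        # (i) Gaussian step: y | x ~ N(x, eta I).
        y = x + np.sqrt(eta) * rng.standard_normal(x.size)
        # (ii) Approximate restricted Gaussian oracle via inner SGLD on
        #      the potential f(x) + ||x - y||^2 / (2 eta).
        for _ in range(inner_steps):
            g = noisy_grad(x) + (x - y) / eta
            x = x - inner_h * g + np.sqrt(2.0 * inner_h) * rng.standard_normal(x.size)
        samples.append(x.copy())
    return np.array(samples)

samples = proximal_sampler_sgld(x0=[0.0, 0.0])
print("mean radius of samples:", np.linalg.norm(samples, axis=1).mean())
```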
- Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped stochastic optimization methods for problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
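The entry above centers on gradient clipping under heavy-tailed noise; below is a minimal clipped-SGD sketch. The Student-t noise model, the quadratic objective, and the clipping level are assumptions chosen only to illustrate the mechanism, not the paper's method.

```python
# Minimal clipped SGD under heavy-tailed gradient noise (illustrative setup).
import numpy as np

rng = np.random.default_rng(2)

def stochastic_grad(x):
    # True gradient of f(x) = 0.5 * ||x||^2 plus heavy-tailed (Student-t) noise.
    return x + rng.standard_t(df=1.5, size=x.shape)

def clip(g, lam):
    # Rescale g to norm at most lam: g * min(1, lam / ||g||).
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

x = np.full(10, 5.0)
step, lam = 0.05, 1.0
for _ in range(5000):
    x = x - step * clip(stochastic_grad(x), lam)
print("final ||x|| (much closer to the optimum at 0):", np.linalg.norm(x))
```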
- Demystifying the Myths and Legends of Nonconvex Convergence of SGD [17.445810977264067]
Stochastic gradient descent (SGD) and its variants are the main workhorses for solving large-scale optimization problems.
Through our analyses, we address certain myths and legends related to the nonconvex convergence of SGD.
arXiv Detail & Related papers (2023-10-19T17:58:59Z)
- Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $\epsilon$-stationary points with $O(\epsilon^{-4})$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z)
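For reference, here is a minimal sketch of the standard Adam update that the convergence result above concerns; the noisy quadratic objective and the hyperparameters are illustrative assumptions, and the variance-reduced variant proposed in the paper is not reproduced here.

```python
# Minimal Adam update (generic form; illustrative objective and hyperparameters).
import numpy as np

rng = np.random.default_rng(3)

def stochastic_grad(x):
    # Noisy gradient of f(x) = 0.5 * ||x||^2.
    return x + 0.1 * rng.standard_normal(x.shape)

x = np.full(10, 3.0)
m = np.zeros_like(x)   # first-moment estimate
v = np.zeros_like(x)   # second-moment estimate
lr, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-8
for t in range(1, 2001):
    g = stochastic_grad(x)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)           # bias correction
    v_hat = v / (1 - beta2**t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
print("final ||x||:", np.linalg.norm(x))
```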
- A sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics [11.437892757715877]
We establish a sharp uniform-in-time error estimate for stochastic gradient Langevin dynamics (SGLD).
We obtain a uniform-in-time $O(\eta^2)$ bound for the KL-divergence between the SGLD iteration and the Langevin diffusion.
arXiv Detail & Related papers (2022-07-19T14:38:52Z)
- Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we study two variants of this kind, namely the Variance-Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z)
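A hedged sketch of the variance-reduction idea behind such variants: an SVRG-style gradient estimator plugged into a Langevin update over a finite-sum potential. The toy finite-sum target, the snapshot schedule, and the step size are assumptions; the cited paper's algorithms may differ in their details.

```python
# Sketch: SVRG-style variance-reduced gradient estimator inside a Langevin update.
# Finite-sum potential f(x) = (1/n) sum_i 0.5 * ||x - a_i||^2 (illustrative choice),
# whose target exp(-f) is a Gaussian centered at the data mean.
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 2
a = rng.standard_normal((n, d))          # data defining the component potentials

def grad_batch(x, idx):
    # Average gradient of the components indexed by idx.
    return (x - a[idx]).mean(axis=0)

def svrg_ld(x0, step=1e-2, epoch_len=50, n_iters=5000, batch=10):
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iters, d))
    for k in range(n_iters):
        if k % epoch_len == 0:
            snapshot = x.copy()
            full_grad = grad_batch(snapshot, np.arange(n))   # full gradient at snapshot
        idx = rng.integers(0, n, size=batch)
        # Variance-reduced estimator: unbiased for the full gradient at x.
        g = grad_batch(x, idx) - grad_batch(snapshot, idx) + full_grad
        x = x - step * g + np.sqrt(2.0 * step) * rng.standard_normal(d)
        samples[k] = x
    return samples

samples = svrg_ld(np.zeros(d))
print("sample mean:", samples[1000:].mean(axis=0), "vs data mean:", a.mean(axis=0))
```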
- Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z)
- Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations [0.3553493344868413]
We study the Wasserstein distance between the invariant distribution of an ergodic stochastic differential equation and the distribution of its numerical approximation in the strongly log-concave case.
This allows us to study in a unified way a number of different approximations proposed in the literature for the overdamped and underdamped Langevin dynamics.
arXiv Detail & Related papers (2021-04-26T07:50:04Z)
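As one concrete instance of the numerical approximations compared in the entry above, here is an Euler-Maruyama discretization of underdamped Langevin dynamics for a strongly log-concave (Gaussian) target; the friction, step size, and target are illustrative assumptions, and the paper covers several other schemes.

```python
# Euler-Maruyama discretization of underdamped Langevin dynamics
#   dv = -(gamma * v + grad_f(x)) dt + sqrt(2 * gamma) dW,   dx = v dt,
# for the strongly log-concave target f(x) = 0.5 * ||x||^2 (illustrative choice).
import numpy as np

rng = np.random.default_rng(5)

def grad_f(x):
    return x   # gradient of 0.5 * ||x||^2, i.e. a standard Gaussian target

def underdamped_em(x0, gamma=2.0, h=1e-2, n_iters=20000):
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        v_new = v - h * (gamma * v + grad_f(x)) + np.sqrt(2.0 * gamma * h) * rng.standard_normal(x.size)
        x = x + h * v
        v = v_new
        samples[k] = x
    return samples

samples = underdamped_em(np.full(2, 3.0))
print("sample variance per coordinate (approximately 1):", samples[5000:].var(axis=0))
```

The Wasserstein estimates in the entry above bound how far the invariant law of such a discretization lies from that of the underlying SDE as a function of the step size.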
- On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for a broad range of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.