Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm for
Log-Concave Sampling
- URL: http://arxiv.org/abs/2109.13055v1
- Date: Mon, 27 Sep 2021 14:02:27 GMT
- Title: Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm for
Log-Concave Sampling
- Authors: Keru Wu, Scott Schmidler, Yuansi Chen
- Abstract summary: We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling from a log-smooth and strongly log-concave distribution.
We establish its optimal minimax mixing time under a warm start.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA)
for sampling from a log-smooth and strongly log-concave distribution. We
establish its optimal minimax mixing time under a warm start. Our main
contribution is two-fold. First, for a $d$-dimensional log-concave density with
condition number $\kappa$, we show that MALA with a warm start mixes in $\tilde
O(\kappa \sqrt{d})$ iterations up to logarithmic factors. This improves upon
the previous work on the dependency of either the condition number $\kappa$ or
the dimension $d$. Our proof relies on comparing the leapfrog integrator with
the continuous Hamiltonian dynamics, where we establish a new concentration
bound for the acceptance rate. Second, we prove a spectral gap based mixing
time lower bound for reversible MCMC algorithms on general state spaces. We
apply this lower bound result to construct a hard distribution for which MALA
requires at least $\tilde \Omega (\kappa \sqrt{d})$ steps to mix. The lower
bound for MALA matches our upper bound in terms of condition number and
dimension. Finally, numerical experiments are included to validate our
theoretical results.
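As background for the result above, here is a minimal NumPy sketch of MALA itself: a Langevin proposal followed by a Metropolis-Hastings accept/reject step. The standard-Gaussian target, step size, and iteration count below are illustrative choices, not the paper's setting.

```python
import numpy as np

def mala_sample(grad_log_pi, log_pi, x0, step, n_iters, rng=None):
    """Metropolis-adjusted Langevin algorithm (MALA).

    Proposes x' = x + step * grad_log_pi(x) + sqrt(2 * step) * z with
    z ~ N(0, I), then applies the Metropolis-Hastings correction so the
    chain is reversible with respect to pi.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_iters):
        noise = rng.standard_normal(x.shape)
        prop = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * noise
        # Log Gaussian proposal densities q(x'|x) and q(x|x'), up to constants.
        fwd = -np.sum((prop - x - step * grad_log_pi(x)) ** 2) / (4.0 * step)
        bwd = -np.sum((x - prop - step * grad_log_pi(prop)) ** 2) / (4.0 * step)
        log_alpha = log_pi(prop) - log_pi(x) + bwd - fwd
        if np.log(rng.uniform()) < log_alpha:
            x = prop
        samples.append(x.copy())
    return np.array(samples)

# Illustration: standard Gaussian target, log pi(x) = -||x||^2 / 2.
d = 5
samples = mala_sample(
    grad_log_pi=lambda x: -x,
    log_pi=lambda x: -0.5 * np.sum(x ** 2),
    x0=np.zeros(d),
    step=0.1,
    n_iters=2000,
)
```

A warm start in the sense of the paper means `x0` is drawn from a distribution already close to the target, rather than a fixed point as in this toy run.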
Related papers
- Symmetric Mean-field Langevin Dynamics for Distributional Minimax
Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions, providing the first symmetric and provably convergent updates in this setting.
We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
arXiv Detail & Related papers (2023-12-02T13:01:29Z) - When does Metropolized Hamiltonian Monte Carlo provably outperform
Metropolis-adjusted Langevin algorithm? [4.657614491309671]
We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator.
We show that the joint distribution of the location and velocity variables of the discretization of the continuous HMC dynamics stays approximately invariant.
arXiv Detail & Related papers (2023-04-10T17:35:57Z) - A Simple Proof of the Mixing of Metropolis-Adjusted Langevin Algorithm
under Smoothness and Isoperimetry [4.657614491309671]
We show that MALA mixes in $O\left(\frac{(L\Upsilon)^{1/2}}{\psi_\mu^2} \log\left(\frac{1}{\epsilon}\right)\right)$ iterations under smoothness and isoperimetry.
arXiv Detail & Related papers (2023-04-08T20:17:29Z) - Faster high-accuracy log-concave sampling via algorithmic warm starts [6.117084972237769]
In practice, high-accuracy samplers such as the classical Metropolis-adjusted Langevin algorithm (MALA) remain the de facto gold standard.
We improve the dimension dependence of this sampling problem to $\tilde{O}(d^{1/2})$, whereas the previous best result for MALA was $\tilde{O}(d)$.
Our main technical contribution settles this problem by establishing the first $\tilde{O}(d^{1/2})$ Rényi mixing rates for the discretized underdamped Langevin diffusion.
arXiv Detail & Related papers (2023-02-20T19:27:21Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Provably Breaking the Quadratic Error Compounding Barrier in Imitation
Learning, Optimally [58.463668865380946]
We study the statistical limits of Imitation Learning in episodic Markov Decision Processes (MDPs) with a state space $\mathcal{S}$.
We establish an upper bound $O(|\mathcal{S}| H^{3/2} / N)$ on the suboptimality using the Mimic-MD algorithm of Rajaraman et al. (2020).
We show that the minimax suboptimality grows as $\Omega(H^{3/2} / N)$ when $|\mathcal{S}| \geq 3$, while the unknown-transition setting suffers from a larger sharp rate.
arXiv Detail & Related papers (2021-02-25T15:50:19Z) - A New Framework for Variance-Reduced Hamiltonian Monte Carlo [88.84622104944503]
We propose a new framework of variance-reduced Hamiltonian Monte Carlo (HMC) methods for sampling from an $L$-smooth and $m$-strongly log-concave distribution.
We show that HMC methods based on unbiased gradient estimators, including SAGA and SVRG, achieve the highest gradient efficiency with small batch sizes.
Experimental results on both synthetic and real-world benchmark data show that our new framework significantly outperforms the full-gradient and stochastic-gradient HMC approaches.
arXiv Detail & Related papers (2021-02-09T02:44:24Z) - Optimal dimension dependence of the Metropolis-Adjusted Langevin
Algorithm [22.19906823739798]
The mixing time of MALA on a class of log-smooth and strongly log-concave distributions is $O(d)$.
A new technique, based on a projection characterization of the Metropolis adjustment, reduces the study of MALA to the well-studied discretization analysis of the Langevin SDE.
arXiv Detail & Related papers (2020-12-23T17:14:06Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov
Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation, named $\text{UCRL-VTR}^{+}$, for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\text{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Logsmooth Gradient Concentration and Tighter Runtimes for Metropolized
Hamiltonian Monte Carlo [23.781520510778716]
This is the first high-accuracy mixing time result for log-concave distributions using only first-order function information.
We give evidence that the dependence on $\kappa$ is likely to be necessary for standard Metropolized first-order methods.
arXiv Detail & Related papers (2020-02-10T22:44:50Z)
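The abstract above, and several of the Metropolized HMC papers in this list, compare the leapfrog integrator against the continuous Hamiltonian dynamics. A minimal sketch of a leapfrog trajectory for the separable Hamiltonian $H(x, v) = -\log \pi(x) + \|v\|^2/2$ (function and argument names are illustrative):

```python
import numpy as np

def leapfrog(x, v, grad_log_pi, step, n_steps):
    """Leapfrog (Stormer-Verlet) integration of Hamiltonian dynamics with
    H(x, v) = -log pi(x) + ||v||^2 / 2. The integrator is symplectic and
    time-reversible, which keeps the Metropolis correction in Metropolized
    HMC simple and the energy error well controlled."""
    x, v = x.copy(), v.copy()
    v += 0.5 * step * grad_log_pi(x)   # initial half step on momentum
    for _ in range(n_steps - 1):
        x += step * v                  # full step on position
        v += step * grad_log_pi(x)     # full step on momentum
    x += step * v                      # last full step on position
    v += 0.5 * step * grad_log_pi(x)   # final half step on momentum
    return x, v
```

With `n_steps=1` and a fresh Gaussian momentum, the leapfrog proposal coincides with the MALA proposal under a matching step-size reparametrization, which is one reason analyses of the two algorithms are so closely related.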
This list is automatically generated from the titles and abstracts of the papers in this site.