When does Metropolized Hamiltonian Monte Carlo provably outperform
Metropolis-adjusted Langevin algorithm?
- URL: http://arxiv.org/abs/2304.04724v2
- Date: Thu, 8 Jun 2023 16:26:45 GMT
- Title: When does Metropolized Hamiltonian Monte Carlo provably outperform
Metropolis-adjusted Langevin algorithm?
- Authors: Yuansi Chen and Khashayar Gatmiry
- Abstract summary: We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator.
We show that the joint distribution of the location and velocity variables of the discretization of the continuous HMC dynamics stays approximately invariant.
- Score: 4.657614491309671
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with
the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose
log-density is smooth, has Lipschitz Hessian in Frobenius norm and satisfies
isoperimetry. We bound the gradient complexity to reach $\epsilon$ error in
total variation distance from a warm start by $\tilde
O(d^{1/4}\text{polylog}(1/\epsilon))$ and demonstrate the benefit of choosing
the number of leapfrog steps to be larger than 1. To surpass the previous analysis
of the Metropolis-adjusted Langevin algorithm (MALA), which has
$\tilde{O}(d^{1/2}\text{polylog}(1/\epsilon))$ dimension dependence (Wu et
al., 2022), we reveal a key feature in our proof: the joint distribution of
the location and velocity variables of the discretization of the continuous HMC
dynamics stays approximately invariant. This key feature, when shown via
induction over the number of leapfrog steps, enables us to obtain estimates on
moments of various quantities that appear in the acceptance rate control of
Metropolized HMC. Moreover, to deal with another bottleneck in the literature,
namely the control of the overlap between HMC proposal distributions, we provide a new approach to
upper bound the Kullback-Leibler divergence between push-forwards of the
Gaussian distribution through HMC dynamics initialized at two different points.
Notably, our analysis does not require log-concavity or independence of the
marginals, and only relies on an isoperimetric inequality. To illustrate the
applicability of our result, several examples of natural functions that fall
into our framework are discussed.
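To make the sampler under analysis concrete, below is a minimal sketch of Metropolized HMC with a leapfrog integrator, assuming a standard Gaussian target as a stand-in for a general smooth log-density; the function names, step size, and number of leapfrog steps are illustrative choices, not values from the paper. The velocity is refreshed from $N(0, I_d)$ at every iteration, the leapfrog integrator is run for `n_leapfrog` steps, and the proposal is accepted or rejected via a Metropolis correction on the joint Hamiltonian of location and velocity.

```python
# Minimal sketch (not the paper's implementation): Metropolized HMC with a
# leapfrog integrator; all names and parameter values below are illustrative.
import numpy as np


def metropolized_hmc(log_density, grad_log_density, x0, step_size, n_leapfrog,
                     n_samples, rng=None):
    """Metropolis-adjusted HMC; velocity is refreshed from N(0, I) each iteration."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        v = rng.standard_normal(x.shape)                  # fresh velocity draw
        x_prop, v_prop = x.copy(), v.copy()
        # Leapfrog integration of the Hamiltonian dynamics.
        v_prop = v_prop + 0.5 * step_size * grad_log_density(x_prop)
        for k in range(n_leapfrog):
            x_prop = x_prop + step_size * v_prop
            grad = grad_log_density(x_prop)
            if k < n_leapfrog - 1:
                v_prop = v_prop + step_size * grad        # full velocity step
            else:
                v_prop = v_prop + 0.5 * step_size * grad  # closing half step
        # Metropolis correction on the joint (location, velocity) Hamiltonian.
        log_accept = (log_density(x_prop) - 0.5 * v_prop @ v_prop) \
                     - (log_density(x) - 0.5 * v @ v)
        if np.log(rng.uniform()) < log_accept:
            x = x_prop
        samples.append(x.copy())
    return np.array(samples)


if __name__ == "__main__":
    # Illustrative target: a standard Gaussian on R^d (log-density up to a constant).
    d = 10
    log_p = lambda z: -0.5 * z @ z
    grad_log_p = lambda z: -z
    chain = metropolized_hmc(log_p, grad_log_p, np.ones(d), step_size=0.3,
                             n_leapfrog=5, n_samples=2000)
    print("norm of empirical mean:", np.linalg.norm(chain[1000:].mean(axis=0)))
```

Each iteration costs `n_leapfrog` gradient evaluations, and with a single leapfrog step the proposal reduces to that of MALA; the paper's question is precisely when taking more than one leapfrog step provably improves the dimension dependence of the gradient complexity.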
Related papers
- Faster Sampling via Stochastic Gradient Proximal Sampler [28.422547264326468]
Stochastic Proximal Samplers (SPS) for sampling from non-log-concave distributions are studied.
We show that the convergence to the target distribution can be guaranteed as long as the algorithm trajectory is bounded.
We provide two implementable variants based on stochastic gradient Langevin dynamics (SGLD) and the Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA.
arXiv Detail & Related papers (2024-05-27T00:53:18Z) - Symmetric Mean-field Langevin Dynamics for Distributional Minimax
Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.
We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
arXiv Detail & Related papers (2023-12-02T13:01:29Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Lower Bounds on Metropolized Sampling Methods for Well-Conditioned
Distributions [29.314919044203926]
We give lower bounds on the performance of two of the most popular sampling methods in practice, the Metropolis-adjusted Langevin algorithm (MALA) and multi-step Hamiltonian Monte Carlo (HMC).
Our main result is a nearly-tight lower bound of $\widetilde{\Omega}(\kappa d)$ on the mixing time of MALA from an exponentially warm start.
arXiv Detail & Related papers (2021-06-10T03:47:39Z) - Mixing Time Guarantees for Unadjusted Hamiltonian Monte Carlo [1.14219428942199]
We provide quantitative upper bounds on the total variation mixing time of the Markov chain corresponding to the unadjusted Hamiltonian Monte Carlo (uHMC) algorithm.
For two general classes of models and fixed time discretization step size $h$, the mixing time is shown to depend only logarithmically on the dimension.
We show that an $\varepsilon$-accurate approximation of the target distribution $\mu$ in total variation distance can be achieved by uHMC.
arXiv Detail & Related papers (2021-05-03T14:13:47Z) - A New Framework for Variance-Reduced Hamiltonian Monte Carlo [88.84622104944503]
We propose a new framework of variance-reduced Hamiltonian Monte Carlo (HMC) methods for sampling from an $L$-smooth and $m$-strongly log-concave distribution.
We show that HMC methods based on unbiased gradient estimators, including SAGA and SVRG, achieve the highest gradient efficiency with small batch sizes.
Experimental results on both synthetic and real-world benchmark data show that our new framework significantly outperforms the full-gradient and stochastic-gradient HMC approaches.
arXiv Detail & Related papers (2021-02-09T02:44:24Z) - Optimal dimension dependence of the Metropolis-Adjusted Langevin
Algorithm [22.19906823739798]
The mixing time of MALA on the class of log-smooth and strongly log-concave distributions is $\widetilde{O}(d^{1/2})$ from a warm start.
A new technique based on a projection characterization of the Metropolis adjustment reduces the study of MALA to the well-studied discretization analysis of the Langevin SDE (a minimal MALA step is sketched after this list).
arXiv Detail & Related papers (2020-12-23T17:14:06Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Convergence of Langevin Monte Carlo in Chi-Squared and Renyi Divergence [8.873449722727026]
For convex and first-order smooth potentials, we show that the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\epsilon^{-1})$, which improves the previously known rates in both of these metrics.
arXiv Detail & Related papers (2020-07-22T18:18:28Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
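For contrast with the multi-step HMC sketch above, the MALA baseline discussed in several of the entries proposes a single Euler-Maruyama step of the Langevin SDE and then applies a Metropolis accept/reject using the forward and backward proposal densities. The sketch below is an assumed, illustrative implementation (names and the Gaussian demo target are not from any of the listed papers).

```python
# Minimal sketch (assumed, not from the listed papers): one step of the
# Metropolis-adjusted Langevin algorithm (MALA) for comparison with HMC.
import numpy as np


def mala_step(log_density, grad_log_density, x, step_size, rng):
    """Propose a Langevin-discretization move, then Metropolis-accept or reject."""
    # Euler-Maruyama discretization of the Langevin SDE as the proposal.
    mean_fwd = x + step_size * grad_log_density(x)
    x_prop = mean_fwd + np.sqrt(2.0 * step_size) * rng.standard_normal(x.shape)
    # Forward/backward proposal log-densities (up to the same constant).
    mean_bwd = x_prop + step_size * grad_log_density(x_prop)
    log_q_fwd = -np.sum((x_prop - mean_fwd) ** 2) / (4.0 * step_size)
    log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (4.0 * step_size)
    log_accept = log_density(x_prop) + log_q_bwd - log_density(x) - log_q_fwd
    return x_prop if np.log(rng.uniform()) < log_accept else x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 10
    log_p = lambda z: -0.5 * z @ z       # standard Gaussian target, as above
    grad_log_p = lambda z: -z
    x = np.ones(d)
    for _ in range(2000):
        x = mala_step(log_p, grad_log_p, x, step_size=0.05, rng=rng)
    print("final iterate norm:", np.linalg.norm(x))
```

Note that the backward proposal mean is recomputed at the proposed point; dropping the Metropolis correction recovers the unadjusted Langevin algorithm.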