A Simple Proof of the Mixing of Metropolis-Adjusted Langevin Algorithm
under Smoothness and Isoperimetry
- URL: http://arxiv.org/abs/2304.04095v2
- Date: Thu, 8 Jun 2023 16:31:07 GMT
- Title: A Simple Proof of the Mixing of Metropolis-Adjusted Langevin Algorithm
under Smoothness and Isoperimetry
- Authors: Yuansi Chen and Khashayar Gatmiry
- Abstract summary: We find that MALA mixes in $O\left(\frac{(L\Upsilon)^{\frac12}}{\psi_\mu^2} \log\left(\frac{1}{\epsilon}\right)\right)$ iterations from a warm start.
- Score: 4.657614491309671
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the mixing time of the Metropolis-Adjusted Langevin algorithm (MALA) for
sampling a target density on $\mathbb{R}^d$. We assume that the target density
satisfies $\psi_\mu$-isoperimetry and that the operator norm and trace of its
Hessian are bounded by $L$ and $\Upsilon$ respectively. Our main result
establishes that, from a warm start, to achieve $\epsilon$-total variation
distance to the target density, MALA mixes in
$O\left(\frac{(L\Upsilon)^{\frac12}}{\psi_\mu^2}
\log\left(\frac{1}{\epsilon}\right)\right)$ iterations. Notably, this result
holds beyond the log-concave sampling setting and the mixing time depends on
only $\Upsilon$ rather than its upper bound $L d$. In the $m$-strongly
logconcave and $L$-log-smooth sampling setting, our bound recovers the previous
minimax mixing bound of MALA~\cite{wu2021minimax}.
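For intuition, here is a minimal sketch of a single MALA iteration for a target density proportional to $e^{-U}$. This is a generic textbook implementation with an arbitrary illustrative step size $h$; it is not the specific parameter choice, warm-start construction, or analysis from the paper.

```python
import numpy as np

def mala_step(x, U, grad_U, h, rng):
    """One Metropolis-adjusted Langevin step targeting pi(x) proportional to exp(-U(x)).

    Proposal: x' = x - h * grad_U(x) + sqrt(2h) * N(0, I), accepted with the
    Metropolis-Hastings ratio so the chain leaves pi invariant.
    """
    noise = rng.standard_normal(x.shape)
    proposal = x - h * grad_U(x) + np.sqrt(2.0 * h) * noise

    def log_q(y, x_from):
        # Log density (up to an additive constant) of the Gaussian Langevin proposal q(y | x_from).
        diff = y - (x_from - h * grad_U(x_from))
        return -np.sum(diff ** 2) / (4.0 * h)

    log_accept = (U(x) - U(proposal)) + log_q(x, proposal) - log_q(proposal, x)
    if np.log(rng.uniform()) < min(0.0, log_accept):
        return proposal, True   # move accepted
    return x, False             # move rejected, stay at x

# Illustrative usage on a standard Gaussian target, U(x) = ||x||^2 / 2
# (h = 0.1 is an arbitrary choice for this toy example).
rng = np.random.default_rng(0)
x = np.zeros(5)
for _ in range(1000):
    x, _ = mala_step(x, U=lambda z: 0.5 * z @ z, grad_U=lambda z: z, h=0.1, rng=rng)
```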
Related papers
- Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H\,\varepsilon^{-3}\ln\frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of the bias of any optimal policy, $\varepsilon$ is the accuracy, and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Unadjusted Hamiltonian MCMC with Stratified Monte Carlo Time Integration [0.0]
A randomized time integrator is suggested for unadjusted Hamiltonian Monte Carlo (uHMC).
Adjustable randomized time integrators are also provided.
The complexity of the uHMC algorithm with Verlet time integration is in general $O\left((d/K)^{1/2}(L/K)^{2}\varepsilon^{-1}\log(1/\varepsilon)\right)$.
arXiv Detail & Related papers (2022-11-20T15:45:26Z) - A Fourier Approach to Mixture Learning [46.995354373649675]
We give an algorithm that efficiently learns the means in $d = O(\log k/\log\log k)$ dimensions under separation $d/\sqrt{\log k}$ (modulo factors).
Our results can be easily extended to learning mixtures of non-Gaussian distributions, under a mild condition on the Fourier spectrum of the distribution.
arXiv Detail & Related papers (2022-10-05T17:35:46Z) - Sampling from Log-Concave Distributions with Infinity-Distance
Guarantees and Applications to Differentially Private Optimization [33.38289436686841]
We present an algorithm that outputs a point from a distribution $O(\varepsilon)$-close to $\pi$ in infinity-distance.
We also present a "soft-$\pi$" version of the Dikin walk which may be of independent interest.
arXiv Detail & Related papers (2021-11-07T13:44:50Z) - Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm for
Log-Concave Sampling [2.9972063833424216]
We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling from a log-smooth and strongly log-concave distribution.
We establish its optimal minimax mixing time under a warm start.
arXiv Detail & Related papers (2021-09-27T14:02:27Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - SDP Achieves Exact Minimax Optimality in Phase Synchronization [19.909352968029584]
We study the phase synchronization problem with noisy measurements $Y = z^* z^{*\mathrm{H}} + \sigma W \in \mathbb{C}^{n \times n}$.
We prove that SDP achieves the error bound $(1+o(1))\frac{\sigma^2}{2np}$ under a squared $\ell_2$ loss.
arXiv Detail & Related papers (2021-01-07T03:14:05Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^{5}\varepsilon^{2}} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)