Logsmooth Gradient Concentration and Tighter Runtimes for Metropolized
Hamiltonian Monte Carlo
- URL: http://arxiv.org/abs/2002.04121v3
- Date: Sun, 14 Jun 2020 02:12:45 GMT
- Title: Logsmooth Gradient Concentration and Tighter Runtimes for Metropolized
Hamiltonian Monte Carlo
- Authors: Yin Tat Lee, Ruoqi Shen, Kevin Tian
- Abstract summary: This is the first high-accuracy mixing time result for logconcave distributions using only first-order function information.
We give evidence that dependence on $kappa$ is likely to be necessary standard Metropolized firstorder methods.
- Score: 23.781520510778716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that the gradient norm $\|\nabla f(x)\|$ for $x \sim \exp(-f(x))$,
where $f$ is strongly convex and smooth, concentrates tightly around its mean.
This removes a barrier in the prior state-of-the-art analysis for the
well-studied Metropolized Hamiltonian Monte Carlo (HMC) algorithm for sampling
from a strongly logconcave distribution. We correspondingly demonstrate that
Metropolized HMC mixes in $\tilde{O}(\kappa d)$ iterations, improving upon the
$\tilde{O}(\kappa^{1.5}\sqrt{d} + \kappa d)$ runtime of (Dwivedi et. al. '18,
Chen et. al. '19) by a factor $(\kappa/d)^{1/2}$ when the condition number
$\kappa$ is large. Our mixing time analysis introduces several techniques which
to our knowledge have not appeared in the literature and may be of independent
interest, including restrictions to a nonconvex set with good conductance
behavior, and a new reduction technique for boosting a constant-accuracy total
variation guarantee under weak warmness assumptions. This is the first
high-accuracy mixing time result for logconcave distributions using only
first-order function information which achieves linear dependence on $\kappa$;
we also give evidence that this dependence is likely to be necessary for
standard Metropolized first-order methods.
Related papers
- Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling [28.931489333515618]
We establish an oracle complexity of $widetildeOleft(fracdbeta2cal A2varepsilon6right)$ for simple annealed Langevin Monte Carlo algorithm.
We show that $cal A$ represents the action of a curve of probability measures interpolating the target distribution $pi$ and a readily sampleable distribution.
arXiv Detail & Related papers (2024-07-24T02:15:48Z) - Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit
Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses.
We propose a new algorithm that attains an $widetildeO(dsqrtHS3K + sqrtHSAK)$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers the RMSProp and its momentum extension and establishes the convergence rate of $frac1Tsum_k=1T.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered to be analogous to the $frac1Tsum_k=1T.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - When does Metropolized Hamiltonian Monte Carlo provably outperform
Metropolis-adjusted Langevin algorithm? [4.657614491309671]
We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator.
We show that the joint distribution of the location and velocity variables of the discretization of the continuous HMC dynamics stays approximately invariant.
arXiv Detail & Related papers (2023-04-10T17:35:57Z) - Hamiltonian Monte Carlo for efficient Gaussian sampling: long and random
steps [0.0]
Hamiltonian Monte Carlo (HMC) is a Markov chain algorithm for sampling from a high-dimensional distribution with density $e-f(x)$.
We show that HMC can sample from a distribution that is $varepsilon$-close in total variation distance using $widetildeO(sqrtkappa d1/4 log(1/varepsilon)$ gradient queries.
arXiv Detail & Related papers (2022-09-26T15:29:29Z) - Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm for
Log-Concave Sampling [2.9972063833424216]
We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling from a log-smooth and strongly log-concave distribution.
We establish its optimal minimax mixing time under a warm start.
arXiv Detail & Related papers (2021-09-27T14:02:27Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymotic analysis of sampling error in 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - A New Framework for Variance-Reduced Hamiltonian Monte Carlo [88.84622104944503]
We propose a new framework of variance-reduced Hamiltonian Monte Carlo (HMC) methods for sampling from an $L$-smooth and $m$-strongly log-concave distribution.
We show that the unbiased gradient estimators, including SAGA and SVRG, based HMC methods achieve highest gradient efficiency with small batch size.
Experimental results on both synthetic and real-world benchmark data show that our new framework significantly outperforms the full gradient and gradient HMC approaches.
arXiv Detail & Related papers (2021-02-09T02:44:24Z) - Convergence of Langevin Monte Carlo in Chi-Squared and Renyi Divergence [8.873449722727026]
We show that the rate estimate $widetildemathcalO(depsilon-1)$ improves the previously known rates in both of these metrics.
In particular, for convex and firstorder smooth potentials, we show that LMC algorithm achieves the rate estimate $widetildemathcalO(depsilon-1)$ which improves the previously known rates in both of these metrics.
arXiv Detail & Related papers (2020-07-22T18:18:28Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.