Accelerating Hamiltonian Monte Carlo via Chebyshev Integration Time
- URL: http://arxiv.org/abs/2207.02189v1
- Date: Tue, 5 Jul 2022 17:42:22 GMT
- Title: Accelerating Hamiltonian Monte Carlo via Chebyshev Integration Time
- Authors: Jun-Kun Wang and Andre Wibisono
- Abstract summary: Hamiltonian Monte Carlo (HMC) is a popular method in sampling.
We propose a scheme of time-varying integration time based on the roots of Chebyshev polynomials.
- Score: 13.427128424538502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hamiltonian Monte Carlo (HMC) is a popular method in sampling. While many
works have studied this method from various angles, an interesting question is how to
choose its integration time to achieve acceleration. In this work, we consider
accelerating the process of sampling from a distribution $\pi(x) \propto \exp(-f(x))$
via HMC with time-varying
integration time. When the potential $f$ is $L$-smooth and $m$-strongly convex,
i.e.\ for sampling from a log-smooth and strongly log-concave target
distribution $\pi$, it is known that under a constant integration time, the
number of iterations that ideal HMC takes to get an $\epsilon$ Wasserstein-2
distance to the target $\pi$ is $O( \kappa \log \frac{1}{\epsilon} )$, where
$\kappa := \frac{L}{m}$ is the condition number. We propose a scheme of
time-varying integration time based on the roots of Chebyshev polynomials. We
show that in the case of quadratic potential $f$, i.e., when the target $\pi$
is a Gaussian distribution, ideal HMC with this choice of integration time only
takes $O( \sqrt{\kappa} \log \frac{1}{\epsilon} )$ number of iterations to
reach Wasserstein-2 distance less than $\epsilon$; this improvement on the
dependence on condition number is akin to acceleration in optimization. The
design and analysis of HMC with the proposed integration time is built on the
tools of Chebyshev polynomials. Experiments demonstrate the advantage of adopting our
scheme of time-varying integration time even for sampling from distributions with
smooth, strongly convex potentials that are not quadratic.
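For intuition, the mechanism behind the speedup can be sketched numerically. For a quadratic potential, one ideal HMC step with integration time $t$ shrinks each eigen-mode of the mean by a factor $\cos(\sqrt{\lambda}\, t)$, where $\lambda \in [m, L]$ is the corresponding eigenvalue of the Hessian; picking the integration times from the roots of a Chebyshev polynomial mapped onto $[m, L]$ makes the product of these factors uniformly small over the whole range. The sketch below is a minimal illustration, not the paper's exact schedule: the scaling $t_k = \pi/(2\sqrt{r_k})$ (which annihilates the mode $\lambda = r_k$ at step $k$) and the constant-time baseline $t = \pi/(2\sqrt{L})$ are both illustrative assumptions.

```python
import numpy as np

def chebyshev_times(K, m, L):
    """Integration times from the K roots of the degree-K Chebyshev
    polynomial mapped onto the eigenvalue range [m, L].
    The scaling t_k = pi / (2 * sqrt(r_k)) is an illustrative choice:
    it makes the per-mode factor cos(sqrt(lam) * t_k) vanish exactly
    at lam = r_k."""
    k = np.arange(1, K + 1)
    roots = (L + m) / 2 + (L - m) / 2 * np.cos((2 * k - 1) * np.pi / (2 * K))
    return np.pi / (2 * np.sqrt(roots))

def max_contraction(times, m, L, n_grid=2000):
    """Worst-case contraction of the position mean over eigenvalues
    lam in [m, L]: for a quadratic potential, an ideal HMC step with
    integration time t shrinks each eigen-mode by cos(sqrt(lam) * t)."""
    lam = np.linspace(m, L, n_grid)
    factors = np.cos(np.sqrt(lam)[:, None] * times[None, :])
    return np.max(np.abs(np.prod(factors, axis=1)))

m, L, K = 1.0, 100.0, 20  # condition number kappa = 100
cheb = max_contraction(chebyshev_times(K, m, L), m, L)
const = max_contraction(np.full(K, np.pi / (2 * np.sqrt(L))), m, L)
print(f"constant time: {const:.3f}, Chebyshev times: {cheb:.3g}")
```

At $\kappa = 100$ with $K = 20$ steps, the constant schedule's worst-case contraction (attained at $\lambda = m$) is about $0.78$, while the Chebyshev schedule's is far below it, mirroring the $\kappa$ versus $\sqrt{\kappa}$ gap in the iteration counts above.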
Related papers
- A shortcut to an optimal quantum linear system solver [55.2480439325792]
We give a conceptually simple quantum linear system solver (QLSS) that does not use complex or difficult-to-analyze techniques.
If the solution norm $\lVert\boldsymbol{x}\rVert$ is known exactly, our QLSS requires only a single application of the kernel.
Alternatively, by reintroducing a concept from the adiabatic path-following technique, we show that $O(kappa)$ complexity can be achieved for norm estimation.
arXiv Detail & Related papers (2024-06-17T20:54:11Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Efficient Sampling on Riemannian Manifolds via Langevin MCMC [51.825900634131486]
We study the task of efficiently sampling from a Gibbs distribution $d\pi^* = e^{h}\, d\mathrm{vol}_g$ over a Riemannian manifold $M$ via (geometric) Langevin MCMC.
Our results apply to general settings where $\pi^*$ can be non-exponential and $M$ can have negative Ricci curvature.
arXiv Detail & Related papers (2024-02-15T22:59:14Z) - Efficient Sampling of Stochastic Differential Equations with Positive
Semi-Definite Models [91.22420505636006]
This paper deals with the problem of efficient sampling from a differential equation, given the drift function and the diffusion matrix.
It is possible to obtain independent and identically distributed (i.i.d.) samples at precision $\varepsilon$ with a cost that is $m^2 d \log(1/\varepsilon)$.
Our results suggest that as the true solution gets smoother, we can circumvent the curse of dimensionality without requiring any sort of convexity.
arXiv Detail & Related papers (2023-03-30T02:50:49Z) - Stochastic Langevin Monte Carlo for (weakly) log-concave posterior
distributions [0.0]
We investigate a continuous time version of the Langevin Monte Carlo method, introduced in [WT11], that incorporates a sampling step inside the traditional over-damped Langevin diffusion.
This method is popular in machine learning for sampling from posterior distributions.
arXiv Detail & Related papers (2023-01-08T17:08:21Z) - Unadjusted Hamiltonian MCMC with Stratified Monte Carlo Time Integration [0.0]
A randomized time integrator is suggested for unadjusted Hamiltonian Monte Carlo (uHMC).
Adjustable randomized time integrators are also provided.
The complexity of the uHMC algorithm with Verlet time integration is in general $O\left((d/K)^{1/2} (L/K)^{2} \varepsilon^{-1} \log(1/\varepsilon)\right)$.
arXiv Detail & Related papers (2022-11-20T15:45:26Z) - Reward-Mixing MDPs with a Few Latent Contexts are Learnable [75.17357040707347]
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs)
Our goal is to learn a near-optimal policy that nearly maximizes the $H$ time-step cumulative rewards in such a model.
arXiv Detail & Related papers (2022-10-05T22:52:00Z) - Hamiltonian Monte Carlo for efficient Gaussian sampling: long and random
steps [0.0]
Hamiltonian Monte Carlo (HMC) is a Markov chain algorithm for sampling from a high-dimensional distribution with density $e^{-f(x)}$.
We show that HMC can sample from a distribution that is $\varepsilon$-close in total variation distance using $\widetilde{O}(\sqrt{\kappa}\, d^{1/4} \log(1/\varepsilon))$ gradient queries.
arXiv Detail & Related papers (2022-09-26T15:29:29Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5 \varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Logsmooth Gradient Concentration and Tighter Runtimes for Metropolized
Hamiltonian Monte Carlo [23.781520510778716]
This is the first high-accuracy mixing time result for logconcave distributions using only first-order function information.
We give evidence that dependence on $\kappa$ is likely to be necessary for standard Metropolized first-order methods.
arXiv Detail & Related papers (2020-02-10T22:44:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.