Unadjusted Hamiltonian MCMC with Stratified Monte Carlo Time Integration
- URL: http://arxiv.org/abs/2211.11003v1
- Date: Sun, 20 Nov 2022 15:45:26 GMT
- Title: Unadjusted Hamiltonian MCMC with Stratified Monte Carlo Time Integration
- Authors: Nawaf Bou-Rabee, Milo Marsden
- Abstract summary: A novel unadjusted Hamiltonian Monte Carlo (uHMC) algorithm is suggested.
It uses a stratified Monte Carlo (SMC) time integrator for the underlying Hamiltonian dynamics.
The complexity of the uHMC algorithm with Verlet time integration is in general $Oleft((d/K)1/2 (L/K)2 varepsilon-1 log( boldsymbolmathcalW2(mu, nu) / varepsilon-1 log( boldsymbolmathcalW
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A novel unadjusted Hamiltonian Monte Carlo (uHMC) algorithm is suggested that
uses a stratified Monte Carlo (SMC) time integrator for the underlying
Hamiltonian dynamics in place of the usual Verlet time integrator. For target
distributions of the form $\mu(dx) \propto e^{-U(x)} dx$ where $U: \mathbb{R}^d
\to \mathbb{R}_{\ge 0}$ is both $K$-strongly convex and $L$-gradient Lipschitz,
and initial distributions $\nu$ with finite second moment, coupling proofs
reveal that an $\varepsilon$-accurate approximation of the target distribution
$\mu$ in $L^2$-Wasserstein distance $\boldsymbol{\mathcal{W}}^2$ can be
achieved by the uHMC algorithm with SMC time integration using
$O\left((d/K)^{1/3} (L/K)^{5/3} \varepsilon^{-2/3} \log(
\boldsymbol{\mathcal{W}}^2(\mu, \nu) / \varepsilon)^+\right)$ gradient
evaluations; whereas without any additional assumptions the corresponding
complexity of the uHMC algorithm with Verlet time integration is in general
$O\left((d/K)^{1/2} (L/K)^2 \varepsilon^{-1} \log(
\boldsymbol{\mathcal{W}}^2(\mu, \nu) / \varepsilon)^+ \right)$. The SMC time
integrator involves a minor modification to Verlet, and hence, is easy to
implement.
Related papers
- Efficient Sampling on Riemannian Manifolds via Langevin MCMC [51.825900634131486]
We study the task efficiently sampling from a Gibbs distribution $d pi* = eh d vol_g$ over aian manifold $M$ via (geometric) Langevin MCMC.
Our results apply to general settings where $pi*$ can be non exponential and $Mh$ can have negative Ricci curvature.
arXiv Detail & Related papers (2024-02-15T22:59:14Z) - $\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
We consider the randomized communication complexity of the distributed $ell_p$-regression problem in the coordinator model.
For $p = 2$, i.e., least squares regression, we give the first optimal bound of $tildeTheta(sd2 + sd/epsilon)$ bits.
For $p in (1,2)$,we obtain an $tildeO(sd2/epsilon + sd/mathrmpoly(epsilon)$ upper bound.
arXiv Detail & Related papers (2023-07-11T08:51:53Z) - Hamiltonian Monte Carlo for efficient Gaussian sampling: long and random
steps [0.0]
Hamiltonian Monte Carlo (HMC) is a Markov chain algorithm for sampling from a high-dimensional distribution with density $e-f(x)$.
We show that HMC can sample from a distribution that is $varepsilon$-close in total variation distance using $widetildeO(sqrtkappa d1/4 log(1/varepsilon)$ gradient queries.
arXiv Detail & Related papers (2022-09-26T15:29:29Z) - Accelerating Hamiltonian Monte Carlo via Chebyshev Integration Time [13.427128424538502]
Hamiltonian Monte Carlo (HMC) is a popular method in sampling.
We propose a scheme of time-varying integration time based on the roots of Chebyshevs.
arXiv Detail & Related papers (2022-07-05T17:42:22Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $mathbbRd$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Kernel Thinning [26.25415159542831]
kernel thinning is a new procedure for compressing a distribution $mathbbP$ more effectively than i.i.d. sampling or standard thinning.
We derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat'ern, and B-spline kernels.
arXiv Detail & Related papers (2021-05-12T17:56:42Z) - Mixing Time Guarantees for Unadjusted Hamiltonian Monte Carlo [1.14219428942199]
We provide quantitative upper bounds on the total variation mixing time of the Markov chain corresponding to the unadjusted Hamiltonian Monte Carlo (uHMC) algorithm.
For two general classes of models and fixed time discretization step size $h$, the mixing time is shown to depend only logarithmically on the dimension.
We show that an $varepsilon$-accurate approximation of the target distribution $mu$ in total variation distance can be achieved by uHMC.
arXiv Detail & Related papers (2021-05-03T14:13:47Z) - The planted matching problem: Sharp threshold and infinite-order phase
transition [25.41713098167692]
We study the problem of reconstructing a perfect matching $M*$ hidden in a randomly weighted $ntimes n$ bipartite graph.
We show that if $sqrtd B(mathcalP,mathcalQ) ge 1+epsilon$ for an arbitrarily small constant $epsilon>0$, the reconstruction error for any estimator is shown to be bounded away from $0$.
arXiv Detail & Related papers (2021-03-17T00:59:33Z) - Fixed-Support Wasserstein Barycenters: Computational Hardness and Fast
Algorithm [100.11971836788437]
We study the fixed-support Wasserstein barycenter problem (FS-WBP)
We develop a provably fast textitdeterministic variant of the celebrated iterative Bregman projection (IBP) algorithm, named textscFastIBP.
arXiv Detail & Related papers (2020-02-12T03:40:52Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n2)$-upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both emphcompatible with oracles, and achieves complexity bounds of $tildeO(n2+nsqrtL/mu)log (1/epsilon)$ and $O(nsqrtL/epsilon)$, for $mu>0$ and $mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.