Related papers: Unadjusted Hamiltonian MCMC with Stratified Monte Carlo Time Integration

URL: http://arxiv.org/abs/2211.11003v1
Date: Sun, 20 Nov 2022 15:45:26 GMT
Title: Unadjusted Hamiltonian MCMC with Stratified Monte Carlo Time Integration
Authors: Nawaf Bou-Rabee, Milo Marsden
Abstract summary: A novel unadjusted Hamiltonian Monte Carlo (uHMC) algorithm is suggested. It uses a stratified Monte Carlo (SMC) time integrator for the underlying Hamiltonian dynamics. The complexity of the uHMC algorithm with Verlet time integration is in general $O\left((d/K)^{1/2} (L/K)^2 \varepsilon^{-1} \log( \boldsymbol{\mathcal{W}}^2(\mu, \nu) / \varepsilon)^+ \right)$.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A novel unadjusted Hamiltonian Monte Carlo (uHMC) algorithm is suggested that uses a stratified Monte Carlo (SMC) time integrator for the underlying Hamiltonian dynamics in place of the usual Verlet time integrator. For target distributions of the form $\mu(dx) \propto e^{-U(x)} dx$ where $U: \mathbb{R}^d \to \mathbb{R}_{\ge 0}$ is both $K$-strongly convex and $L$-gradient Lipschitz, and initial distributions $\nu$ with finite second moment, coupling proofs reveal that an $\varepsilon$-accurate approximation of the target distribution $\mu$ in $L^2$-Wasserstein distance $\boldsymbol{\mathcal{W}}^2$ can be achieved by the uHMC algorithm with SMC time integration using $O\left((d/K)^{1/3} (L/K)^{5/3} \varepsilon^{-2/3} \log( \boldsymbol{\mathcal{W}}^2(\mu, \nu) / \varepsilon)^+\right)$ gradient evaluations; whereas without any additional assumptions the corresponding complexity of the uHMC algorithm with Verlet time integration is in general $O\left((d/K)^{1/2} (L/K)^2 \varepsilon^{-1} \log( \boldsymbol{\mathcal{W}}^2(\mu, \nu) / \varepsilon)^+ \right)$. The SMC time integrator involves a minor modification to Verlet, and hence, is easy to implement.
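The abstract describes the SMC time integrator only as "a minor modification to Verlet". The sketch below shows one plausible form of such a modification, stated here as an assumption rather than as the authors' exact scheme: in each step of size h, the force is evaluated once, at a position reached by free flight for a uniformly random time within that step (one random point per time stratum). The names `smc_step` and `uhmc`, the duration `T`, and the Gaussian test target are illustrative choices that do not come from the source.

```python
import numpy as np


def smc_step(x, v, h, grad_U, rng):
    """One integration step with a randomized force evaluation point.

    Assumed form (not confirmed by the abstract): draw tau ~ Uniform(0, h),
    evaluate the force once at x + tau * v, and reuse it for both updates.
    """
    tau = rng.uniform(0.0, h)            # random time inside the current stratum
    force = -grad_U(x + tau * v)         # single gradient evaluation per step
    x_new = x + h * v + 0.5 * h**2 * force
    v_new = v + h * force
    return x_new, v_new


def uhmc(grad_U, x0, n_iters, T, h, rng):
    """Unadjusted HMC: full velocity refresh, integrate for time T, no accept/reject."""
    x = np.array(x0, dtype=float)
    n_steps = max(1, int(round(T / h)))
    samples = np.empty((n_iters, x.size))
    for i in range(n_iters):
        v = rng.standard_normal(x.size)  # draw a fresh Gaussian velocity
        for _ in range(n_steps):
            x, v = smc_step(x, v, h, grad_U, rng)
        samples[i] = x                   # the proposal is always kept (unadjusted)
    return samples


if __name__ == "__main__":
    # Hypothetical test target: standard Gaussian, U(x) = |x|^2 / 2, so K = L = 1.
    rng = np.random.default_rng(0)
    draws = uhmc(lambda x: x, np.zeros(2), n_iters=2000, T=1.0, h=0.1, rng=rng)
    print("sample mean:", draws.mean(axis=0))
    print("sample variance:", draws.var(axis=0))
```

Because there is no Metropolis correction, the chain carries a discretization bias that depends on h. The complexity bounds quoted in the abstract count the gradient evaluations needed to push that bias, together with the mixing error, below $\varepsilon$ in Wasserstein distance.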

Related papers

Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs [54.28273395444243]
We show that the Monotonic Value Propagation (MVP) algorithm achieves a variance-aware gap-dependent regret bound of $\tilde{O}\big(\big(\sum_{\Delta_h(s,a)>0} \frac{H^2 \log K \land \mathtt{Var}_{\max}^{\text{c}}}{\ldots} \ldots$
arXiv Detail & Related papers (2025-06-06T20:33:57Z)
Nonparametric MLE for Gaussian Location Mixtures: Certified Computation and Generic Behavior [28.71736321665378]
We study the nonparametric maximum likelihood estimator $\widehat{\pi}$ for Gaussian location mixtures in one dimension. We provide an algorithm which for small enough $\varepsilon>0$ computes an $\varepsilon$-approximation of $\widehat{\pi}$ in Wasserstein distance in time. We also show the distribution of $\widehat{\pi}$ conditioned to be $k$-atomic admits a density on the associated $2k-1$ dimensional parameter space.
arXiv Detail & Related papers (2025-03-26T03:36:36Z)
Mean and Variance Estimation Complexity in Arbitrary Distributions via Wasserstein Minimization [0.0]
This paper focuses on the complexity of estimating the translation $\boldsymbol{\mu} \in \mathbb{R}^l$ and shrinkage $\sigma \in \mathbb{R}_{++}$ parameters. We highlight that while the problem is NP-hard for Maximum Likelihood Estimation (MLE), it is possible to obtain $\varepsilon$-approximations for arbitrary $\varepsilon > 0$ within $\text{poly}\left( \frac{1}{\varepsilon} \right)$ time using the
arXiv Detail & Related papers (2025-01-17T13:07:52Z)
Log-concave Sampling from a Convex Body with a Barrier: a Robust and Unified Dikin Walk [12.842909157175582]
We consider the problem of sampling from a $d$-dimensional log-concave distribution $\pi(\theta) \propto \exp(-f(\theta))$ for $L$-Lipschitz $f$. We propose a robust sampling framework that computes spectral approximations to the Hessian of the barrier functions in each iteration.
arXiv Detail & Related papers (2024-10-08T05:32:51Z)
Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously. Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples. We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality [0.0]
We present a novel map-based algorithm ($\mathsf{norM}\text{-}\mathsf{SGD}$) for $symbol$k$ convergence problems.
arXiv Detail & Related papers (2023-05-10T01:12:11Z)
Hamiltonian Monte Carlo for efficient Gaussian sampling: long and random steps [0.0]
Hamiltonian Monte Carlo (HMC) is a Markov chain algorithm for sampling from a high-dimensional distribution with density $e^{-f(x)}$. We show that HMC can sample from a distribution that is $\varepsilon$-close in total variation distance using $\widetilde{O}(\sqrt{\kappa} d^{1/4} \log(1/\varepsilon))$ gradient queries.
arXiv Detail & Related papers (2022-09-26T15:29:29Z)
Accelerating Hamiltonian Monte Carlo via Chebyshev Integration Time [13.427128424538502]
Hamiltonian Monte Carlo (HMC) is a popular method in sampling. We propose a scheme of time-varying integration time based on the roots of Chebyshev polynomials.
arXiv Detail & Related papers (2022-07-05T17:42:22Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations. The goal of the learner is to output a hypothesis vector $\mathbf{w}$ that $F(\mathbf{w})=C, \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals. Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z)
Kernel Thinning [26.25415159542831]
Kernel thinning is a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. We derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels.
arXiv Detail & Related papers (2021-05-12T17:56:42Z)
Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist. We design an estimator which attains the smallest possible confidence interval as a function of $n,d,\delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z)
Fixed-Support Wasserstein Barycenters: Computational Hardness and Fast Algorithm [100.11971836788437]
We study the fixed-support Wasserstein barycenter problem (FS-WBP). We develop a provably fast deterministic variant of the celebrated iterative Bregman projection (IBP) algorithm, named FastIBP.
arXiv Detail & Related papers (2020-02-12T03:40:52Z)
On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$-upper bound under the global oracle model. Following a similar approach, we propose a novel adaptation of SVRG which is both compatible with oracles, and achieves complexity bounds of $\tilde{O}(n^2+n\sqrt{L/\mu})\log (1/\epsilon)$ and $O(n\sqrt{L/\epsilon})$, for $\mu>0$ and $\mu=0$, respectively.
arXiv Detail & Related papers (2020-02-09T03:39:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.