On the convergence of dynamic implementations of Hamiltonian Monte Carlo and No U-Turn Samplers
- URL: http://arxiv.org/abs/2307.03460v2
- Date: Fri, 18 Oct 2024 11:19:33 GMT
- Title: On the convergence of dynamic implementations of Hamiltonian Monte Carlo and No U-Turn Samplers
- Authors: Alain Durmus, Samuel Gruffaz, Miika Kailas, Eero Saksman, Matti Vihola
- Abstract summary: We consider a general class of MCMC algorithms we call dynamic HMC.
We show that this general framework encompasses NUTS as a particular case.
Under conditions similar to the ones existing for HMC, we also show that NUTS is geometrically ergodic.
- Score: 8.999094822549067
- License:
- Abstract: There is substantial empirical evidence about the success of dynamic implementations of Hamiltonian Monte Carlo (HMC), such as the No U-Turn Sampler (NUTS), in many challenging inference problems, but theoretical results about their behavior are scarce. The aim of this paper is to fill this gap. More precisely, we first consider a general class of MCMC algorithms we call dynamic HMC. We show that this general framework encompasses NUTS as a particular case, implying the invariance of the target distribution as a by-product. Second, we establish conditions under which NUTS is irreducible and aperiodic and, as a corollary, ergodic. Under conditions similar to the ones existing for HMC, we also show that NUTS is geometrically ergodic. Finally, we improve existing convergence results for HMC, showing that this method is ergodic without any boundedness condition on the stepsize and the number of leapfrog steps, in the case where the target is a perturbation of a Gaussian distribution.
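To fix ideas, here is a minimal sketch of one standard HMC transition with a fixed stepsize and a fixed number of leapfrog steps; this is a textbook illustration under assumed names (leapfrog, hmc_step, and the Gaussian example target are not from the paper). Dynamic implementations such as NUTS replace the fixed n_steps with a trajectory length chosen adaptively at each iteration, for instance by doubling the path until a U-turn criterion fires.

```python
import numpy as np

def leapfrog(q, p, grad_log_prob, step_size, n_steps):
    """Leapfrog integration of Hamiltonian dynamics (reversible and volume-preserving)."""
    q, p = q.copy(), p.copy()
    p += 0.5 * step_size * grad_log_prob(q)        # initial half step for momentum
    for _ in range(n_steps - 1):
        q += step_size * p                         # full step for position
        p += step_size * grad_log_prob(q)          # full step for momentum
    q += step_size * p                             # last full step for position
    p += 0.5 * step_size * grad_log_prob(q)        # final half step for momentum
    return q, p

def hmc_step(q, log_prob, grad_log_prob, step_size, n_steps, rng):
    """One Metropolis-adjusted HMC transition targeting the density exp(log_prob)."""
    p0 = rng.standard_normal(q.shape)              # fresh Gaussian momentum
    q_new, p_new = leapfrog(q, p0, grad_log_prob, step_size, n_steps)
    # Hamiltonian H(q, p) = -log_prob(q) + ||p||^2 / 2
    h_old = -log_prob(q) + 0.5 * (p0 @ p0)
    h_new = -log_prob(q_new) + 0.5 * (p_new @ p_new)
    if rng.uniform() < np.exp(min(0.0, h_old - h_new)):   # Metropolis accept/reject
        return q_new
    return q

# Illustrative usage: sample a 2-D standard Gaussian target.
rng = np.random.default_rng(0)
log_prob = lambda q: -0.5 * (q @ q)
grad_log_prob = lambda q: -q
q = np.zeros(2)
for _ in range(1000):
    q = hmc_step(q, log_prob, grad_log_prob, step_size=0.2, n_steps=10, rng=rng)
```

The paper's final result concerns exactly this fixed-length variant: for targets that are perturbations of a Gaussian, ergodicity holds without boundedness conditions on step_size and n_steps.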
Related papers
- SpreadNUTS -- Moderate Dynamic Extension of Paths for No-U-Turn Sampling & Partitioning Visited Regions [0.0]
This paper introduces modifications to a specific Hamiltonian Monte Carlo (HMC) algorithm known as the no-U-turn sampler (NUTS).
SpreadNUTS aims to explore the sample space faster than NUTS, yielding a sampler that converges to the true distribution faster than NUTS.
arXiv Detail & Related papers (2023-07-09T05:00:25Z)
- Quantum Metric Unveils Defect Freezing in Non-Hermitian Systems [1.2289361708127877]
We study the dynamics of an exactly solvable non-Hermitian system, hosting both $\mathcal{PT}$-symmetric and $\mathcal{PT}$-broken modes.
In contrast to Hermitian systems, our study reveals that $\mathcal{PT}$-broken time evolution leads to defect freezing and hence the violation of adiabaticity.
arXiv Detail & Related papers (2023-01-05T19:00:00Z)
- A signature of quantumness in pure decoherence control [0.0]
We study a decoherence reduction scheme that involves an intermediate measurement on the qubit in an equal superposition basis.
We show under what circumstances the scheme always leads to a gain of coherence on average, regardless of the time at which the measurement is performed.
We find that observing an average loss of coherence is a highly quantum effect, resulting from non-commutation of different terms in the Hamiltonian.
arXiv Detail & Related papers (2022-11-09T14:13:25Z)
- Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods solving monotone and structured non-monotone variational inequality problems (VIPs).
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
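To make the clipping operator referenced above concrete, the sketch below shows one clipped stochastic gradient descent-ascent (SGDA) update for a min-max problem, using the standard norm-rescaling clip; the oracle names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def clip(g, threshold):
    """Standard norm clipping: rescale g so its Euclidean norm is at most threshold."""
    norm = np.linalg.norm(g)
    return g if norm <= threshold else g * (threshold / norm)

def clipped_sgda_step(x, y, stoch_grad_x, stoch_grad_y, lr, threshold):
    """One clipped SGDA update for min_x max_y f(x, y), given stochastic gradient oracles."""
    gx = clip(stoch_grad_x(x, y), threshold)   # clipped descent direction for x
    gy = clip(stoch_grad_y(x, y), threshold)   # clipped ascent direction for y
    return x - lr * gx, y + lr * gy
```

Clipping bounds the impact of any single heavy-tailed gradient sample, which is what makes high-probability guarantees with only logarithmic dependence on the confidence level possible.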
- Hamiltonian Monte Carlo with Asymmetrical Momentum Distributions [3.562271099341746]
We present a novel convergence analysis for the Hamiltonian Monte Carlo (HMC) algorithm.
We show that plain HMC with asymmetrical momentum distributions breaks a key self-adjointness requirement.
We propose a modified version that we call the Alternating Direction HMC (AD-HMC).
arXiv Detail & Related papers (2021-10-21T18:36:19Z)
- Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
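For context, the classical deterministic condition that the expected co-coercivity assumption relaxes is the following; the stochastic version, roughly, imposes an analogous bound on the gradient oracle in expectation (this is the textbook definition, not the paper's exact condition).

```latex
% An operator F is co-coercive with parameter 1/\ell if, for all x and y,
\langle F(x) - F(y),\, x - y \rangle \;\ge\; \frac{1}{\ell}\, \lVert F(x) - F(y) \rVert^{2}.
```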
- What Are Bayesian Neural Network Posteriors Really Like? [63.950151520585024]
We show that Hamiltonian Monte Carlo can achieve significant performance gains over standard training and deep ensembles.
We also show that deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
arXiv Detail & Related papers (2021-04-29T15:38:46Z)
- On the Generalization of Stochastic Gradient Descent with Momentum [58.900860437254885]
We first show that there exists a convex loss function for which algorithmic stability fails to establish generalization guarantees.
For smooth Lipschitz loss functions, we analyze a modified momentum-based update rule, and show that it admits an upper-bound on the generalization error.
For the special case of strongly convex loss functions, we find a range of momentum values for which multiple epochs of standard SGDM, as a special form of SGDEM, also generalize.
arXiv Detail & Related papers (2021-02-26T18:58:29Z)
- On the Convergence of Continuous Constrained Optimization for Structure Learning [30.279796192573805]
We show the convergence of the augmented Lagrangian method (ALM) and the quadratic penalty method (QPM) for structure learning in the linear, nonlinear, and confounded cases.
We further establish the convergence guarantee of QPM to a DAG solution, under mild conditions.
arXiv Detail & Related papers (2020-11-23T00:29:37Z)
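As a generic schematic of the quadratic penalty method (QPM) analyzed above: the constrained problem min f(x) subject to h(x) = 0, with h an acyclicity measure in the structure-learning setting, is attacked through a sequence of unconstrained subproblems with a growing penalty weight. The solver interface below is an illustrative assumption, not the authors' implementation.

```python
def quadratic_penalty_method(inner_minimize, h, x0,
                             rho=1.0, growth=10.0, tol=1e-8, max_outer=25):
    """Solve min f(x) s.t. h(x) = 0 by repeatedly minimizing the penalized
    objective f(x) + (rho / 2) * h(x)**2 and increasing rho until h(x) ~ 0.

    inner_minimize(x, rho) -> x : unconstrained solver for the penalized objective
    h(x) -> float               : constraint violation (e.g., a DAG acyclicity measure)
    """
    x = x0
    for _ in range(max_outer):
        x = inner_minimize(x, rho)   # solve the penalized subproblem, warm-started at x
        if abs(h(x)) < tol:          # constraint approximately satisfied
            break
        rho *= growth                # tighten the penalty and repeat
    return x
```

The augmented Lagrangian method (ALM) additionally carries an explicit multiplier term lambda * h(x) in the penalized objective, updated as lambda <- lambda + rho * h(x) between outer iterations.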
- Generalized Sliced Distances for Probability Distributions [47.543990188697734]
We introduce a broad family of probability metrics, coined Generalized Sliced Probability Metrics (GSPMs).
GSPMs are rooted in the generalized Radon transform and come with a unique geometric interpretation.
We consider GSPM-based gradient flows for generative modeling applications and show that under mild assumptions, the gradient flow converges to the global optimum.
arXiv Detail & Related papers (2020-02-28T04:18:00Z)
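The sketch below computes the classical sliced Wasserstein-2 distance between two equal-sized samples via random linear projections; GSPMs generalize precisely this construction by replacing the linear projections with a generalized Radon transform. This is an illustrative baseline, not the paper's code.

```python
import numpy as np

def sliced_w2(X, Y, n_projections=100, rng=None):
    """Monte Carlo estimate of the sliced Wasserstein-2 distance between two
    equal-sized samples X, Y of shape (n, d): average the squared 1-D W2
    distances (computable by sorting) over random projection directions."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    assert Y.shape == (n, d), "the sorting-based 1-D W2 needs equal sample sizes"
    total = 0.0
    for _ in range(n_projections):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)     # uniform random direction on the sphere
        total += np.mean((np.sort(X @ theta) - np.sort(Y @ theta)) ** 2)
    return np.sqrt(total / n_projections)
```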
- On the Generalization of Stochastic Gradient Descent with Momentum [84.54924994010703]
Momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models.
We first show that there exists a convex loss function for which the stability gap for multiple epochs of SGD with standard heavy-ball momentum (SGDM) becomes unbounded.
For smooth Lipschitz loss functions, we analyze a modified momentum-based update rule, i.e., SGD with early momentum (SGDEM) under a broad range of step-sizes.
arXiv Detail & Related papers (2018-09-12T17:02:08Z)
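For reference, the heavy-ball SGDM update analyzed in this entry is the standard one sketched below; SGDEM, per the summary, uses it only during an early phase of training and then continues with plain SGD. Names and defaults are illustrative.

```python
def sgdm_step(w, v, stoch_grad, lr=0.01, momentum=0.9):
    """One SGD step with heavy-ball momentum: v <- mu * v - lr * g(w); w <- w + v.
    Setting momentum = 0 recovers plain SGD, which is how SGDEM proceeds
    after its early-momentum phase."""
    v = momentum * v - lr * stoch_grad(w)
    return w + v, v
```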