Sampling with Barriers: Faster Mixing via Lewis Weights
- URL: http://arxiv.org/abs/2303.00480v2
- Date: Wed, 19 Apr 2023 04:38:02 GMT
- Title: Sampling with Barriers: Faster Mixing via Lewis Weights
- Authors: Khashayar Gatmiry, Jonathan Kelner, Santosh S. Vempala
- Abstract summary: We analyze a polytope defined by $m$ inequalities in $\R^n$ endowed with the metric defined by the Hessian of a convex barrier function.
The advantage of RHMC over Euclidean methods such as the ball walk, hit-and-run and the Dikin walk is in its ability to take longer steps.
- Score: 8.701566919381223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze Riemannian Hamiltonian Monte Carlo (RHMC) for sampling a polytope
defined by $m$ inequalities in $\R^n$ endowed with the metric defined by the
Hessian of a convex barrier function. The advantage of RHMC over Euclidean
methods such as the ball walk, hit-and-run and the Dikin walk is in its ability
to take longer steps. However, in all previous work, the mixing rate has a
linear dependence on the number of inequalities. We introduce a hybrid of the
Lewis weights barrier and the standard logarithmic barrier and prove that the
mixing rate for the corresponding RHMC is bounded by $\tilde
O(m^{1/3}n^{4/3})$, improving on the previous best bound of $\tilde
O(mn^{2/3})$ (based on the log barrier). This continues the general parallels
between optimization and sampling, with the latter typically leading to new
tools and more refined analysis. To prove our main results, we have to
overcome several challenges relating to the smoothness of Hamiltonian curves
and the self-concordance properties of the barrier. In the process, we give a
general framework for the analysis of Markov chains on Riemannian manifolds,
derive new smoothness bounds on Hamiltonian curves, a central topic of
comparison geometry, and extend self-concordance to the infinity norm, which
gives sharper bounds; these properties appear to be of independent interest.
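Note that a bounded full-dimensional polytope requires $m > n$, and $\tilde O(m^{1/3}n^{4/3}) = \tilde O\big(mn^{2/3}\cdot (n/m)^{2/3}\big)$, so the new bound is never worse than the log-barrier bound $\tilde O(mn^{2/3})$ and improves on it whenever $m \gg n$.

To make the objects above concrete, here is a minimal numerical sketch: it forms the slack-rescaled constraint matrix $A_x = S_x^{-1}A$ at an interior point, computes approximate $\ell_p$ Lewis weights of $A_x$ by the standard fixed-point iteration, and assembles a Hessian-style local metric that mixes the Lewis weights with the uniform (log-barrier) row weights. The function names, the mixing constant `c`, and the default `p` are illustrative assumptions; the precise hybrid barrier and its weighting are defined in the paper.

```python
import numpy as np

def lewis_weights(A, p=2.0, iters=30):
    """Approximate l_p Lewis weights of the rows of A via the fixed-point
    iteration w_i <- (a_i^T (A^T W^{1-2/p} A)^{-1} a_i)^{p/2}.
    Hypothetical helper: the plain iteration converges for 0 < p < 4;
    the paper's choice of p may differ."""
    m, n = A.shape
    w = np.full(m, n / m)                                # uniform start, sums to n
    for _ in range(iters):
        M = A.T @ (w[:, None] ** (1.0 - 2.0 / p) * A)    # A^T W^{1-2/p} A
        Minv = np.linalg.inv(M)
        lev = np.einsum('ij,jk,ik->i', A, Minv, A)       # a_i^T M^{-1} a_i
        w = lev ** (p / 2.0)
    return w

def hybrid_metric(A, b, x, p=2.0, c=1.0):
    """Hessian-style local metric at a strictly feasible x of {y : A y <= b}.
    Illustrative hybrid of Lewis-weight and uniform (log-barrier) row weights;
    NOT the exact barrier analyzed in the paper."""
    s = b - A @ x                                        # slacks, must be positive
    if np.any(s <= 0):
        raise ValueError("x must be strictly inside the polytope")
    Ax = A / s[:, None]                                  # S_x^{-1} A
    w = lewis_weights(Ax, p=p)
    # All-ones weights give the log-barrier Hessian A_x^T A_x; here each
    # row gets its Lewis weight plus a uniform term c.
    return Ax.T @ ((w + c)[:, None] * Ax)

# Example: local metric at the center of the unit square [0, 1]^2.
A = np.vstack([np.eye(2), -np.eye(2)])                   # x <= 1 and -x <= 0
b = np.array([1.0, 1.0, 0.0, 0.0])
G = hybrid_metric(A, b, x=np.array([0.5, 0.5]))
```

Setting all row weights to 1 recovers the log-barrier Hessian, while Lewis weights reweight the rows so that no small group of nearly parallel constraints dominates the metric; the hybrid barrier in the paper combines both behaviors to obtain the improved mixing bound.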
Related papers
- Fast Convergence of $\Phi$-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler [14.34147140416535]
We study the mixing time of two popular discrete-time Markov chains in continuous space.
We show that any $\Phi$-divergence arising from a twice-differentiable strictly convex function $\Phi$ converges to $0$ exponentially fast along these Markov chains.
arXiv Detail & Related papers (2024-10-14T16:41:45Z) - Symmetric Mean-field Langevin Dynamics for Distributional Minimax
Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.
We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
arXiv Detail & Related papers (2023-12-02T13:01:29Z) - High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise [59.25598762373543]
We establish high-probability convergence guarantees for learning on streaming data in the presence of heavy-tailed noise.
We also demonstrate analytically which choice of setting is preferred for a given problem.
arXiv Detail & Related papers (2023-10-28T18:53:41Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - When does Metropolized Hamiltonian Monte Carlo provably outperform
Metropolis-adjusted Langevin algorithm? [4.657614491309671]
We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator (a minimal sketch of the integrator appears after this list).
We show that the joint distribution of the location and velocity variables of the discretization of the continuous HMC dynamics stays approximately invariant.
arXiv Detail & Related papers (2023-04-10T17:35:57Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Convergence of the Riemannian Langevin Algorithm [10.279748604797911]
We study the problem of sampling from a distribution with density $\nu$ with respect to the natural measure on a manifold with metric $g$.
A special case of our approach is sampling isoperimetric densities restricted to polytopes defined by the logarithmic barrier.
arXiv Detail & Related papers (2022-04-22T16:56:00Z) - Comparison of Markov chains via weak Poincar\'e inequalities with
application to pseudo-marginal MCMC [0.0]
We investigate the use of a certain class of functional inequalities known as weak Poincaré inequalities to bound convergence of Markov chains to equilibrium.
We show that this enables the derivation of subgeometric convergence bounds for methods such as the Independent Metropolis--Hastings sampler and pseudo-marginal methods for intractable likelihoods.
arXiv Detail & Related papers (2021-12-10T15:36:30Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - The Heavy-Tail Phenomenon in SGD [7.366405857677226]
We show that depending on the structure of the Hessian of the loss at the minimum, the SGD iterates will converge to a heavy-tailed stationary distribution.
We translate our results into insights about the behavior of SGD in deep learning.
arXiv Detail & Related papers (2020-06-08T16:43:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.