Faster high-accuracy log-concave sampling via algorithmic warm starts
- URL: http://arxiv.org/abs/2302.10249v1
- Date: Mon, 20 Feb 2023 19:27:21 GMT
- Title: Faster high-accuracy log-concave sampling via algorithmic warm starts
- Authors: Jason M. Altschuler and Sinho Chewi
- Abstract summary: In practice, high-accuracy samplers such as the classical Metropolis-adjusted Langevin algorithm (MALA) remain the de facto gold standard.
We improve the dimension dependence of this sampling problem to $\tilde{O}(d^{1/2})$, whereas the previous best result for MALA was $\tilde{O}(d)$.
Our main technical contribution settles this problem by establishing the first $\tilde{O}(d^{1/2})$ R\'enyi mixing rates for the discretized underdamped Langevin diffusion.
- Score: 6.117084972237769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the complexity of sampling from a strongly log-concave and
log-smooth distribution $\pi$ on $\mathbb{R}^d$ to high accuracy is a
fundamental problem, both from a practical and theoretical standpoint. In
practice, high-accuracy samplers such as the classical Metropolis-adjusted
Langevin algorithm (MALA) remain the de facto gold standard; and in theory, via
the proximal sampler reduction, it is understood that such samplers are key for
sampling even beyond log-concavity (in particular, for distributions satisfying
isoperimetric assumptions).
In this work, we improve the dimension dependence of this sampling problem to
$\tilde{O}(d^{1/2})$, whereas the previous best result for MALA was
$\tilde{O}(d)$. This closes the long line of work on the complexity of MALA,
and moreover leads to state-of-the-art guarantees for high-accuracy sampling
under strong log-concavity and beyond (thanks to the aforementioned reduction).
Our starting point is that the complexity of MALA improves to
$\tilde{O}(d^{1/2})$, but only under a warm start (an initialization with
constant R\'enyi divergence w.r.t. $\pi$). Previous algorithms took much longer
to find a warm start than to use it, and closing this gap has remained an
important open problem in the field. Our main technical contribution settles
this problem by establishing the first $\tilde{O}(d^{1/2})$ R\'enyi mixing
rates for the discretized underdamped Langevin diffusion. For this, we develop
new differential-privacy-inspired techniques based on R\'enyi divergences with
Orlicz--Wasserstein shifts, which allow us to sidestep longstanding challenges
for proving fast convergence of hypocoercive differential equations.
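The abstract's approach has two phases: run a discretization of the underdamped Langevin diffusion to reach a warm start (constant R\'enyi divergence with respect to $\pi$), then run MALA from that warm start. Below is a minimal NumPy sketch of this two-phase idea under illustrative assumptions: the quadratic potential, the Euler--Maruyama integrator, and all step sizes and iteration counts are placeholder choices rather than the paper's analyzed scheme, and the sketch does not reproduce the R\'enyi mixing guarantees.

```python
# Illustrative two-phase sketch (not the paper's exact scheme): a discretized
# underdamped Langevin run produces a warm start, then MALA refines it.
# Target: pi(x) proportional to exp(-f(x)) with f strongly convex and smooth.
import numpy as np

rng = np.random.default_rng(0)
d = 50
scales = np.linspace(1.0, 10.0, d)      # example potential; condition number 10

def f(x):
    return 0.5 * np.sum(scales * x**2)

def grad_f(x):
    return scales * x

def ulmc_warm_start(n_steps=2000, h=0.01, gamma=2.0):
    """Euler-Maruyama discretization of the underdamped Langevin diffusion.
    (The paper analyzes Renyi mixing of a discretized underdamped diffusion;
    the integrator, friction gamma, and step size here are placeholder choices.)"""
    x = np.zeros(d)
    v = np.zeros(d)
    for _ in range(n_steps):
        x = x + h * v
        v = v - h * (gamma * v + grad_f(x)) + np.sqrt(2.0 * gamma * h) * rng.standard_normal(d)
    return x

def mala(x0, n_steps=5000, h=1e-3):
    """Metropolis-adjusted Langevin algorithm started from x0."""
    x, fx, gx = x0, f(x0), grad_f(x0)
    samples = []
    for _ in range(n_steps):
        y = x - h * gx + np.sqrt(2.0 * h) * rng.standard_normal(d)
        fy, gy = f(y), grad_f(y)
        # log Metropolis-Hastings ratio with the Gaussian proposal q(. | .)
        log_q_xy = -np.sum((x - y + h * gy) ** 2) / (4.0 * h)
        log_q_yx = -np.sum((y - x + h * gx) ** 2) / (4.0 * h)
        if np.log(rng.uniform()) < (fx - fy) + (log_q_xy - log_q_yx):
            x, fx, gx = y, fy, gy
        samples.append(x)
    return np.array(samples)

warm = ulmc_warm_start()   # phase 1: find a warm start cheaply
chain = mala(warm)         # phase 2: high-accuracy sampling from the warm start
```

The point of the warm start is that MALA's complexity improves to $\tilde{O}(d^{1/2})$ only once the initialization is already close to $\pi$ in R\'enyi divergence; the sketch simply wires the two phases together.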
Related papers
- Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling [28.931489333515618]
We establish an oracle complexity of $\widetilde{O}\left(\frac{d\beta^2\mathcal{A}^2}{\varepsilon^6}\right)$ for the simple annealed Langevin Monte Carlo algorithm.
We show that $\mathcal{A}$ represents the action of a curve of probability measures interpolating the target distribution $\pi$ and a readily sampleable distribution (an illustrative sketch of annealed Langevin Monte Carlo appears after this list).
arXiv Detail & Related papers (2024-07-24T02:15:48Z)
- An Improved Analysis of Langevin Algorithms with Prior Diffusion for Non-Log-Concave Sampling [27.882407333690267]
We show that the modified Langevin algorithm with prior diffusion converges dimension-independently for strongly log-concave target distributions.
We also prove that it achieves dimension-independent convergence in KL divergence under different step size schedules.
arXiv Detail & Related papers (2024-03-10T11:50:34Z)
- Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise that has structured density.
We show that it is possible to obtain rates of convergence faster than $\mathcal{O}(K^{-(\alpha-1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
- Improved dimension dependence of a proximal algorithm for sampling [16.147290924171692]
We propose a sampling algorithm that achieves superior complexity bounds in all the classical settings.
Our algorithm is based on the proximal sampler introduced by Lee et al. (2021).
For strongly log-concave distributions, our method has complexity bound $\tilde{\mathcal{O}}(\kappa d^{1/2})$ without warm start.
For distributions satisfying the LSI, our bound is $\tilde{\mathcal{O}}(\hat{\kappa} d^{1/2})$, where $\hat{\kappa}$ is the ratio between smoothness and the LSI constant (a minimal sketch of the proximal sampler appears after this list).
arXiv Detail & Related papers (2023-02-20T16:44:48Z)
- Neural Inference of Gaussian Processes for Time Series Data of Quasars [72.79083473275742]
We introduce a new model that makes it possible to describe quasar spectra completely.
We also introduce a new method of inference of Gaussian process parameters, which we call Neural Inference.
The combination of the CDRW model and Neural Inference significantly outperforms the baseline DRW and MLE.
arXiv Detail & Related papers (2022-11-17T13:01:26Z)
- Resolving the Mixing Time of the Langevin Algorithm to its Stationary Distribution for Log-Concave Sampling [34.66940399825547]
This paper characterizes the mixing time of the Langevin Algorithm to its stationary distribution.
We introduce a technique from the differential privacy literature to the sampling literature.
arXiv Detail & Related papers (2022-10-16T05:11:16Z)
- Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm for Log-Concave Sampling [2.9972063833424216]
We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling from a log-smooth and strongly log-concave distribution.
We establish its optimal minimax mixing time under a warm start.
arXiv Detail & Related papers (2021-09-27T14:02:27Z)
- Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation (a minimal sketch appears after this list).
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors [59.33977545294148]
We show that $O(k \log L)$ samples suffice to guarantee that the signal is close to any vector that minimizes an amplitude-based empirical loss function.
We adapt this result to sparse phase retrieval, and show that $O(s \log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional.
arXiv Detail & Related papers (2021-06-29T12:49:54Z)
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
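For the annealed Langevin Monte Carlo entry above: a hedged sketch of a simple annealing scheme, in which unadjusted Langevin steps are run along a linear interpolation between a standard Gaussian reference and a (possibly non-log-concave) target. The double-well example, the interpolation path, the schedule, and the step size are illustrative assumptions, not the cited paper's algorithm or the schedule behind its $\widetilde{O}(d\beta^2\mathcal{A}^2/\varepsilon^6)$ bound.

```python
# Hedged sketch of a simple annealed Langevin Monte Carlo scheme; the path,
# schedule, step size, and double-well target are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
d = 20

def grad_f(x):
    # example non-log-concave target: f(x) = sum_i (x_i^2 - 1)^2 / 4 (double well)
    return x * (x**2 - 1.0)

def grad_interp(x, t):
    # gradient of f_t(x) = (1 - t) * |x|^2 / 2 + t * f(x), interpolating a
    # standard Gaussian reference (t = 0) with the target (t = 1)
    return (1.0 - t) * x + t * grad_f(x)

def annealed_lmc(n_levels=50, steps_per_level=200, h=1e-3):
    x = rng.standard_normal(d)                   # exact sample from the t = 0 reference
    for t in np.linspace(0.0, 1.0, n_levels):
        for _ in range(steps_per_level):         # unadjusted Langevin steps at level t
            x = x - h * grad_interp(x, t) + np.sqrt(2.0 * h) * rng.standard_normal(d)
    return x

samples = np.stack([annealed_lmc() for _ in range(100)])
```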
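For the proximal-sampler entry above (and the proximal sampler reduction cited in the main abstract): the proximal sampler alternates a Gaussian forward step $y \mid x \sim \mathcal{N}(x, \eta I)$ with a restricted Gaussian oracle (RGO) that samples $x \mid y$ with density proportional to $\exp(-f(x) - \|x - y\|^2/(2\eta))$. The sketch below implements the RGO by rejection sampling around the proximal point, which is exact for smooth convex $f$; the quadratic example, the choice of $\eta$, and this particular RGO implementation are illustrative assumptions rather than the cited papers' exact method.

```python
# Hedged sketch of the generic proximal sampler; the quadratic target, the step
# size eta, and the rejection-sampling RGO are illustrative choices (exact RGO
# for smooth convex f), not the cited papers' exact method.
import numpy as np

rng = np.random.default_rng(2)
d = 30
s = np.linspace(1.0, 5.0, d)            # example target: f(x) = 0.5 * sum(s * x^2)

def f(x):
    return 0.5 * np.sum(s * x**2)

def grad_f(x):
    return s * x

def prox(y, eta):
    # argmin_x f(x) + |x - y|^2 / (2 eta); closed form for this quadratic example
    return y / (1.0 + eta * s)

def rgo(y, eta):
    """Restricted Gaussian oracle: sample x with density proportional to
    exp(-f(x) - |x - y|^2 / (2 eta)) by rejection around the proximal point."""
    xstar = prox(y, eta)
    gstar = grad_f(xstar)
    while True:
        x = xstar + np.sqrt(eta) * rng.standard_normal(d)
        log_acc = -(f(x) - f(xstar) - gstar @ (x - xstar))   # <= 0 by convexity of f
        if np.log(rng.uniform()) < log_acc:
            return x

def proximal_sampler(n_iters=500, eta=1.0 / (np.max(s) * d)):
    x = np.zeros(d)
    for _ in range(n_iters):
        y = x + np.sqrt(eta) * rng.standard_normal(d)   # forward step: y | x ~ N(x, eta I)
        x = rgo(y, eta)                                  # backward step: x | y via the RGO
    return x

sample = proximal_sampler()
```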
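For the differentiable AIS entry above: the mechanism is to drop Metropolis-Hastings corrections so that each transition is a smooth map of the noise (enabling reparameterized gradients and mini-batching), at the cost of bias. A hedged sketch of AIS with unadjusted Langevin transitions along a geometric annealing path follows; the Gaussian example target, the schedule, and the step size are illustrative assumptions, not the cited paper's algorithm.

```python
# Hedged sketch: AIS with unadjusted (MH-free) Langevin transitions along a
# geometric path, so every operation is a smooth map of the noise.  The Gaussian
# example target, schedule, and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
d = 10

def log_gamma_target(x):                 # unnormalized target log-density (example)
    return -np.sum(x**2)

def grad_log_gamma_target(x):
    return -2.0 * x

def log_p0(x):                           # standard normal reference (normalized)
    return -0.5 * np.sum(x**2) - 0.5 * d * np.log(2.0 * np.pi)

def grad_log_p0(x):
    return -x

def ais_log_weight(n_levels=200, n_langevin=5, h=1e-2):
    betas = np.linspace(0.0, 1.0, n_levels + 1)
    x = rng.standard_normal(d)           # exact sample from p0
    log_w = 0.0
    for k in range(1, n_levels + 1):
        # weight update: log gamma_{beta_k}(x) - log gamma_{beta_{k-1}}(x)
        log_w += (betas[k] - betas[k - 1]) * (log_gamma_target(x) - log_p0(x))
        for _ in range(n_langevin):      # unadjusted Langevin steps targeting gamma_{beta_k}
            g = (1.0 - betas[k]) * grad_log_p0(x) + betas[k] * grad_log_gamma_target(x)
            x = x + h * g + np.sqrt(2.0 * h) * rng.standard_normal(d)
    return log_w

log_ws = np.array([ais_log_weight() for _ in range(64)])
log_Z_est = np.log(np.mean(np.exp(log_ws)))   # biased without MH correction: the trade-off the paper studies
```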
This list is automatically generated from the titles and abstracts of the papers on this site.