Faster high-accuracy log-concave sampling via algorithmic warm starts
- URL: http://arxiv.org/abs/2302.10249v1
- Date: Mon, 20 Feb 2023 19:27:21 GMT
- Title: Faster high-accuracy log-concave sampling via algorithmic warm starts
- Authors: Jason M. Altschuler and Sinho Chewi
- Abstract summary: In practice, high-accuracy samplers such as the classical Metropolis-adjusted Langevin algorithm (MALA) remain the de facto gold standard.
We improve the dimension dependence of this sampling problem to $\tilde{O}(d^{1/2})$, whereas the previous best result for MALA was $\tilde{O}(d)$.
Our main technical contribution settles this problem by establishing the first $\tilde{O}(d^{1/2})$ R\'enyi mixing rates for the discretized underdamped Langevin diffusion.
- Score: 6.117084972237769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the complexity of sampling from a strongly log-concave and
log-smooth distribution $\pi$ on $\mathbb{R}^d$ to high accuracy is a
fundamental problem, both from a practical and theoretical standpoint. In
practice, high-accuracy samplers such as the classical Metropolis-adjusted
Langevin algorithm (MALA) remain the de facto gold standard; and in theory, via
the proximal sampler reduction, it is understood that such samplers are key for
sampling even beyond log-concavity (in particular, for distributions satisfying
isoperimetric assumptions).
In this work, we improve the dimension dependence of this sampling problem to
$\tilde{O}(d^{1/2})$, whereas the previous best result for MALA was
$\tilde{O}(d)$. This closes the long line of work on the complexity of MALA,
and moreover leads to state-of-the-art guarantees for high-accuracy sampling
under strong log-concavity and beyond (thanks to the aforementioned reduction).
Our starting point is that the complexity of MALA improves to
$\tilde{O}(d^{1/2})$, but only under a warm start (an initialization with
constant R\'enyi divergence w.r.t. $\pi$). Previous algorithms took much longer
to find a warm start than to use it, and closing this gap has remained an
important open problem in the field. Our main technical contribution settles
this problem by establishing the first $\tilde{O}(d^{1/2})$ R\'enyi mixing
rates for the discretized underdamped Langevin diffusion. For this, we develop
new differential-privacy-inspired techniques based on R\'enyi divergences with
Orlicz--Wasserstein shifts, which allow us to sidestep longstanding challenges
for proving fast convergence of hypocoercive differential equations.
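The abstract's approach has two phases: run a discretization of the underdamped Langevin diffusion to reach a warm start (constant R\'enyi divergence with respect to $\pi$), then run MALA from that warm start. Below is a minimal NumPy sketch of this two-phase idea under illustrative assumptions: the quadratic potential, the Euler--Maruyama integrator, and all step sizes and iteration counts are placeholder choices rather than the paper's analyzed scheme, and the sketch does not reproduce the R\'enyi mixing guarantees.

```python
# Illustrative two-phase sketch (not the paper's exact scheme): a discretized
# underdamped Langevin run produces a warm start, then MALA refines it.
# Target: pi(x) proportional to exp(-f(x)) with f strongly convex and smooth.
import numpy as np

rng = np.random.default_rng(0)
d = 50
scales = np.linspace(1.0, 10.0, d)      # example potential; condition number 10

def f(x):
    return 0.5 * np.sum(scales * x**2)

def grad_f(x):
    return scales * x

def ulmc_warm_start(n_steps=2000, h=0.01, gamma=2.0):
    """Euler-Maruyama discretization of the underdamped Langevin diffusion.
    (The paper analyzes Renyi mixing of a discretized underdamped diffusion;
    the integrator, friction gamma, and step size here are placeholder choices.)"""
    x = np.zeros(d)
    v = np.zeros(d)
    for _ in range(n_steps):
        x = x + h * v
        v = v - h * (gamma * v + grad_f(x)) + np.sqrt(2.0 * gamma * h) * rng.standard_normal(d)
    return x

def mala(x0, n_steps=5000, h=1e-3):
    """Metropolis-adjusted Langevin algorithm started from x0."""
    x, fx, gx = x0, f(x0), grad_f(x0)
    samples = []
    for _ in range(n_steps):
        y = x - h * gx + np.sqrt(2.0 * h) * rng.standard_normal(d)
        fy, gy = f(y), grad_f(y)
        # log Metropolis-Hastings ratio with the Gaussian proposal q(. | .)
        log_q_xy = -np.sum((x - y + h * gy) ** 2) / (4.0 * h)
        log_q_yx = -np.sum((y - x + h * gx) ** 2) / (4.0 * h)
        if np.log(rng.uniform()) < (fx - fy) + (log_q_xy - log_q_yx):
            x, fx, gx = y, fy, gy
        samples.append(x)
    return np.array(samples)

warm = ulmc_warm_start()   # phase 1: find a warm start cheaply
chain = mala(warm)         # phase 2: high-accuracy sampling from the warm start
```

The point of the warm start is that MALA's complexity improves to $\tilde{O}(d^{1/2})$ only once the initialization is already close to $\pi$ in R\'enyi divergence; the sketch simply wires the two phases together.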
Related papers
- Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling [28.931489333515618]
We establish an oracle complexity of $\widetilde{O}\left(\frac{d\beta^2\mathcal{A}^2}{\varepsilon^6}\right)$ for the simple annealed Langevin Monte Carlo algorithm.
We show that $\mathcal{A}$ represents the action of a curve of probability measures interpolating the target distribution $\pi$ and a readily sampleable distribution (an illustrative sketch of annealed Langevin Monte Carlo appears after this list).
arXiv Detail & Related papers (2024-07-24T02:15:48Z)
- An Improved Analysis of Langevin Algorithms with Prior Diffusion for Non-Log-Concave Sampling [27.882407333690267]
We show that the modified Langevin algorithm with prior diffusion converges dimension-independently for strongly log-concave target distributions.
We also prove that it achieves dimension-independent convergence in KL divergence under different step size schedules.
arXiv Detail & Related papers (2024-03-10T11:50:34Z)
- Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise that has structured density.
We show that it is possible to obtain rates of convergence faster than $\mathcal{O}(K^{-(\alpha-1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
- Improved dimension dependence of a proximal algorithm for sampling [16.147290924171692]
We propose a sampling algorithm that achieves superior complexity bounds in all the classical settings.
Our algorithm is based on the proximal sampler introduced by Lee et al. (2021).
For strongly log-concave distributions, our method has complexity bound $\tilde{\mathcal{O}}(\kappa d^{1/2})$ without warm start.
For distributions satisfying the LSI, our bound is $\tilde{\mathcal{O}}(\hat{\kappa} d^{1/2})$, where $\hat{\kappa}$ is the ratio between smoothness and the LSI constant (a minimal sketch of the proximal sampler appears after this list).
arXiv Detail & Related papers (2023-02-20T16:44:48Z)
- Neural Inference of Gaussian Processes for Time Series Data of Quasars [72.79083473275742]
We introduce a new model that makes it possible to describe quasar spectra completely.
We also introduce a new method of inference of Gaussian process parameters, which we call Neural Inference.
The combination of the CDRW model and Neural Inference significantly outperforms the baseline DRW and MLE.
arXiv Detail & Related papers (2022-11-17T13:01:26Z)
- Resolving the Mixing Time of the Langevin Algorithm to its Stationary Distribution for Log-Concave Sampling [34.66940399825547]
This paper characterizes the mixing time of the Langevin Algorithm to its stationary distribution.
We introduce a technique from the differential privacy literature to the sampling literature.
arXiv Detail & Related papers (2022-10-16T05:11:16Z)
- Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm for Log-Concave Sampling [2.9972063833424216]
We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling from a log-smooth and strongly log-concave distribution.
We establish its optimal minimax mixing time under a warm start.
arXiv Detail & Related papers (2021-09-27T14:02:27Z)
- Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation (a minimal sketch appears after this list).
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors [59.33977545294148]
We show that $O(k \log L)$ samples suffice to guarantee that the signal is close to any vector that minimizes an amplitude-based empirical loss function.
We adapt this result to sparse phase retrieval, and show that $O(s \log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional.
arXiv Detail & Related papers (2021-06-29T12:49:54Z)
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
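For the annealed Langevin Monte Carlo entry above: a hedged sketch of a simple annealing scheme, in which unadjusted Langevin steps are run along a linear interpolation between a standard Gaussian reference and a (possibly non-log-concave) target. The double-well example, the interpolation path, the schedule, and the step size are illustrative assumptions, not the cited paper's algorithm or the schedule behind its $\widetilde{O}(d\beta^2\mathcal{A}^2/\varepsilon^6)$ bound.

```python
# Hedged sketch of a simple annealed Langevin Monte Carlo scheme; the path,
# schedule, step size, and double-well target are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
d = 20

def grad_f(x):
    # example non-log-concave target: f(x) = sum_i (x_i^2 - 1)^2 / 4 (double well)
    return x * (x**2 - 1.0)

def grad_interp(x, t):
    # gradient of f_t(x) = (1 - t) * |x|^2 / 2 + t * f(x), interpolating a
    # standard Gaussian reference (t = 0) with the target (t = 1)
    return (1.0 - t) * x + t * grad_f(x)

def annealed_lmc(n_levels=50, steps_per_level=200, h=1e-3):
    x = rng.standard_normal(d)                   # exact sample from the t = 0 reference
    for t in np.linspace(0.0, 1.0, n_levels):
        for _ in range(steps_per_level):         # unadjusted Langevin steps at level t
            x = x - h * grad_interp(x, t) + np.sqrt(2.0 * h) * rng.standard_normal(d)
    return x

samples = np.stack([annealed_lmc() for _ in range(100)])
```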
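For the proximal-sampler entry above (and the proximal sampler reduction cited in the main abstract): the proximal sampler alternates a Gaussian forward step $y \mid x \sim \mathcal{N}(x, \eta I)$ with a restricted Gaussian oracle (RGO) that samples $x \mid y$ with density proportional to $\exp(-f(x) - \|x - y\|^2/(2\eta))$. The sketch below implements the RGO by rejection sampling around the proximal point, which is exact for smooth convex $f$; the quadratic example, the choice of $\eta$, and this particular RGO implementation are illustrative assumptions rather than the cited papers' exact method.

```python
# Hedged sketch of the generic proximal sampler; the quadratic target, the step
# size eta, and the rejection-sampling RGO are illustrative choices (exact RGO
# for smooth convex f), not the cited papers' exact method.
import numpy as np

rng = np.random.default_rng(2)
d = 30
s = np.linspace(1.0, 5.0, d)            # example target: f(x) = 0.5 * sum(s * x^2)

def f(x):
    return 0.5 * np.sum(s * x**2)

def grad_f(x):
    return s * x

def prox(y, eta):
    # argmin_x f(x) + |x - y|^2 / (2 eta); closed form for this quadratic example
    return y / (1.0 + eta * s)

def rgo(y, eta):
    """Restricted Gaussian oracle: sample x with density proportional to
    exp(-f(x) - |x - y|^2 / (2 eta)) by rejection around the proximal point."""
    xstar = prox(y, eta)
    gstar = grad_f(xstar)
    while True:
        x = xstar + np.sqrt(eta) * rng.standard_normal(d)
        log_acc = -(f(x) - f(xstar) - gstar @ (x - xstar))   # <= 0 by convexity of f
        if np.log(rng.uniform()) < log_acc:
            return x

def proximal_sampler(n_iters=500, eta=1.0 / (np.max(s) * d)):
    x = np.zeros(d)
    for _ in range(n_iters):
        y = x + np.sqrt(eta) * rng.standard_normal(d)   # forward step: y | x ~ N(x, eta I)
        x = rgo(y, eta)                                  # backward step: x | y via the RGO
    return x

sample = proximal_sampler()
```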
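For the differentiable AIS entry above: the mechanism is to drop Metropolis-Hastings corrections so that each transition is a smooth map of the noise (enabling reparameterized gradients and mini-batching), at the cost of bias. A hedged sketch of AIS with unadjusted Langevin transitions along a geometric annealing path follows; the Gaussian example target, the schedule, and the step size are illustrative assumptions, not the cited paper's algorithm.

```python
# Hedged sketch: AIS with unadjusted (MH-free) Langevin transitions along a
# geometric path, so every operation is a smooth map of the noise.  The Gaussian
# example target, schedule, and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
d = 10

def log_gamma_target(x):                 # unnormalized target log-density (example)
    return -np.sum(x**2)

def grad_log_gamma_target(x):
    return -2.0 * x

def log_p0(x):                           # standard normal reference (normalized)
    return -0.5 * np.sum(x**2) - 0.5 * d * np.log(2.0 * np.pi)

def grad_log_p0(x):
    return -x

def ais_log_weight(n_levels=200, n_langevin=5, h=1e-2):
    betas = np.linspace(0.0, 1.0, n_levels + 1)
    x = rng.standard_normal(d)           # exact sample from p0
    log_w = 0.0
    for k in range(1, n_levels + 1):
        # weight update: log gamma_{beta_k}(x) - log gamma_{beta_{k-1}}(x)
        log_w += (betas[k] - betas[k - 1]) * (log_gamma_target(x) - log_p0(x))
        for _ in range(n_langevin):      # unadjusted Langevin steps targeting gamma_{beta_k}
            g = (1.0 - betas[k]) * grad_log_p0(x) + betas[k] * grad_log_gamma_target(x)
            x = x + h * g + np.sqrt(2.0 * h) * rng.standard_normal(d)
    return log_w

log_ws = np.array([ais_log_weight() for _ in range(64)])
log_Z_est = np.log(np.mean(np.exp(log_ws)))   # biased without MH correction: the trade-off the paper studies
```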
This list is automatically generated from the titles and abstracts of the papers on this site.