Towards a Theory of Non-Log-Concave Sampling: First-Order Stationarity
Guarantees for Langevin Monte Carlo
- URL: http://arxiv.org/abs/2202.05214v1
- Date: Thu, 10 Feb 2022 18:20:55 GMT
- Title: Towards a Theory of Non-Log-Concave Sampling: First-Order Stationarity
Guarantees for Langevin Monte Carlo
- Authors: Krishnakumar Balasubramanian, Sinho Chewi, Murat A. Erdogdu, Adil
Salim, Matthew Zhang
- Abstract summary: For sampling from a density $\pi \propto \exp(-V)$ on $\mathbb{R}^d$, averaged Langevin Monte Carlo outputs a sample with $\varepsilon$-relative Fisher information after $O(L^2 d^2/\varepsilon^2)$ iterations.
We discuss numerous extensions and applications; in particular, this is a first step towards a general theory of non-log-concave sampling.
- Score: 24.00911089902082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For the task of sampling from a density $\pi \propto \exp(-V)$ on
$\mathbb{R}^d$, where $V$ is possibly non-convex but $L$-gradient Lipschitz, we
prove that averaged Langevin Monte Carlo outputs a sample with
$\varepsilon$-relative Fisher information after $O( L^2 d^2/\varepsilon^2)$
iterations. This is the sampling analogue of complexity bounds for finding an
$\varepsilon$-approximate first-order stationary point in non-convex
optimization and therefore constitutes a first step towards the general theory
of non-log-concave sampling. We discuss numerous extensions and applications of
our result; in particular, it yields a new state-of-the-art guarantee for
sampling from distributions which satisfy a Poincar\'e inequality.
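For concreteness, the following is a minimal Python sketch of the averaged LMC iteration behind the guarantee above. The relative Fisher information of a law $\mu$ with respect to $\pi$ is $\mathbb{E}_{\mu}[\|\nabla \log(\mu/\pi)\|^2]$, and the bound is stated for the law averaged over iterations, realized here by stopping at a uniformly random index. The potential, step size, and iteration count below are illustrative assumptions rather than the paper's tuned choices.

```python
import numpy as np

def averaged_lmc(grad_V, d, n_iters, step, rng=None):
    """Minimal sketch of averaged Langevin Monte Carlo (LMC).

    Runs the standard LMC recursion
        x_{k+1} = x_k - step * grad_V(x_k) + sqrt(2 * step) * N(0, I_d)
    and returns the iterate at a uniformly random index, so the output's
    law is the average (1/N) * sum_k law(x_k).  Step size and iteration
    count are left to the caller; the paper's analysis suggests roughly
    step ~ eps / (L^2 d) and N = O(L^2 d^2 / eps^2).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(d)                      # arbitrary initialization
    stop = rng.integers(n_iters)         # uniformly random stopping index
    for k in range(n_iters):
        if k == stop:
            return x
        x = x - step * grad_V(x) + np.sqrt(2.0 * step) * rng.standard_normal(d)
    return x

# Example: a non-convex potential with Lipschitz gradient,
# V(x) = sum_i (x_i^2 / 2 + 2 cos(x_i)), so grad V(x) = x - 2 sin(x).
grad_V = lambda x: x - 2.0 * np.sin(x)
sample = averaged_lmc(grad_V, d=2, n_iters=10_000, step=1e-3)
```

Returning a uniformly random iterate is equivalent in law to sampling from the averaged measure, which is why a single run suffices to realize the guarantee on the averaged output.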
Related papers
- Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling [28.931489333515618]
We establish an oracle complexity of $\widetilde{O}\left(\frac{d\beta^2\mathcal{A}^2}{\varepsilon^6}\right)$ for the simple annealed Langevin Monte Carlo algorithm.
We show that $\mathcal{A}$ represents the action of a curve of probability measures interpolating between the target distribution $\pi$ and a readily sampleable distribution.
arXiv Detail & Related papers (2024-07-24T02:15:48Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Stochastic Approximation Approaches to Group Distributionally Robust
Optimization [96.26317627118912]
Group distributionally robust optimization (GDRO)
Online learning techniques reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity.
A novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Stochastic Langevin Monte Carlo for (weakly) log-concave posterior
distributions [0.0]
We investigate a continuous-time version of the Langevin Monte Carlo method, introduced in [WT11], that incorporates a sampling step inside the traditional over-damped Langevin diffusion.
This method is popular in machine learning for sampling from posterior distributions.
arXiv Detail & Related papers (2023-01-08T17:08:21Z) - Optimal Sublinear Sampling of Spanning Trees and Determinantal Point
Processes via Average-Case Entropic Independence [3.9586758145580014]
We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions.
For a graph $G=(V, E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V\rvert)$ time per sample.
For a determinantal point process on subsets of size $k$ of a ground set of $n$ elements, we show how to approximately sample in $\widetilde{O}(k^{\omega})$ time after an initial $\widetilde{O}(nk$…
arXiv Detail & Related papers (2022-04-06T04:11:26Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian with unknown covariance, $\mathcal{N}(0, \Sigma)$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$…
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov chain; a minimal sketch of the SGLD update appears after this list.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Structured Logconcave Sampling with a Restricted Gaussian Oracle [23.781520510778716]
We give algorithms for sampling several structured logconcave families to high accuracy.
We further develop a reduction framework, inspired by proximal point methods in convex optimization.
arXiv Detail & Related papers (2020-10-07T01:43:07Z) - Convergence of Langevin Monte Carlo in Chi-Squared and Renyi Divergence [8.873449722727026]
For convex and first-order smooth potentials, we show that the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\epsilon^{-1})$, which improves the previously known rates in both chi-squared and Rényi divergence.
arXiv Detail & Related papers (2020-07-22T18:18:28Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is…
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
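As referenced in the "Faster Convergence of Stochastic Gradient Langevin Dynamics" entry above, the following minimal Python sketch shows the SGLD update that line of work analyzes: the full gradient in the LMC recursion is replaced by an unbiased minibatch estimate. The decomposition of the potential over data points, and all function and parameter names, are illustrative assumptions rather than that paper's exact setup.

```python
import numpy as np

def sgld(grad_V_i, data, d, n_iters, step, batch_size, rng=None):
    """Minimal sketch of stochastic gradient Langevin dynamics (SGLD).

    Assumes the potential decomposes as V(x) = sum_i V_i(x; data[i]) and
    that `grad_V_i(x, batch)` returns the summed gradient over a minibatch.
    Each step replaces the full gradient in the LMC recursion with an
    unbiased minibatch estimate, then adds the usual Gaussian noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    x = np.zeros(d)
    for _ in range(n_iters):
        idx = rng.choice(n, size=batch_size, replace=False)
        g = (n / batch_size) * grad_V_i(x, data[idx])  # unbiased estimate of grad V(x)
        x = x - step * g + np.sqrt(2.0 * step) * rng.standard_normal(d)
    return x
```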