On the Convergence of Langevin Monte Carlo: The Interplay between Tail
Growth and Smoothness
- URL: http://arxiv.org/abs/2005.13097v1
- Date: Wed, 27 May 2020 00:26:20 GMT
- Title: On the Convergence of Langevin Monte Carlo: The Interplay between Tail
Growth and Smoothness
- Authors: Murat A. Erdogdu, Rasa Hosseinzadeh
- Abstract summary: We show that for potentials with Lipschitz gradient, i.e. $\beta=1$, our rate recovers the best known rate in terms of $\epsilon$ dependency.
Our results apply to sampling from a target distribution $\nu_* = e^{-f}$, with convergence measured in KL-divergence.
- Score: 10.482805367361818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study sampling from a target distribution ${\nu_* = e^{-f}}$ using the
unadjusted Langevin Monte Carlo (LMC) algorithm. For any potential function $f$
whose tails behave like ${\|x\|^\alpha}$ for ${\alpha \in [1,2]}$, and has
$\beta$-H\"older continuous gradient, we prove that ${\widetilde{\mathcal{O}}
\Big(d^{\frac{1}{\beta}+\frac{1+\beta}{\beta}(\frac{2}{\alpha} -
\boldsymbol{1}_{\{\alpha \neq 1\}})} \epsilon^{-\frac{1}{\beta}}\Big)}$ steps
are sufficient to reach the $\epsilon $-neighborhood of a $d$-dimensional
target distribution $\nu_*$ in KL-divergence. This convergence rate, in terms
of $\epsilon$ dependency, is not directly influenced by the tail growth rate
$\alpha$ of the potential function as long as its growth is at least linear,
and it only relies on the order of smoothness $\beta$. One notable consequence
of this result is that for potentials with Lipschitz gradient, i.e. $\beta=1$,
our rate recovers the best known rate
${\widetilde{\mathcal{O}}(d\epsilon^{-1})}$ which was established for strongly
convex potentials in terms of $\epsilon$ dependency, but we show that the same
rate is achievable for a wider class of potentials that are degenerately convex
at infinity. The growth rate $\alpha$ starts to have an effect on the
established rate in high dimensions where $d$ is large; furthermore, it
recovers the best-known dimension dependency when the tail growth of the
potential is quadratic, i.e. ${\alpha = 2}$, in the current setup. Our
framework allows for finite perturbations, and any order of smoothness
${\beta\in(0,1]}$; consequently, our results are applicable to a wide class of
non-convex potentials that are weakly smooth and exhibit at least linear tail
growth.
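As a sanity check on the stated rate, plugging $\beta = 1$ and $\alpha = 2$ into the exponent of $d$ gives $\frac{1}{\beta}+\frac{1+\beta}{\beta}\big(\frac{2}{\alpha}-\boldsymbol{1}_{\{\alpha \neq 1\}}\big) = 1 + 2\,(1-1) = 1$, while $\epsilon^{-1/\beta} = \epsilon^{-1}$, recovering the quoted $\widetilde{\mathcal{O}}(d\epsilon^{-1})$ rate.
The sampler analyzed in the abstract is the unadjusted Langevin iteration $x_{k+1} = x_k - \eta \nabla f(x_k) + \sqrt{2\eta}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, I_d)$. The snippet below is a minimal sketch of that update for readers who want to experiment; the Gaussian target, step size, chain length, and function names are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def lmc_sample(grad_f, x0, step_size, n_steps, rng=None):
    """One chain of unadjusted Langevin Monte Carlo:
    x_{k+1} = x_k - h * grad_f(x_k) + sqrt(2h) * N(0, I_d)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - step_size * grad_f(x) + np.sqrt(2.0 * step_size) * noise
    return x

# Illustrative target: standard Gaussian nu_* = e^{-f} with f(x) = ||x||^2 / 2,
# i.e. quadratic tail growth (alpha = 2) and Lipschitz gradient (beta = 1).
if __name__ == "__main__":
    d = 10
    grad_f = lambda x: x  # gradient of f(x) = ||x||^2 / 2
    samples = np.stack([
        lmc_sample(grad_f, np.zeros(d), step_size=1e-2, n_steps=2000)
        for _ in range(500)
    ])
    print("empirical mean norm:", np.linalg.norm(samples.mean(axis=0)))
    print("empirical per-coordinate variance (target = 1):", samples.var(axis=0).mean())
```

With a fixed step size the unadjusted chain is asymptotically biased, which is why the guarantee is stated as reaching an $\epsilon$-neighborhood of $\nu_*$ in KL rather than exact convergence; the theorem quantifies how many steps suffice as a function of $d$, $\epsilon$, $\alpha$, and $\beta$.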
Related papers
- On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes an $O(\frac{\sqrt{d}}{T^{1/4}})$ convergence rate for the averaged $\ell_1$ gradient norm $\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}\left[\|\nabla f(x_k)\|_1\right]$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the standard $\frac{1}{T}\sum_{k=1}^{T}$-averaged gradient-norm rate of SGD.
arXiv Detail & Related papers (2024-02-01T07:21:32Z)
- A qualitative difference between gradient flows of convex functions in
finite- and infinite-dimensional Hilbert spaces [2.7195102129095003]
We consider gradient flow/gradient descent and heavy ball/accelerated gradient descent optimization for convex objective functions.
In Hilbert spaces, this is optimal: $f(x_t) - \inf f$ can decay to $0$ as slowly as any given function which is monotone decreasing and integrable at $\infty$.
arXiv Detail & Related papers (2023-10-26T17:33:52Z)
- Estimation and Inference in Distributional Reinforcement Learning [28.253677740976197]
We show that a dataset of size $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^2 (1-\gamma)^4}\right)$ suffices to ensure that the Kolmogorov metric and the total variation metric between $\hat{\eta}^\pi$ and $\eta^\pi$ are below $\epsilon$ with high probability.
Our findings give rise to a unified approach to statistical inference of a wide class of statistical functionals of $etapi$.
arXiv Detail & Related papers (2023-09-29T14:14:53Z)
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- High-dimensional Asymptotics of Feature Learning: How One Gradient Step
Improves the Representation [89.21686761957383]
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer network.
Our results demonstrate that even one step can lead to a considerable advantage over random features.
arXiv Detail & Related papers (2022-05-03T12:09:59Z)
- A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound of $\Omega(\sqrt{n/p})$ on the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z)
- Convergence Rate of the (1+1)-Evolution Strategy with Success-Based
Step-Size Adaptation on Convex Quadratic Functions [20.666734673282498]
The (1+1)-evolution strategy (ES) with success-based step-size adaptation is analyzed on a general convex quadratic function.
The convergence rate of the (1+1)-ES is derived explicitly and rigorously on a general convex quadratic function.
arXiv Detail & Related papers (2021-03-02T09:03:44Z)
- Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, and $\delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z)
- Convergence of Langevin Monte Carlo in Chi-Squared and Renyi Divergence [8.873449722727026]
In particular, for convex and first-order smooth potentials, we show that the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\epsilon^{-1})$, which improves the previously known rates in both of these metrics.
arXiv Detail & Related papers (2020-07-22T18:18:28Z)
- Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)