User-friendly guarantees for the Langevin Monte Carlo with inaccurate
gradient
- URL: http://arxiv.org/abs/1710.00095v4
- Date: Fri, 23 Feb 2024 17:39:40 GMT
- Title: User-friendly guarantees for the Langevin Monte Carlo with inaccurate
gradient
- Authors: Arnak S. Dalalyan and Avetik G. Karagulyan
- Abstract summary: We analyze several methods of approximate sampling based on discretizations of the Langevin diffusion.
Our guarantees improve or extend the state-of-the-art results in three directions.
- Score: 8.348896353632165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of sampling from a given probability
density function that is known to be smooth and strongly log-concave. We
analyze several methods of approximate sampling based on discretizations of the
(highly overdamped) Langevin diffusion and establish guarantees on their error
measured in the Wasserstein-2 distance. Our guarantees improve or extend the
state-of-the-art results in three directions. First, we provide an upper bound
on the error of the first-order Langevin Monte Carlo (LMC) algorithm with
optimized varying step-size. This result has the advantage of being horizon
free (we do not need to know the target precision in advance) and of improving
by a logarithmic factor on the corresponding result for the constant step-size.
Second, we study the case where accurate evaluations of the gradient of the
log-density are unavailable, but one can have access to approximations of the
aforementioned gradient. In such a situation, we consider both deterministic
and stochastic approximations of the gradient and provide an upper bound on the
sampling error of the first-order LMC that quantifies the impact of the
gradient evaluation inaccuracies. Third, we establish upper bounds for two
versions of the second-order LMC, which leverage the Hessian of the
log-density. We provide nonasymptotic guarantees on the sampling error of these
second-order LMCs. These guarantees reveal that the second-order LMC algorithms
improve on the first-order LMC in ill-conditioned settings.
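For intuition, the first-order LMC described in the abstract is the Euler discretization of the overdamped Langevin diffusion, run with a (possibly inaccurate) gradient oracle. A minimal Python sketch, not the paper's code: the quadratic target, step size, and the small deterministic bias on the gradient are illustrative assumptions.

```python
import numpy as np

def lmc_sample(grad, theta0, step, n_iter, rng):
    """First-order Langevin Monte Carlo: Euler discretization of the
    overdamped Langevin diffusion  d(theta) = -grad f(theta) dt + sqrt(2) dW."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        noise = rng.standard_normal(theta.shape)
        theta = theta - step * grad(theta) + np.sqrt(2.0 * step) * noise
    return theta

# Target: standard Gaussian, f(x) = ||x||^2 / 2, so grad f(x) = x.
exact_grad = lambda x: x
# Inexact oracle: exact gradient plus a small deterministic bias (illustrative).
noisy_grad = lambda x: x + 0.01

rng = np.random.default_rng(0)
samples = np.array([lmc_sample(noisy_grad, np.zeros(1), 0.05, 500, rng)
                    for _ in range(2000)]).ravel()
print(samples.mean(), samples.std())  # near 0 and 1; the bias shifts the mean slightly
```

The bias in the gradient oracle translates into a small shift of the chain's stationary mean, which is the kind of effect the paper's Wasserstein-2 bounds quantify.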
Related papers
- DF2: Distribution-Free Decision-Focused Learning [53.2476224456902]
Decision-focused learning (DFL) has recently emerged as a powerful approach for predict-then-optimize problems.
Existing end-to-end DFL methods are hindered by three significant bottlenecks: model error, sample average approximation error, and distribution-based parameterization of the expected objective.
We present DF2 -- the first distribution-free decision-focused learning method explicitly designed to address these three bottlenecks.
arXiv Detail & Related papers (2023-08-11T00:44:46Z) - Sobolev Space Regularised Pre Density Models [51.558848491038916]
We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.
This method is statistically consistent and makes the model's inductive bias clear and interpretable.
arXiv Detail & Related papers (2023-07-25T18:47:53Z) - Kinetic Langevin MCMC Sampling Without Gradient Lipschitz Continuity --
the Strongly Convex Case [0.0]
We consider sampling from log-concave distributions in the Hamiltonian (kinetic Langevin) setting, without assuming that the gradient of the potential is globally Lipschitz.
We propose two algorithms based on polygonal (tamed) Euler schemes to sample from a target measure, and provide non-asymptotic 2-Wasserstein distance bounds between the law of each algorithm's process and the target measure.
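A minimal sketch of a tamed-gradient Euler scheme for kinetic Langevin dynamics may help here; it is an illustrative toy, not the paper's exact polygonal scheme, and the quartic potential, step size, and friction parameter are assumptions.

```python
import numpy as np

def tamed_kinetic_langevin(grad, theta0, step, gamma, n_iter, rng):
    """Euler scheme for kinetic (underdamped) Langevin dynamics with a
    'tamed' gradient grad f / (1 + h*||grad f||), which keeps superlinearly
    growing gradients under control (no global Lipschitz assumption)."""
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(n_iter):
        g = grad(theta)
        g_tamed = g / (1.0 + step * np.linalg.norm(g))
        theta = theta + step * v
        v = (v - step * (gamma * v + g_tamed)
             + np.sqrt(2.0 * gamma * step) * rng.standard_normal(v.shape))
    return theta

# Strongly convex potential with superlinearly growing gradient:
# f(x) = x^2/2 + x^4/4, so grad f(x) = x + x^3 (not globally Lipschitz).
grad = lambda x: x + x**3
rng = np.random.default_rng(1)
xs = np.array([tamed_kinetic_langevin(grad, np.zeros(1), 0.01, 1.0, 2000, rng)
               for _ in range(500)]).ravel()
print(xs.mean())  # symmetric target, so the mean is near 0
```

Without the taming factor, an explicit Euler step on such a potential can blow up from large initial points; the tamed update keeps each increment bounded by roughly 1/h.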
arXiv Detail & Related papers (2023-01-19T12:32:41Z) - DRSOM: A Dimension Reduced Second-Order Method [13.778619250890406]
Under a trust-region-like framework, our method preserves the convergence of the second-order method while using only information in a few directions.
Theoretically, we show that the method has local convergence and a global convergence rate of $O(\epsilon^{-3/2})$ for satisfying the first-order and second-order conditions.
arXiv Detail & Related papers (2022-07-30T13:05:01Z) - Utilising the CLT Structure in Stochastic Gradient based Sampling :
Improved Analysis and Faster Algorithms [14.174806471635403]
We consider approximations of sampling algorithms, such as Stochastic Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM) for Interacting Particle Dynamics (IPD).
We observe that the noise introduced by the approximation is nearly Gaussian due to the Central Limit Theorem (CLT) while the driving Brownian motion is exactly Gaussian.
We harness this structure to absorb the approximation error inside the diffusion process, and obtain improved convergence guarantees for these algorithms.
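A minimal SGLD sketch illustrates the nearly-Gaussian minibatch noise this analysis exploits; the toy Gaussian-mean posterior, batch size, and step size below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def sgld(grad_est, theta0, step, n_iter, rng):
    """Stochastic Gradient Langevin Dynamics: LMC where the exact gradient is
    replaced by an unbiased minibatch estimate; by the CLT the estimate's
    error is nearly Gaussian, like the injected Brownian noise."""
    theta = float(theta0)
    for _ in range(n_iter):
        theta = (theta - step * grad_est(theta, rng)
                 + np.sqrt(2.0 * step) * rng.standard_normal())
    return theta

# Toy posterior for the mean of N(mu, 1) data under a flat prior:
# grad of the negative log-posterior is sum_i (mu - x_i), estimated on a minibatch.
rng = np.random.default_rng(2)
data = rng.normal(1.5, 1.0, size=1000)

def grad_est(mu, rng, batch=50):
    idx = rng.integers(0, data.size, size=batch)
    # Rescale the minibatch sum to an unbiased estimate of the full-data sum.
    return data.size / batch * np.sum(mu - data[idx])

draws = np.array([sgld(grad_est, 0.0, 1e-4, 1000, rng) for _ in range(300)])
print(draws.mean())  # near the sample mean of the data (the posterior mean)
```

The minibatch error here is a scaled sum of i.i.d. terms, hence approximately Gaussian; absorbing it into the driving Brownian motion is the structural idea the entry describes.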
arXiv Detail & Related papers (2022-06-08T10:17:40Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points in high probability with the best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Carathéodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z) - An Optimal Multistage Stochastic Gradient Method for Minimax Problems [8.615625517708324]
We study the minimax optimization problem in the smooth and strongly convex-strongly concave setting.
We first analyze the Gradient Descent Ascent (GDA) method with constant stepsize.
We propose a multistage variant of GDA that runs in multiple stages with a particular learning rate decay schedule.
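A minimal deterministic sketch of GDA with stage-wise stepsize decay: the bilinearly coupled quadratic objective, stage lengths, and halving schedule are illustrative assumptions (the paper analyzes the stochastic setting with its own decay schedule).

```python
def gda_stage(x, y, eta, n_iter):
    """Simultaneous Gradient Descent Ascent on f(x, y) = x^2/2 + x*y - y^2/2,
    a toy strongly-convex-strongly-concave problem with saddle point (0, 0)."""
    for _ in range(n_iter):
        gx, gy = x + y, x - y          # grad_x f, grad_y f
        x, y = x - eta * gx, y + eta * gy
    return x, y

# Multistage variant: rerun GDA over several stages, halving the stepsize
# each stage -- one simple instance of a learning-rate decay schedule.
x, y = 5.0, -3.0
eta = 0.2
for stage in range(5):
    x, y = gda_stage(x, y, eta, 200)
    eta /= 2.0
print(x, y)  # converges to the saddle point (0, 0)
```

With a constant stepsize, stochastic GDA only reaches a noise-dominated neighborhood of the saddle point; shrinking the stepsize across stages is what lets the iterates converge all the way, which is the motivation for the multistage schedule.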
arXiv Detail & Related papers (2020-02-13T18:01:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.