User-friendly guarantees for the Langevin Monte Carlo with inaccurate
gradient
- URL: http://arxiv.org/abs/1710.00095v4
- Date: Fri, 23 Feb 2024 17:39:40 GMT
- Title: User-friendly guarantees for the Langevin Monte Carlo with inaccurate
gradient
- Authors: Arnak S. Dalalyan and Avetik G. Karagulyan
- Abstract summary: We analyze several methods of approximate sampling based on discretizations of the Langevin diffusion.
Our guarantees improve or extend the state-of-the-art results in three directions.
- Score: 8.348896353632165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of sampling from a given probability
density function that is known to be smooth and strongly log-concave. We
analyze several methods of approximate sampling based on discretizations of the
(highly overdamped) Langevin diffusion and establish guarantees on its error
measured in the Wasserstein-2 distance. Our guarantees improve or extend the
state-of-the-art results in three directions. First, we provide an upper bound
on the error of the first-order Langevin Monte Carlo (LMC) algorithm with
optimized varying step-size. This result has the advantage of being horizon
free (we do not need to know in advance the target precision) and to improve by
a logarithmic factor the corresponding result for the constant step-size.
Second, we study the case where accurate evaluations of the gradient of the
log-density are unavailable, but one can have access to approximations of the
aforementioned gradient. In such a situation, we consider both deterministic
and stochastic approximations of the gradient and provide an upper bound on the
sampling error of the first-order LMC that quantifies the impact of the
gradient evaluation inaccuracies. Third, we establish upper bounds for two
versions of the second-order LMC, which leverage the Hessian of the
log-density. We provide nonasymptotic guarantees on the sampling error of these
second-order LMCs. These guarantees reveal that the second-order LMC algorithms
improve on the first-order LMC in ill-conditioned settings.
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust Sequential Quadratic Programming method to find both first and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a approximation of the objective subject.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Unbiased Kinetic Langevin Monte Carlo with Inexact Gradients [0.8749675983608172]
We present an unbiased method for posterior means based on kinetic Langevin dynamics.
Our proposed estimator is unbiased, attains finite variance, and satisfies a central limit theorem.
Our results demonstrate that in large-scale applications, the unbiased algorithm we present can be 2-3 orders of magnitude more efficient than the gold-standard" randomized Hamiltonian Monte Carlo.
arXiv Detail & Related papers (2023-11-08T21:19:52Z) - DF2: Distribution-Free Decision-Focused Learning [53.2476224456902]
Decision-focused learning (DFL) has recently emerged as a powerful approach for predictthen-optimize problems.
Existing end-to-end DFL methods are hindered by three significant bottlenecks: model error, sample average approximation error, and distribution-based parameterization of the expected objective.
We present DF2 -- the first textit-free decision-focused learning method explicitly designed to address these three bottlenecks.
arXiv Detail & Related papers (2023-08-11T00:44:46Z) - Kinetic Langevin MCMC Sampling Without Gradient Lipschitz Continuity --
the Strongly Convex Case [0.0]
We consider sampling from log concave distributions in Hamiltonian setting, without assuming that the objective is globally Lipschitz.
We propose two algorithms based on polygonal gradient (tamed) Euler schemes, to sample from a target measure, and provide non-asymptotic 2-Wasserstein distance bounds between the law of the process of each algorithm and the target measure.
arXiv Detail & Related papers (2023-01-19T12:32:41Z) - Utilising the CLT Structure in Stochastic Gradient based Sampling :
Improved Analysis and Faster Algorithms [14.174806471635403]
We consider approximations of sampling algorithms, such as Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM) for Interacting Particle Dynamcs (IPD)
We observe that the noise introduced by the approximation is nearly Gaussian due to the Central Limit Theorem (CLT) while the driving Brownian motion is exactly Gaussian.
We harness this structure to absorb the approximation error inside the diffusion process, and obtain improved convergence guarantees for these algorithms.
arXiv Detail & Related papers (2022-06-08T10:17:40Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymotic analysis of sampling error in 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non- Hilbert optimization using first-order algorithms for which the gradient estimates may have tails.
We show that a combination of gradient, momentum, and normalized gradient descent convergence to critical points in high-probability with best-known iteration for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - An Optimal Multistage Stochastic Gradient Method for Minimax Problems [8.615625517708324]
We study the minimax optimization problem in the smooth and strongly convex-strongly concave setting.
We first analyze the Gradient Descent Ascent (GDA) method with constant stepsize.
We propose a multistage variant of GDA that runs in multiple stages with a particular learning rate decay schedule.
arXiv Detail & Related papers (2020-02-13T18:01:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.