Fast Convergence of $Φ$-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler
- URL: http://arxiv.org/abs/2410.10699v2
- Date: Wed, 12 Feb 2025 15:52:11 GMT
- Title: Fast Convergence of $Φ$-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler
- Authors: Siddharth Mitra, Andre Wibisono,
- Abstract summary: We study the mixing time of two popular discrete-time Markov chains in continuous space.
We show that any $Phi$-divergence arising from a twice-differentiable strictly convex function $Phi$ converges to $0$ exponentially fast along these Markov chains.
- Score: 14.34147140416535
- License:
- Abstract: We study the mixing time of two popular discrete-time Markov chains in continuous space, the Unadjusted Langevin Algorithm and the Proximal Sampler, which are discretizations of the Langevin dynamics. We extend mixing time analyses for these Markov chains to hold in $\Phi$-divergence. We show that any $\Phi$-divergence arising from a twice-differentiable strictly convex function $\Phi$ converges to $0$ exponentially fast along these Markov chains, under the assumption that their stationary distributions satisfy the corresponding $\Phi$-Sobolev inequality, which holds for example when the target distribution of the Langevin dynamics is strongly log-concave. Our setting includes as special cases popular mixing time regimes, namely the mixing in chi-squared divergence under a Poincar\'e inequality, and the mixing in relative entropy under a log-Sobolev inequality. Our results follow by viewing the sampling algorithms as noisy channels and bounding the contraction coefficients arising in the appropriate strong data processing inequalities.
Related papers
- Efficiently learning and sampling multimodal distributions with data-based initialization [20.575122468674536]
We consider the problem of sampling a multimodal distribution with a Markov chain given a small number of samples from the stationary measure.
We show that if the Markov chain has a $k$th order spectral gap, samples from the stationary distribution will efficiently generate a sample whose conditional law is $varepsilon$-close in TV distance to the stationary measure.
arXiv Detail & Related papers (2024-11-14T01:37:02Z) - Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - In-and-Out: Algorithmic Diffusion for Sampling Convex Bodies [7.70133333709347]
We present a new random walk for uniformly sampling high-dimensional convex bodies.
It achieves state-of-the-art runtime complexity with stronger guarantees on the output.
arXiv Detail & Related papers (2024-05-02T16:15:46Z) - Characterizing Dependence of Samples along the Langevin Dynamics and Algorithms via Contraction of $Φ$-Mutual Information [16.54557731304283]
We study the question of how fast the samples become approximately independent along popular Markov chains for continuous-space sampling.
Our proof technique is based on showing the Strong Data Processing Inequalities (SDPIs) hold along the Markov chains.
arXiv Detail & Related papers (2024-02-26T23:05:02Z) - Symmetric Mean-field Langevin Dynamics for Distributional Minimax
Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.
We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
arXiv Detail & Related papers (2023-12-02T13:01:29Z) - Proximal Algorithms for Accelerated Langevin Dynamics [57.08271964961975]
We develop a novel class of MCMC algorithms based on a stochastized Nesterov scheme.
We show superior performance of the proposed method over typical Langevin samplers for different models in statistics and image processing.
arXiv Detail & Related papers (2023-11-24T19:56:01Z) - Taming under isoperimetry [0.0]
In this article we propose a Langevin-based scheme calledmathbfsTULA$ to sample from distributions with growing log.
We derive non-asymientKL and consequently consequently satisfy a Log-Sobolev inequality.
arXiv Detail & Related papers (2023-11-15T14:44:16Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $mathcalO(K-(alpha - 1)/alpha)$, when the gradients have finite moments of order.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymotic analysis of sampling error in 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Finite-Time Convergence Rates of Nonlinear Two-Time-Scale Stochastic
Approximation under Markovian Noise [2.0305676256390934]
We study the so-called two-time-scale approximation, a simulation-based approach for finding the roots of two coupled nonlinear operators.
In particular, we consider the scenario where the data in the method are generated by Markov processes, therefore, they are dependent.
Under some fairly standard assumptions on the operators and the Markov processes, we provide a formula that characterizes the convergence rate of the mean square errors generated by the method to zero.
arXiv Detail & Related papers (2021-04-04T15:19:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.