Related papers: Fast Convergence of $Φ$-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler

Fast Convergence of $Φ$-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler

URL: http://arxiv.org/abs/2410.10699v2
Date: Wed, 12 Feb 2025 15:52:11 GMT
Title: Fast Convergence of $Φ$-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler
Authors: Siddharth Mitra, Andre Wibisono,
Abstract summary: We study the mixing time of two popular discrete-time Markov chains in continuous space.<n>We show that any $Phi$-divergence arising from a twice-differentiable strictly convex function $Phi$ converges to $0$ exponentially fast along these Markov chains.
Score: 14.34147140416535
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the mixing time of two popular discrete-time Markov chains in continuous space, the Unadjusted Langevin Algorithm and the Proximal Sampler, which are discretizations of the Langevin dynamics. We extend mixing time analyses for these Markov chains to hold in $\Phi$-divergence. We show that any $\Phi$-divergence arising from a twice-differentiable strictly convex function $\Phi$ converges to $0$ exponentially fast along these Markov chains, under the assumption that their stationary distributions satisfy the corresponding $\Phi$-Sobolev inequality, which holds for example when the target distribution of the Langevin dynamics is strongly log-concave. Our setting includes as special cases popular mixing time regimes, namely the mixing in chi-squared divergence under a Poincar\'e inequality, and the mixing in relative entropy under a log-Sobolev inequality. Our results follow by viewing the sampling algorithms as noisy channels and bounding the contraction coefficients arising in the appropriate strong data processing inequalities.

Related papers

Efficiently learning and sampling multimodal distributions with data-based initialization [20.575122468674536]
We consider the problem of sampling a multimodal distribution with a Markov chain given a small number of samples from the stationary measure. We show that if the Markov chain has a $k$th order spectral gap, samples from the stationary distribution will efficiently generate a sample whose conditional law is $varepsilon$-close in TV distance to the stationary measure.
arXiv Detail & Related papers (2024-11-14T01:37:02Z)
Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework. We introduce a discrete-time sampling algorithm in the general state space $[S]d$ that utilizes score estimators at predefined time points. Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z)
In-and-Out: Algorithmic Diffusion for Sampling Convex Bodies [7.70133333709347]
We present a new random walk for uniformly sampling high-dimensional convex bodies. It achieves state-of-the-art runtime complexity with stronger guarantees on the output.
arXiv Detail & Related papers (2024-05-02T16:15:46Z)
DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speedup a common Approximation problem by acting in parallel and communicating with a central server. To mitigate the effect of delays and stragglers, we propose textttDASA, a Delay-Adaptive algorithm for multi-agent Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z)
Characterizing Dependence of Samples along the Langevin Dynamics and Algorithms via Contraction of $Φ$-Mutual Information [16.54557731304283]
We study the question of how fast the samples become approximately independent along popular Markov chains for continuous-space sampling. Our proof technique is based on showing the Strong Data Processing Inequalities (SDPIs) hold along the Markov chains.
arXiv Detail & Related papers (2024-02-26T23:05:02Z)
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates. We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
arXiv Detail & Related papers (2023-12-02T13:01:29Z)
Proximal Algorithms for Accelerated Langevin Dynamics [57.08271964961975]
We develop a novel class of MCMC algorithms based on a stochastized Nesterov scheme. We show superior performance of the proposed method over typical Langevin samplers for different models in statistics and image processing.
arXiv Detail & Related papers (2023-11-24T19:56:01Z)
Taming under isoperimetry [0.0]
In this article we propose a Langevin-based scheme calledmathbfsTULA$ to sample from distributions with growing log. We derive non-asymientKL and consequently consequently satisfy a Log-Sobolev inequality.
arXiv Detail & Related papers (2023-11-15T14:44:16Z)
Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density. We show that it is possible to get faster rates of convergence than $mathcalO(K-(alpha - 1)/alpha)$, when the gradients have finite moments of order. We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution. We propose the Constant Rate AIS algorithm and its efficient implementation for $alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymotic analysis of sampling error in 2-Wasserstein distance. Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z)
Finite-Time Convergence Rates of Nonlinear Two-Time-Scale Stochastic Approximation under Markovian Noise [2.0305676256390934]
We study the so-called two-time-scale approximation, a simulation-based approach for finding the roots of two coupled nonlinear operators. In particular, we consider the scenario where the data in the method are generated by Markov processes, therefore, they are dependent. Under some fairly standard assumptions on the operators and the Markov processes, we provide a formula that characterizes the convergence rate of the mean square errors generated by the method to zero.
arXiv Detail & Related papers (2021-04-04T15:19:19Z)
Concentration inequality for U-statistics of order two for uniformly ergodic Markov chains [0.0]
We prove a concentration inequality for U-statistics of order two for uniformly ergodic Markov chains. We show that we can recover the convergence rate of Arcones and Gin'e who proved a concentration result for U-statistics of independent random variables and canonical kernels.
arXiv Detail & Related papers (2020-11-20T15:14:34Z)
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP) We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise [28.891930079358954]
We provide a finite-time analysis for linear two timescale SA scheme. Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise. We present an expansion of the expected error with a matching lower bound.
arXiv Detail & Related papers (2020-02-04T13:03:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.