Convergence of Recursive Stochastic Algorithms using Wasserstein
Divergence
- URL: http://arxiv.org/abs/2003.11403v2
- Date: Tue, 5 Jan 2021 15:47:23 GMT
- Title: Convergence of Recursive Stochastic Algorithms using Wasserstein
Divergence
- Authors: Abhishek Gupta and William B. Haskell
- Abstract summary: We show that convergence of a large family of constant stepsize RSAs can be understood using this framework.
We show that convergence of a large family of constant stepsize RSAs can be understood using this framework.
- Score: 4.688616907736838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper develops a unified framework, based on iterated random operator
theory, to analyze the convergence of constant stepsize recursive stochastic
algorithms (RSAs). RSAs use randomization to efficiently compute expectations,
and so their iterates form a stochastic process. The key idea of our analysis
is to lift the RSA into an appropriate higher-dimensional space and then
express it as an equivalent Markov chain. Instead of determining the
convergence of this Markov chain (which may not converge under constant
stepsize), we study the convergence of the distribution of this Markov chain.
To study this, we define a new notion of Wasserstein divergence. We show that
if the distribution of the iterates in the Markov chain satisfy a contraction
property with respect to the Wasserstein divergence, then the Markov chain
admits an invariant distribution. We show that convergence of a large family of
constant stepsize RSAs can be understood using this framework, and we provide
several detailed examples.
Related papers
- Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains.
We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
arXiv Detail & Related papers (2025-02-19T15:33:55Z) - Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - Deep Learning for Computing Convergence Rates of Markov Chains [0.0]
Deep Contractive Drift Calculator (DCDC) is first general-purpose sample-based algorithm for bounding the convergence of Markov chains to stationarity in Wasserstein distance.
We show that DCDC can generate convergence bounds for realistic Markov chains arising from processing networks as well as constant step-size optimization.
arXiv Detail & Related papers (2024-05-30T19:26:51Z) - Accelerating Distributed Stochastic Optimization via Self-Repellent
Random Walks [11.3631620309434]
We study a family of distributed optimization algorithms where gradients are sampled by a token traversing a network of agents in random-walk fashion.
We take a novel approach by replacing the standard linear Markovian token by one which follows a nonlinear Markov chain - namely the Self-Repellent Radom Walk (SRRW)
We prove that the optimization errors of the resulting SA-SRRW converge to zero almost surely and prove a central limit theorem.
arXiv Detail & Related papers (2024-01-18T00:50:37Z) - Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting [64.0722630873758]
We consider rather general and broad class of Markov chains, Ito chains, that look like Euler-Maryama discretization of some Differential Equation.
We prove the bound in $W_2$-distance between the laws of our Ito chain and differential equation.
arXiv Detail & Related papers (2023-10-09T18:38:56Z) - Covariate shift in nonparametric regression with Markovian design [0.0]
We show that convergence rates for a smoothness risk of a Nadaraya-Watson kernel estimator are determined by the similarity between the invariant distributions associated to source and target Markov chains.
We extend the notion of a distribution exponent from Kpotufe and Martinet to kernel transfer exponents of uniformly ergodic Markov chains.
arXiv Detail & Related papers (2023-07-17T14:24:27Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Rosenthal-type inequalities for linear statistics of Markov chains [20.606986885851573]
We establish novel deviation bounds for additive functionals of geometrically ergodic Markov chains.
We pay special attention to the dependence of our bounds on the mixing time of the corresponding chain.
arXiv Detail & Related papers (2023-03-10T10:24:46Z) - A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z) - A Distributional Analysis of Sampling-Based Reinforcement Learning
Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
We show that value-based methods such as TD($lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z) - Probabilistic Contraction Analysis of Iterated Random Operators [10.442391859219807]
Banach contraction mapping theorem is employed to establish the convergence of certain deterministic algorithms.
In a class of randomized algorithms, in each iteration, the contraction map is approximated with an operator that uses independent and identically distributed samples of certain random variables.
This leads to iterated random operators acting on an initial point in a complete metric space, and it generates a Markov chain.
arXiv Detail & Related papers (2018-04-04T00:10:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.