Finite-Time Analysis of Stochastic Gradient Descent under Markov
Randomness
- URL: http://arxiv.org/abs/2003.10973v2
- Date: Wed, 1 Apr 2020 17:50:14 GMT
- Title: Finite-Time Analysis of Stochastic Gradient Descent under Markov
Randomness
- Authors: Thinh T. Doan, Lam M. Nguyen, Nhan H. Pham, Justin Romberg
- Abstract summary: Stochastic gradient descent (SGD) is widely used in reinforcement learning and machine learning.
We show that SGD converges nearly at the same rate with Markovian gradient samples as with independent gradient samples.
- Score: 27.027583559295365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by broad applications in reinforcement learning and machine
learning, this paper considers the popular stochastic gradient descent (SGD)
when the gradients of the underlying objective function are sampled from Markov
processes. This Markov sampling leads to the gradient samples being biased and
not independent. The existing results for the convergence of SGD under Markov
randomness are often established under the assumptions on the boundedness of
either the iterates or the gradient samples. Our main focus is to study the
finite-time convergence of SGD for different types of objective functions,
without requiring these assumptions. We show that SGD converges nearly at the
same rate with Markovian gradient samples as with independent gradient samples.
The only difference is a logarithmic factor that accounts for the mixing time
of the Markov chain.
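The setting the abstract describes can be illustrated with a minimal sketch, assuming a toy least-squares objective: gradient samples are drawn along a slowly mixing Markov chain over data indices (a lazy random walk on a ring) instead of i.i.d., so consecutive gradients are biased and correlated. The function names and the toy problem below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
A = rng.normal(size=(n, d))   # data rows a_i
x_star = rng.normal(size=d)   # ground-truth minimizer
b = A @ x_star                # consistent targets, so x_star minimizes the loss

def sgd_markov(steps=20000, step0=0.1):
    """SGD on 0.5 * (a_i x - b_i)^2 where i follows a Markov chain."""
    x = np.zeros(d)
    i = 0  # current state of the chain over {0, ..., n-1}
    for t in range(1, steps + 1):
        # lazy random walk on a ring: mixes slowly, so samples are
        # far from independent and each gradient is biased
        i = (i + rng.choice([-1, 0, 1])) % n
        grad = (A[i] @ x - b[i]) * A[i]      # single-sample gradient
        x -= (step0 / np.sqrt(t)) * grad     # diminishing step size
    return x

x_hat = sgd_markov()
err = np.linalg.norm(x_hat - x_star)
print(err)
```

Despite the correlated samples, the iterates still approach the minimizer; the paper's result says the price relative to i.i.d. sampling is only a logarithmic factor in the chain's mixing time.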
Related papers
- Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]^d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - Stochastic-Constrained Stochastic Optimization with Markovian Data [2.1756081703276]
We study the setting where data samples are drawn from a Markov chain and thus are not independent and identically distributed.
We propose two variants of drift-plus-penalty; one is for the case when the mixing time of the underlying Markov chain is known.
Our algorithms apply to a more general setting of constrained online convex optimization where the sequence of constraint functions follows a Markov chain.
arXiv Detail & Related papers (2023-12-07T14:09:27Z) - Online covariance estimation for stochastic gradient descent under
Markovian sampling [20.02012768403544]
Convergence rates of order $O\big(\sqrt{d}\,n^{-1/8}(\log n)^{1/4}\big)$ are established under state-dependent and state-independent Markovian sampling.
Our method is applied to strategic classification with logistic regression, where adversaries adaptively modify features during training to affect the target class.
arXiv Detail & Related papers (2023-08-03T00:21:30Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Doubly Stochastic Models: Learning with Unbiased Label Noises and
Inference Stability [85.1044381834036]
We investigate the implicit regularization effects of label noises under mini-batch sampling settings of gradient descent.
We find that such an implicit regularizer favors convergence points that stabilize model outputs against parameter perturbations.
Our work does not assume that SGD behaves as an Ornstein-Uhlenbeck-like process, and achieves a more general result with a proven convergence of the approximation.
arXiv Detail & Related papers (2023-04-01T14:09:07Z) - Stochastic Gradient Descent under Markovian Sampling Schemes [3.04585143845864]
We study a variation of vanilla stochastic gradient descent where the algorithm only has access to a Markovian sampling scheme.
We focus on obtaining rates of convergence under the least restrictive assumptions possible on the underlying Markov chain.
arXiv Detail & Related papers (2023-02-28T09:18:00Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Markov Chain Score Ascent: A Unifying Framework of Variational Inference
with Markovian Gradients [7.461223697336269]
Minimizing the inclusive Kullback-Leibler divergence with stochastic gradient descent (SGD) is challenging since its gradient is defined as an integral over the posterior.
This paper provides the first non-asymptotic convergence analysis of these methods by establishing their mixing rate and gradient variance.
arXiv Detail & Related papers (2022-06-13T16:25:08Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Targeted stochastic gradient Markov chain Monte Carlo for hidden Markov models with rare latent states [48.705095800341944]
Markov chain Monte Carlo (MCMC) algorithms for hidden Markov models often rely on the forward-backward sampler.
This makes them computationally slow as the length of the time series increases, motivating the development of sub-sampling-based approaches.
We propose a targeted sub-sampling approach that over-samples observations corresponding to rare latent states when calculating the gradient of parameters associated with them.
arXiv Detail & Related papers (2018-10-31T17:44:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.