Finite Time Analysis of Linear Two-timescale Stochastic Approximation
with Markovian Noise
- URL: http://arxiv.org/abs/2002.01268v1
- Date: Tue, 4 Feb 2020 13:03:17 GMT
- Title: Finite Time Analysis of Linear Two-timescale Stochastic Approximation
with Markovian Noise
- Authors: Maxim Kaledin, Eric Moulines, Alexey Naumov, Vladislav Tadic, Hoi-To
Wai
- Abstract summary: We provide a finite-time analysis of the linear two-timescale SA scheme.
Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise.
We present an asymptotic expansion of the expected error with a matching lower bound.
- Score: 28.891930079358954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The linear two-timescale stochastic approximation (SA) scheme is an important
class of algorithms that has become popular in reinforcement learning (RL),
particularly for the policy evaluation problem. Recently, a number of works
have been devoted to establishing a finite-time analysis of the scheme,
especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous
in practice. In this paper, we provide a finite-time analysis for linear
two-timescale SA. Our bounds show that there is no discrepancy in the
convergence rate between Markovian and martingale noise; only the constants
are affected by the mixing time of the Markov chain. With an appropriate step
size schedule, the transient term in the expected error bound is $o(1/k^c)$
and the steady-state term is ${\cal O}(1/k)$, where $c>1$ and $k$ is the
iteration number. Furthermore, we present an asymptotic expansion of the
expected error with a matching lower bound of $\Omega(1/k)$. A simple
numerical experiment is presented to support our theory.
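For concreteness, the following is a minimal Python sketch of the kind of linear two-timescale recursion the paper analyzes, driven by a toy two-state Markov chain. The matrices and vectors (A11, A12, A21, A22, b1, b2), the step-size exponents, and the noise model are illustrative assumptions made for this sketch, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state Markov chain driving the noise (assumption: any uniformly
# ergodic chain fits the Markovian-noise setting of the paper).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Illustrative 1-dimensional system matrices, chosen so the drift is stable.
A11, A12 = np.array([[1.0]]), np.array([[0.5]])
A21, A22 = np.array([[0.3]]), np.array([[1.0]])
b1, b2 = np.array([1.0]), np.array([0.5])

theta = np.zeros(1)  # slow iterate
w = np.zeros(1)      # fast iterate
x = 0                # Markov chain state

for k in range(1, 10_001):
    beta = 1.0 / k            # slow step size beta_k
    gamma = 1.0 / k ** (2/3)  # fast step size gamma_k; beta_k/gamma_k -> 0
    noise = 0.1 * (x - 0.5)   # state-dependent, hence Markovian, perturbation
    theta = theta - beta * (A11 @ theta + A12 @ w - b1 + noise)
    w = w - gamma * (A21 @ theta + A22 @ w - b2 + noise)
    x = int(rng.choice(2, p=P[x]))  # advance the chain

print("theta:", theta, "w:", w)
```

The two step sizes are the defining feature: theta moves on the slow timescale and w on the fast one, and the paper's bounds quantify how the chain's mixing time enters the constants of the resulting error.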
Related papers
- Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way [12.331596909999764]
We investigate constant stepsize schemes through the lens of Markov processes.
We derive explicit geometric and non-asymptotic convergence rates, as well as the variance and bias introduced by constant stepsizes.
arXiv Detail & Related papers (2024-10-16T21:49:27Z) - Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped methods for optimization problems with heavy-tailed noise that has structured density.
We show that it is possible to obtain convergence rates faster than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates
of Linear Stochastic Approximation [22.51165277694864]
This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size.
LSA is used to compute approximate solutions of a $d$-dimensional linear system.
arXiv Detail & Related papers (2022-07-10T14:36:04Z) - Optimal and instance-dependent guarantees for Markovian linear stochastic approximation [47.912511426974376]
We show a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme.
We derive corollaries of these results for policy evaluation with Markov noise.
arXiv Detail & Related papers (2021-12-23T18:47:50Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
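As a concrete illustration of the synchronous variant discussed there, below is a minimal double Q-learning sketch on a toy randomly generated MDP. The MDP, discount factor, and constant step size are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 2, 2, 0.9
R = rng.uniform(size=(nS, nA))                 # toy reward table
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # toy transitions, shape (nS, nA, nS)

# Double Q-learning keeps two estimates and uses one to evaluate the
# greedy action of the other, which counteracts overestimation bias.
QA = np.zeros((nS, nA))
QB = np.zeros((nS, nA))
alpha = 0.05  # illustrative constant step size

for _ in range(20_000):
    # Synchronous variant: every (s, a) pair draws a fresh next state each round.
    for s in range(nS):
        for a in range(nA):
            s2 = int(rng.choice(nS, p=P[s, a]))
            if rng.random() < 0.5:
                a_star = int(QA[s2].argmax())
                QA[s, a] += alpha * (R[s, a] + gamma * QB[s2, a_star] - QA[s, a])
            else:
                a_star = int(QB[s2].argmax())
                QB[s, a] += alpha * (R[s, a] + gamma * QA[s2, a_star] - QB[s, a])

print((QA + QB) / 2)  # averaged estimate of the optimal Q-function
```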
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - On the Convergence of Consensus Algorithms with Markovian Noise and
Gradient Bias [25.775517797956237]
This paper presents a finite-time convergence analysis for a decentralized stochastic approximation (SA) scheme.
The scheme generalizes several algorithms for decentralized machine learning and multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-08-18T10:29:33Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Non-asymptotic Convergence Analysis of Two Time-scale (Natural)
Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z) - Stochastic Approximation with Markov Noise: Analysis and applications in
reinforcement learning [0.0]
We present, for the first time, a convergence analysis of two time-scale stochastic approximation driven by "controlled" Markov noise.
We analyze the behavior of our framework by relating it to limiting differential inclusions in both time scales.
We obtain the first informative error bounds on function approximation for the policy evaluation algorithm.
arXiv Detail & Related papers (2020-04-08T03:59:21Z)