Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic
Approximation with Markovian Noise
- URL: http://arxiv.org/abs/2401.00364v1
- Date: Sun, 31 Dec 2023 01:30:14 GMT
- Title: Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic
Approximation with Markovian Noise
- Authors: Shaan Ul Haque, Sajad Khodadadian, Siva Theja Maguluri
- Abstract summary: We characterize a tight convergence bound for the iterations of linear two-time-scale stochastic approximation (SA) with Markovian noise.
Our results can be applied to establish the convergence behavior of a variety of RL algorithms, such as TD-learning with Polyak averaging, GTD, and GTD2.
- Score: 9.82187447690297
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic approximation (SA) is an iterative algorithm to find the fixed
point of an operator given noisy samples of this operator. SA appears in many
areas such as optimization and Reinforcement Learning (RL). When implemented in
practice, the noise that appears in the update of RL algorithms is naturally
Markovian. Furthermore, in some settings, such as gradient TD, SA is employed
in a two-time-scale manner. The mix of Markovian noise along with the
two-time-scale structure results in an algorithm which is complex to analyze
theoretically. In this paper, we characterize a tight convergence bound for the
iterations of linear two-time-scale SA with Markovian noise. Our results show
the convergence behavior of this algorithm given various choices of step sizes.
Applying our result to the well-known TDC algorithm, we show the first
$O(1/\epsilon)$ sample complexity for the convergence of this algorithm,
improving upon all previous work. Similarly, our results can be applied to
establish the convergence behavior of a variety of RL algorithms, such as
TD-learning with Polyak averaging, GTD, and GTD2.
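As an illustration of the setting, the following is a minimal sketch (not taken from the paper) of the TDC algorithm of Sutton et al. (2009), written as a linear two-time-scale SA driven by a single Markovian trajectory: the slow iterate theta uses step size alpha_t, and the fast gradient-correction iterate w uses a larger step size beta_t. The toy MDP, feature matrix, step-size schedules, and iteration count below are illustrative assumptions only.

import numpy as np

# Minimal TDC sketch: linear two-time-scale stochastic approximation with
# Markovian noise. theta (slow) holds the value-function weights; w (fast)
# holds the gradient-correction weights. All problem data here are synthetic.
rng = np.random.default_rng(0)
n_states, d, gamma = 10, 4, 0.95
P = rng.dirichlet(np.ones(n_states), size=n_states)  # behavior-policy transition matrix
R = rng.normal(size=n_states)                         # expected rewards
Phi = rng.normal(size=(n_states, d))                  # feature vectors phi(s), one row per state

theta, w = np.zeros(d), np.zeros(d)
s = 0
for t in range(1, 100_001):
    alpha = 1.0 / (t + 10)             # slow step size
    beta = 1.0 / (t + 10) ** (2 / 3)   # fast step size (decays more slowly)

    s_next = rng.choice(n_states, p=P[s])               # one Markovian transition
    r = R[s] + rng.normal(scale=0.1)                    # noisy reward sample
    phi, phi_next = Phi[s], Phi[s_next]
    delta = r + gamma * phi_next @ theta - phi @ theta  # TD error

    # TDC updates: a linear two-time-scale SA recursion
    theta = theta + alpha * (delta * phi - gamma * (phi @ w) * phi_next)
    w = w + beta * (delta - phi @ w) * phi
    s = s_next

print("learned value-function weights:", theta)

Because beta_t decays more slowly than alpha_t, the fast iterate w effectively tracks its quasi-stationary solution for the current theta; this separation of time scales is what two-time-scale analyses of this kind exploit.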
Related papers
- Finite-Time Analysis of Fully Decentralized Single-Timescale Actor-Critic [4.94128206910124]
We introduce a fully decentralized Actor-Critic (AC) algorithm, where actor, critic, and global reward estimator are updated in an alternating manner.
We show that our algorithm has a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2})$ under Markovian sampling.
We also provide a local action privacy-preserving version of our algorithm and its analysis.
arXiv Detail & Related papers (2022-06-12T13:14:14Z)
- Gradient Temporal Difference with Momentum: Stability and Convergence [4.780898954294901]
We decompose the heavy ball Gradient TD algorithm into three separate iterates with different step sizes.
We prove that the heavy ball Gradient TD algorithm is convergent using our three-timescale SA analysis.
arXiv Detail & Related papers (2021-11-22T06:17:11Z)
- Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms [65.09383385484007]
Two-timescale stochastic approximation (SA) has been widely used in value-based reinforcement learning algorithms.
We study the non-asymptotic convergence rate of two timescale linear and nonlinear TDC and Greedy-GQ algorithms.
arXiv Detail & Related papers (2020-11-10T11:36:30Z)
- Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis [27.679514676804057]
We develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting.
Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller convergence error than both the conventional TDC and the variance-reduced TD.
arXiv Detail & Related papers (2020-10-26T01:33:05Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity [58.70807593332932]
We study the oracle complexity of gradient-based methods for stochastic approximation problems.
We focus on instance-dependent complexity instead of worst case complexity.
Our proposed algorithm and its analysis provide a theoretical justification for the success of moment estimation.
arXiv Detail & Related papers (2020-06-08T09:25:47Z)
- Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of the order of $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z)
- Stochastic Approximation with Markov Noise: Analysis and applications in reinforcement learning [0.0]
We present, for the first time, a convergence analysis of two-time-scale stochastic approximation driven by "controlled" Markov noise.
We analyze the behavior of our framework by relating it to limiting differential inclusions in both time scales.
We obtain the first informative error bounds on function approximation for the policy evaluation algorithm.
arXiv Detail & Related papers (2020-04-08T03:59:21Z)
- Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)
- Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling [56.394284787780364]
This paper provides the first theoretical convergence analysis for two fundamental RL algorithms of policy gradient (PG) and temporal difference (TD) learning.
Under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of $\mathcal{O}(\log T/\sqrt{T})$.
Under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of $\mathcal{O}(\log T/\sqrt{T})$.
arXiv Detail & Related papers (2020-02-15T00:26:49Z)