Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise
- URL: http://arxiv.org/abs/2401.00364v2
- Date: Sun, 11 May 2025 19:55:30 GMT
- Title: Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise
- Authors: Shaan Ul Haque, Sajad Khodadadian, Siva Theja Maguluri,
- Abstract summary: We derive an upper bound on the error for the iterations of linear two-time-scale SA with Markovian noise.<n>A special case of linear two-time-scale SA that is extensively studied is linear SA with Polyak-Ruppert averaging.
- Score: 8.739101659113157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic approximation (SA) is an iterative algorithm for finding the fixed point of an operator using noisy samples and widely used in optimization and Reinforcement Learning (RL). The noise in RL exhibits a Markovian structure, and in some cases, such as gradient temporal difference (GTD) methods, SA is employed in a two-time-scale framework. This combination introduces significant theoretical challenges for analysis. We derive an upper bound on the error for the iterations of linear two-time-scale SA with Markovian noise. We demonstrate that the mean squared error decreases as $trace (\Sigma^y)/k + o(1/k)$ where $k$ is the number of iterates, and $\Sigma^y$ is an appropriately defined covariance matrix. A key feature of our bounds is that the leading term, $\Sigma^y$, exactly matches with the covariance in the Central Limit Theorem (CLT) for the two-time-scale SA, and we call them tight finite-time bounds. We illustrate their use in RL by establishing sample complexity for off-policy algorithms, TDC, GTD, and GTD2. A special case of linear two-time-scale SA that is extensively studied is linear SA with Polyak-Ruppert averaging. We present tight finite time bounds corresponding to the covariance matrix of the CLT. Such bounds can be used to study TD-learning with Polyak-Ruppert averaging.
Related papers
- $O(1/k)$ Finite-Time Bound for Non-Linear Two-Time-Scale Stochastic Approximation [0.0]
We obtain an improved gradient bound of $O (1/k)$ for nonlinear two-time-scale approximations.<n>Our result applies to algorithms such as descent-ascent and two-time-scale Lagrangian optimization.
arXiv Detail & Related papers (2025-04-27T22:45:00Z) - Finite-Time Bounds for Two-Time-Scale Stochastic Approximation with Arbitrary Norm Contractions and Markovian Noise [7.770605097524015]
Two-time-scale Approximation (SA) is an iterative algorithm with applications in reinforcement learning and optimization.<n>Motivated by applications in reinforcement learning, we give the first mean square bound on non linear two-time-scale SA.
arXiv Detail & Related papers (2025-03-24T07:03:23Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates
of Linear Stochastic Approximation [22.51165277694864]
This paper provides a finite-time analysis of linear approximation (LSA) algorithms with fixed step size.
LSA is used to compute approximate solutions of a $d$-dimensional linear system.
arXiv Detail & Related papers (2022-07-10T14:36:04Z) - A Single-Timescale Analysis For Stochastic Approximation With Multiple
Coupled Sequences [21.50207156675195]
We study the finite-time convergence of nonlinear approximation with multiple coupled sequences.
At the heart of our analysis is the smoothness property of the fixed points in multi-sequence SA that holds in many applications.
arXiv Detail & Related papers (2022-06-21T14:13:20Z) - Finite-Time Analysis of Fully Decentralized Single-Timescale
Actor-Critic [4.94128206910124]
We introduce a fully decentralized Actor-Critic (AC) algorithm, where actor, critic, and global reward estimator are updated in an alternating manner.
We show that our algorithm has sample complexity of $tildemathcalO(epsilon-2)$ under Markovian sampling.
We also provide a local action privacy-preserving version of our algorithm and its analysis.
arXiv Detail & Related papers (2022-06-12T13:14:14Z) - Gradient Temporal Difference with Momentum: Stability and Convergence [4.780898954294901]
We decompose the heavy ball Gradient TD algorithm into three separate iterates with different step sizes.
We prove that the heavy ball Gradient TD algorithm is convergent using our three-timescale SA analysis.
arXiv Detail & Related papers (2021-11-22T06:17:11Z) - Sample Complexity Bounds for Two Timescale Value-based Reinforcement
Learning Algorithms [65.09383385484007]
Two timescale approximation (SA) has been widely used in value-based reinforcement learning algorithms.
We study the non-asymptotic convergence rate of two timescale linear and nonlinear TDC and Greedy-GQ algorithms.
arXiv Detail & Related papers (2020-11-10T11:36:30Z) - Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence
Analysis [27.679514676804057]
We develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting.
Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller convergence error than both the conventional TDC and the variance-reduced TD.
arXiv Detail & Related papers (2020-10-26T01:33:05Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Beyond Worst-Case Analysis in Stochastic Approximation: Moment
Estimation Improves Instance Complexity [58.70807593332932]
We study oracle complexity of gradient based methods for approximation problems.
We focus on instance-dependent complexity instead of worst case complexity.
Our proposed algorithm and its analysis provide a theoretical justification for the success of moment estimation.
arXiv Detail & Related papers (2020-06-08T09:25:47Z) - Non-asymptotic Convergence Analysis of Two Time-scale (Natural)
Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires the overall sample complexity at the order of $mathcalO(epsilon-2.5log3(epsilon-1))$ to attain an $epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the inequality and non-asymptotic properties of approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z) - Stochastic Approximation with Markov Noise: Analysis and applications in
reinforcement learning [0.0]
We present for the first time an convergence analysis of two time-scale approximation driven by "controlled" Markov noise.
We analyze the behavior of our framework by relating it to limiting differential inclusions in both time scales.
We obtain the first informative error bounds on function approximation for the policy evaluation algorithm.
arXiv Detail & Related papers (2020-04-08T03:59:21Z) - Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP)
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z) - Non-asymptotic Convergence of Adam-type Reinforcement Learning
Algorithms under Markovian Sampling [56.394284787780364]
This paper provides the first theoretical convergence analysis for two fundamental RL algorithms of policy gradient (PG) and temporal difference (TD) learning.
Under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of $mathcalO(log T/sqrtT)$.
Under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of $mathcalO(log T/sqrtT
arXiv Detail & Related papers (2020-02-15T00:26:49Z) - Finite Time Analysis of Linear Two-timescale Stochastic Approximation
with Markovian Noise [28.891930079358954]
We provide a finite-time analysis for linear two timescale SA scheme.
Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise.
We present an expansion of the expected error with a matching lower bound.
arXiv Detail & Related papers (2020-02-04T13:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.