A Finite Time Analysis of Two Time-Scale Actor Critic Methods
- URL: http://arxiv.org/abs/2005.01350v3
- Date: Mon, 10 Oct 2022 06:43:56 GMT
- Title: A Finite Time Analysis of Two Time-Scale Actor Critic Methods
- Authors: Yue Wu and Weitong Zhang and Pan Xu and Quanquan Gu
- Abstract summary: We provide a non-asymptotic analysis for two time-scale actor-critic methods under the non-i.i.d. setting.
We prove that the actor-critic method is guaranteed to find a first-order stationary point.
This is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
- Score: 87.69128666220016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Actor-critic (AC) methods have exhibited great empirical success compared
with other reinforcement learning algorithms, where the actor uses the policy
gradient to improve the learning policy and the critic uses temporal difference
learning to estimate the policy gradient. Under the two time-scale learning
rate schedule, the asymptotic convergence of AC has been well studied in the
literature. However, the non-asymptotic convergence and finite sample
complexity of actor-critic methods are largely open. In this work, we provide a
non-asymptotic analysis for two time-scale actor-critic methods under the
non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find
a first-order stationary point (i.e., $\|\nabla J(\boldsymbol{\theta})\|_2^2
\le \epsilon$) of the non-concave performance function
$J(\boldsymbol{\theta})$, with $\mathcal{\tilde{O}}(\epsilon^{-2.5})$ sample
complexity. To the best of our knowledge, this is the first work providing
finite-time analysis and sample complexity bound for two time-scale
actor-critic methods.
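
To connect the description above to an implementation, the following is a minimal sketch of the two time-scale update pattern: a critic updated by TD learning on the faster time scale and an actor updated by a TD-error-based policy gradient on the slower time scale. The toy MDP, softmax policy, tabular critic, and the specific step-size exponents are illustrative assumptions, not the construction or constants analyzed in the paper.

```python
import numpy as np

# Minimal two time-scale actor-critic sketch on a small random MDP (illustrative).
# Critic: tabular TD(0), a special case of linear function approximation.
# Actor: softmax policy updated with the TD error as the advantage estimate.
# The key two time-scale feature is that the actor step size alpha_t decays
# faster than the critic step size beta_t, so the critic tracks the current
# policy's value function before the actor moves far.

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.95

# Random MDP: transition probabilities P[s, a] (a distribution over next states)
# and rewards R[s, a].
P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(0.0, 1.0, size=(S, A))

theta = np.zeros((S, A))   # actor (softmax policy) parameters
w = np.zeros(S)            # critic (state-value) parameters

def policy(s):
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

s = rng.integers(S)
for t in range(1, 20_001):
    alpha = 0.1 / t ** 0.6   # actor step size (slower time scale)
    beta = 0.5 / t ** 0.4    # critic step size (faster time scale)

    pi = policy(s)
    a = rng.choice(A, p=pi)
    s_next = rng.choice(S, p=P[s, a])
    r = R[s, a]

    # Critic: TD(0) update of the value estimate.
    delta = r + gamma * w[s_next] - w[s]
    w[s] += beta * delta

    # Actor: policy gradient step; for a softmax policy,
    # grad_theta[s] log pi(a|s) = e_a - pi(.|s).
    grad_log = -pi
    grad_log[a] += 1.0
    theta[s] += alpha * delta * grad_log

    s = s_next
```

Note that the samples come from a single Markovian trajectory (no i.i.d. resampling), which is the non-i.i.d. regime the analysis addresses.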
Related papers
- Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms [49.19842488693726]
We establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $O(\epsilon^{-3})$.
Our findings provide theoretical support for many algorithms that rely on constant step sizes.
arXiv Detail & Related papers (2024-10-11T14:46:29Z)
- Non-Asymptotic Analysis for Single-Loop (Natural) Actor-Critic with Compatible Function Approximation [18.77565744533582]
Actor-critic (AC) is a powerful method for learning an optimal policy in reinforcement learning.
AC converges to an $\epsilon + \varepsilon_{\text{critic}}$ neighborhood of stationary points with the best known sample complexity.
This paper analyzes the convergence of both AC and NAC algorithms with compatible function approximation.
arXiv Detail & Related papers (2024-06-03T20:05:04Z)
- Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation [5.945710235932345]
We present the first two-timescale critic-actor algorithm with function approximation in the long-run average reward setting.
A notable feature of our analysis is that unlike recent single-timescale actor-critic algorithms, we present a complete convergence analysis of our scheme.
arXiv Detail & Related papers (2024-02-02T12:48:49Z)
- Finite-time analysis of single-timescale actor-critic [8.994243376183658]
Actor-critic methods have achieved significant success in many challenging applications.
However, finite-time convergence is still poorly understood in the most practical single-timescale form.
We investigate the more practical online single-timescale actor-critic algorithm on continuous state space.
arXiv Detail & Related papers (2022-10-18T15:03:56Z)
- Actor-Critic or Critic-Actor? A Tale of Two Time Scales [5.945710235932345]
We provide a proof of convergence and compare the two empirically with and without function approximation.
Our proposed critic-actor algorithm performs on par with actor-critic in terms of both accuracy and computational effort.
arXiv Detail & Related papers (2022-10-10T07:47:56Z)
- A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms [81.01917016753644]
We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective.
Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $\gamma^t$ term in the actor update for the transition observed at time $t$ in a trajectory.
Practitioners, however, usually ignore the discounting ($\gamma^t$) for the actor while using a discounted critic; a short sketch contrasting the two updates appears after this list.
arXiv Detail & Related papers (2020-10-02T15:51:48Z)
- Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z)
- Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy [122.01837436087516]
We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms.
We establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.
arXiv Detail & Related papers (2020-08-02T14:01:49Z)
- Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z)
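
As a companion to the discounting-mismatch summary above, here is a minimal sketch contrasting the theoretically prescribed actor update, which keeps the $\gamma^t$ factor, with the common implementation that drops it while still using a discounted critic. The function and its helper callables (`grad_log_pi`, `advantage`) are hypothetical illustrations, not code from that paper.

```python
# Illustration of the discounting mismatch: the only difference between the two
# estimators is whether the gamma**t weight from the discounted policy gradient
# theorem is kept or dropped. Helper callables are assumed to be supplied by the
# caller (e.g., a policy network and a critic-based advantage estimate).

def actor_grad(trajectory, grad_log_pi, advantage, gamma=0.99, discounted_actor=True):
    """Accumulate a policy-gradient estimate over one trajectory.

    trajectory       : list of (state, action) pairs observed at times t = 0, 1, ...
    grad_log_pi      : callable (state, action) -> gradient of log pi(a|s) w.r.t. theta
    advantage        : callable (state, action, t) -> advantage estimate from the critic
    discounted_actor : if True, keep the gamma**t factor prescribed by the discounted
                       policy gradient theorem; if False, drop it, as many
                       implementations do while still using a discounted critic.
    """
    g = 0.0
    for t, (s, a) in enumerate(trajectory):
        weight = gamma ** t if discounted_actor else 1.0
        g = g + weight * advantage(s, a, t) * grad_log_pi(s, a)
    return g
```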