Finite-time analysis of single-timescale actor-critic
- URL: http://arxiv.org/abs/2210.09921v4
- Date: Fri, 26 Jan 2024 08:06:09 GMT
- Title: Finite-time analysis of single-timescale actor-critic
- Authors: Xuyang Chen, Lin Zhao
- Abstract summary: Actor-critic methods have achieved significant success in many challenging applications.
However, their finite-time convergence is still poorly understood in the most practical single-timescale form.
We investigate the more practical online single-timescale actor-critic algorithm on continuous state space.
- Score: 8.994243376183658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Actor-critic methods have achieved significant success in many challenging
applications. However, their finite-time convergence is still poorly understood
in the most practical single-timescale form. Existing works on analyzing
single-timescale actor-critic have been limited to i.i.d. sampling or the
tabular setting for simplicity. We investigate the more practical online
single-timescale actor-critic algorithm on continuous state space, where the
critic assumes linear function approximation and updates with a single
Markovian sample per actor step. Previous analysis has been unable to establish
the convergence for such a challenging scenario. We demonstrate that the online
single-timescale actor-critic method provably finds an $\epsilon$-approximate
stationary point with $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample
complexity under standard assumptions, which can be further improved to
$\mathcal{O}(\epsilon^{-2})$ under i.i.d. sampling. Our novel framework
systematically evaluates and controls the error propagation between the actor
and critic. It offers a promising approach for analyzing other single-timescale
reinforcement learning algorithms as well.
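To make the setting concrete, below is a minimal sketch of the kind of algorithm analyzed here: a linear TD(0) critic and a policy-gradient actor, each updated once per step from a single sample along one Markovian trajectory, with actor and critic step sizes of the same order. The toy chain environment, radial-basis features, and softmax actor are illustrative assumptions, not constructions from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def env_step(s, a):
    # Toy 1-D chain (illustrative stand-in for a continuous-state MDP):
    # action 1 moves right, action 0 moves left, with Gaussian noise.
    s_next = float(np.clip(s + (1.0 if a == 1 else -1.0) + rng.normal(0.0, 0.1), -5.0, 5.0))
    return s_next, -abs(s_next)   # reward peaks at the origin

CENTERS = np.linspace(-5.0, 5.0, 8)

def feat(s):
    # Radial-basis features for the linear critic V(s) = w @ feat(s)
    return np.exp(-0.5 * (s - CENTERS) ** 2)

def policy_probs(theta, s):
    # Softmax policy over two actions; theta has shape (2, 8)
    logits = theta @ feat(s)
    z = np.exp(logits - logits.max())
    return z / z.sum()

def single_timescale_ac(num_steps=50_000, alpha=5e-3, beta=5e-3, gamma=0.95):
    theta = np.zeros((2, 8))      # actor parameters
    w = np.zeros(8)               # critic parameters
    s = 0.0
    for _ in range(num_steps):
        p = policy_probs(theta, s)
        a = rng.choice(2, p=p)
        s_next, r = env_step(s, a)
        phi, phi_next = feat(s), feat(s_next)
        # Critic: a single TD(0) update from the current Markovian sample
        delta = r + gamma * (w @ phi_next) - w @ phi
        w += beta * delta * phi
        # Actor: policy-gradient step using the TD error as the advantage
        # estimate; alpha and beta are of the same order, which is what
        # makes the method single-timescale
        grad_logp = -p[:, None] * phi[None, :]
        grad_logp[a] += phi
        theta += alpha * delta * grad_logp
        s = s_next                # stay on one trajectory (no i.i.d. resampling)
    return theta, w

if __name__ == "__main__":
    theta, w = single_timescale_ac()
    print("critic weights:", np.round(w, 2))
```

Note that the critic is never run to convergence between actor steps; controlling the error this interleaving propagates between the two updates is exactly the difficulty the paper's framework addresses.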
Related papers
- Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms [49.19842488693726]
We establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $O(\epsilon^{-3})$.
Our findings provide theoretical support for many algorithms that rely on constant step sizes.
arXiv Detail & Related papers (2024-10-11T14:46:29Z)
- Non-Asymptotic Analysis for Single-Loop (Natural) Actor-Critic with Compatible Function Approximation [18.77565744533582]
Actor-critic (AC) is a powerful method for learning an optimal policy in reinforcement learning.
AC converges to an $\epsilon + \varepsilon_{\text{critic}}$ neighborhood of stationary points with the best known sample complexity.
This paper analyzes the convergence of both AC and NAC algorithms with compatible function approximation.
arXiv Detail & Related papers (2024-06-03T20:05:04Z)
- Global Convergence of Two-timescale Actor-Critic for Solving Linear Quadratic Regulator [43.13238243240668]
We develop a new analysis framework that allows establishing the global convergence to an $\epsilon$-optimal solution.
This is the first finite-time convergence analysis for the single sample two-timescale AC for solving LQR with global optimality.
arXiv Detail & Related papers (2022-08-18T09:57:03Z)
- A Small Gain Analysis of Single Timescale Actor Critic [16.092248433189816]
We consider a version of actor-critic which uses proportional step-sizes and only one critic update with a single sample from the stationary distribution per actor step.
We prove that this method can be used to find a stationary point, and that the resulting sample complexity improves the state of the art for actor-critic methods.
arXiv Detail & Related papers (2022-03-04T22:20:34Z)
- A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms [81.01917016753644]
We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective.
Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $\gamma^t$ term in the actor update for the transition observed at time $t$ in a trajectory.
Practitioners, however, usually ignore the discounting ($\gamma^t$) for the actor while using a discounted critic; a sketch of the two update variants appears after this list.
arXiv Detail & Related papers (2020-10-02T15:51:48Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale, single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z)
- A Finite Time Analysis of Two Time-Scale Actor Critic Methods [87.69128666220016]
We provide a non-asymptotic analysis for two time-scale actor-critic methods under the non-i.i.d. setting.
We prove that the actor-critic method is guaranteed to find a first-order stationary point.
This is the first work providing a finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
arXiv Detail & Related papers (2020-05-04T09:45:18Z)
- The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
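As referenced in the discounting-mismatch entry above, the difference between the theoretical and the commonly implemented actor update reduces to a single $\gamma^t$ factor. A minimal sketch under illustrative assumptions; `grad_logp` and `delta` are hypothetical stand-ins for quantities a training loop would compute elsewhere.

```python
import numpy as np

def actor_step(theta, grad_logp, delta, alpha, gamma, t, discounted_actor):
    """One policy-gradient step for the transition observed at time t.

    The policy-gradient theorem for the discounted objective carries a
    gamma**t weight on the update; common implementations drop it while
    still using a discounted critic, which is the mismatch studied above.
    """
    weight = gamma ** t if discounted_actor else 1.0  # theory vs. common practice
    return theta + alpha * weight * delta * grad_logp

# Illustrative call with dummy values (all inputs hypothetical):
theta0 = np.zeros(4)
g = np.ones(4)
theory = actor_step(theta0, g, delta=0.5, alpha=0.01, gamma=0.99, t=100, discounted_actor=True)
practice = actor_step(theta0, g, delta=0.5, alpha=0.01, gamma=0.99, t=100, discounted_actor=False)
print(theory, practice)  # the theoretical step is damped by gamma**100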