Global Convergence of Two-timescale Actor-Critic for Solving Linear
Quadratic Regulator
- URL: http://arxiv.org/abs/2208.08744v1
- Date: Thu, 18 Aug 2022 09:57:03 GMT
- Title: Global Convergence of Two-timescale Actor-Critic for Solving Linear
Quadratic Regulator
- Authors: Xuyang Chen, Jingliang Duan, Yingbin Liang, Lin Zhao
- Abstract summary: We develop a new analysis framework that allows establishing the global convergence to an $\epsilon$-optimal solution.
This is the first finite-time convergence analysis for the single sample two-timescale AC for solving LQR with global optimality.
- Score: 43.13238243240668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The actor-critic (AC) reinforcement learning algorithm has been the
powerhouse behind many challenging applications. Nevertheless, its convergence
is fragile in general. To study its instability, existing works mostly consider
the uncommon double-loop variant or basic models with finite state and action
spaces. We investigate the more practical single-sample two-timescale AC for
solving the canonical linear quadratic regulator (LQR) problem, where the actor
and the critic update only once with a single sample in each iteration on an
unbounded continuous state and action space. Existing analyses cannot establish
convergence for such a challenging case. We develop a new analysis
framework that allows establishing the global convergence to an
$\epsilon$-optimal solution with at most an
$\tilde{\mathcal{O}}(\epsilon^{-2.5})$ sample complexity. To our knowledge,
this is the first finite-time convergence analysis for the single sample
two-timescale AC for solving LQR with global optimality. The sample complexity
improves upon those of other variants by orders of magnitude, which sheds light on the
practical wisdom of single-sample algorithms. We further validate our theoretical
findings via comprehensive simulation comparisons.
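To make the two-timescale structure concrete, below is a minimal Python sketch of a single-sample actor-critic loop on a toy LQR instance: in every iteration the critic takes one TD(0) step with a fast step size and the actor takes one policy-gradient step with a slow step size, both computed from the same single transition. The instance (A, B, Q, R), the discount factor, the noise levels, the step-size schedules, and the TD-error advantage surrogate are illustrative assumptions for this sketch, not the paper's exact construction.

import numpy as np

# Illustrative LQR instance; A, B, Q, R, noise levels, and step-size schedules
# are assumptions for this sketch, not values from the paper.
rng = np.random.default_rng(0)
n, m = 2, 1
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(n)
R = 0.1 * np.eye(m)
sigma = 0.1      # std of Gaussian exploration noise
gamma = 0.99     # discount factor (illustrative; the paper's cost setup may differ)

def feat(x):
    # Quadratic features: upper-triangular entries of x x^T plus a bias term.
    outer = np.outer(x, x)
    return np.concatenate([outer[np.triu_indices(n)], [1.0]])

K = np.zeros((m, n))                  # actor: linear gain, u = -K x + exploration
w = np.zeros(feat(np.zeros(n)).size)  # critic: linear weights on quadratic features

x = rng.normal(size=n)
for t in range(1, 20001):
    alpha = 0.005 / t ** 0.6   # slow (actor) step size
    beta = 0.05 / t ** 0.4     # fast (critic) step size -- the two timescales

    # Single sample per iteration: act once, observe cost and next state.
    eps = sigma * rng.normal(size=m)
    u = -K @ x + eps
    cost = x @ Q @ x + u @ R @ u
    x_next = A @ x + B @ u + 0.01 * rng.normal(size=n)

    # Critic: one TD(0) step on the value-function weights (fast timescale).
    delta = cost + gamma * w @ feat(x_next) - w @ feat(x)
    w += beta * delta * feat(x)

    # Actor: one policy-gradient-style step using the same sample, with the TD
    # error as an advantage surrogate (a sketch, not the paper's exact estimator).
    grad_log_pi = -np.outer(eps, x) / sigma ** 2   # d log pi(u|x) / dK
    K -= alpha * delta * grad_log_pi

    x = x_next
    if np.linalg.norm(x) > 1e3:   # crude reset to keep the toy simulation bounded
        x = rng.normal(size=n)

print("learned gain K:", K)

The two decaying schedules above are what "two-timescale" refers to: the critic step size dominates the actor step size, so the critic tracks the value of the slowly moving policy while the actor drifts toward lower cost.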
Related papers
- Performative Reinforcement Learning with Linear Markov Decision Process [14.75815792682734]
We study the setting of \emph{performative} reinforcement learning, where the deployed policy affects both the reward and the transition of the underlying Markov decision process.
We generalize the results to \emph{linear} Markov decision processes, which are the primary theoretical model of large-scale MDPs.
arXiv Detail & Related papers (2024-11-07T23:04:48Z) - Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms [49.19842488693726]
We establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $O(\epsilon^{-3})$.
Our findings provide theoretical support for many algorithms that rely on constant step sizes.
arXiv Detail & Related papers (2024-10-11T14:46:29Z) - Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity
and Last-Iterate Convergence [19.779044926914704]
Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control.
In this work, we propose a simpler nested Zeroth-Order (ZO) algorithm.
arXiv Detail & Related papers (2023-09-08T11:47:31Z) - Finite-Time Analysis of Fully Decentralized Single-Timescale
Actor-Critic [4.94128206910124]
We introduce a fully decentralized Actor-Critic (AC) algorithm, where actor, critic, and global reward estimator are updated in an alternating manner.
We show that our algorithm has a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2})$ under Markovian sampling.
We also provide a local action privacy-preserving version of our algorithm and its analysis.
arXiv Detail & Related papers (2022-06-12T13:14:14Z) - Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDP.
DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Non-asymptotic Convergence Analysis of Two Time-scale (Natural)
Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z) - Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms [58.57004511121862]
This paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling.
We show that AC and NAC attain an orderwise performance improvement over PG and NPG in the infinite-horizon setting due to the incorporation of the critic.
arXiv Detail & Related papers (2020-04-27T17:11:06Z)