Global Optimality of Single-Timescale Actor-Critic under Continuous State-Action Space: A Study on Linear Quadratic Regulator
- URL: http://arxiv.org/abs/2505.01041v1
- Date: Fri, 02 May 2025 06:30:52 GMT
- Title: Global Optimality of Single-Timescale Actor-Critic under Continuous State-Action Space: A Study on Linear Quadratic Regulator
- Authors: Xuyang Chen, Jingliang Duan, Lin Zhao
- Abstract summary: We show that the popular single-timescale actor-critic can attain an $\epsilon$-optimal solution with a sample complexity of order $\epsilon^{-2}$. Our work provides new insights into the performance of single-timescale actor-critic, which further bridges the gap between theory and practice.
- Score: 9.890337460455902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Actor-critic methods have achieved state-of-the-art performance in various challenging tasks. However, a theoretical understanding of their performance remains elusive. Existing studies mostly focus on practically uncommon variants such as double-loop or two-timescale stepsize actor-critic algorithms for simplicity. These results certify local convergence on finite state or action spaces only. We push the boundary to investigate the classic single-sample single-timescale actor-critic on continuous (infinite) state-action space, where we employ the canonical linear quadratic regulator (LQR) problem as a case study. We show that the popular single-timescale actor-critic can attain an $\epsilon$-optimal solution with a sample complexity of order $\epsilon^{-2}$ for solving LQR on the demanding continuous state-action space. Our work provides new insights into the performance of single-timescale actor-critic, which further bridges the gap between theory and practice.
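To make the setting concrete, here is a minimal, illustrative sketch of a single-sample, single-timescale actor-critic loop on a toy LQR instance. The dynamics (A, B), costs (Q, R), discount, stepsizes, and the TD(0)-plus-Gaussian-policy-gradient updates are all hypothetical choices, not the paper's exact algorithm; the point is only that the actor and critic are updated simultaneously with stepsizes of the same order.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 1-D system x' = A x + B u + w, stage cost x'Qx + u'Ru.
A, B = np.array([[0.9]]), np.array([[0.5]])
Q, R = np.array([[1.0]]), np.array([[1.0]])
K = np.array([[0.0]])    # actor: linear policy u = -K x + exploration noise
P = np.array([[1.0]])    # critic: quadratic value estimate V(x) = x' P x
alpha = beta = 1e-3      # single timescale: same-order actor/critic stepsizes
sigma, gamma = 0.1, 0.95

x = np.array([1.0])
for _ in range(100_000):
    u = -K @ x + sigma * rng.standard_normal(1)
    cost = float(x @ Q @ x + u @ R @ u)
    x_next = A @ x + B @ u + 0.01 * rng.standard_normal(1)
    # Critic: one single-sample TD(0) step on the value matrix P.
    td = cost + gamma * float(x_next @ P @ x_next) - float(x @ P @ x)
    P += beta * td * np.outer(x, x)
    # Actor: policy-gradient step using the TD error as the advantage signal;
    # grad log pi(u|x) w.r.t. K is -(u + Kx) x' / sigma^2 for a Gaussian policy.
    K += alpha * td * np.outer(u + K @ x, x) / sigma**2
    # Reset if the state blows up (toy safeguard, not part of any analysis).
    x = x_next if abs(float(x_next[0])) < 10.0 else np.array([1.0])
```

Because both updates share the same timescale, neither the critic nor the actor is allowed to converge before the other moves, which is exactly the coupling that makes the paper's finite-time analysis challenging.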
Related papers
- Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints [49.76332265680669]
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex. Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs. We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z) - Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
Evaluating constraints on every token can be prohibitively expensive. LCD can distort the global distribution over strings, sampling tokens based only on local information. We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z) - A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence [7.586600116278698]
Deep actor-critic algorithms combine the actor-critic framework with deep neural networks (DNNs).
arXiv Detail & Related papers (2023-06-10T10:04:54Z) - Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition and framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods.
arXiv Detail & Related papers (2022-10-22T22:55:50Z) - Finite-time analysis of single-timescale actor-critic [8.994243376183658]
Actor-critic methods have achieved significant success in many challenging applications.
However, finite-time convergence is still poorly understood in the most practical single-timescale form.
We investigate the more practical online single-timescale actor-critic algorithm on continuous state space.
arXiv Detail & Related papers (2022-10-18T15:03:56Z) - Global Convergence of Two-timescale Actor-Critic for Solving Linear Quadratic Regulator [43.13238243240668]
We develop a new analysis framework that establishes global convergence to an $\epsilon$-optimal solution.
This is the first finite-time convergence analysis of the single-sample two-timescale AC for solving LQR with global optimality.
arXiv Detail & Related papers (2022-08-18T09:57:03Z) - Single Time-scale Actor-critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees [12.884132885360907]
We propose a single time-scale actor-critic algorithm to solve the linear quadratic regulator (LQR) problem.
A least squares temporal difference (LSTD) method is applied to the critic and a natural policy gradient method is used for the actor; a minimal sketch of the natural-gradient step for LQR appears after this list.
arXiv Detail & Related papers (2022-01-31T19:20:40Z) - Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy [122.01837436087516]
We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms.
We establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.
arXiv Detail & Related papers (2020-08-02T14:01:49Z) - Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways, in a double-loop or a two time-scale manner, for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z) - A Finite Time Analysis of Two Time-Scale Actor Critic Methods [87.69128666220016]
We provide a non-asymptotic analysis for two time-scale actor-critic methods under the non-i.i.d. setting.
We prove that the actor-critic method is guaranteed to find a first-order stationary point.
This is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
arXiv Detail & Related papers (2020-05-04T09:45:18Z)
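For the LSTD/natural-policy-gradient entry above, the actor update has a well-known closed form in the LQR setting. The following is a minimal model-based sketch assuming known dynamics; the system matrices here are hypothetical, and the cited paper's method is sample-based, so this only illustrates the idealized natural-gradient step built from the standard LQR policy-gradient identity.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical system and costs; the initial K is chosen to be stabilizing.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[0.5, 0.5]])   # linear policy u = -K x
eta = 0.05

for _ in range(200):
    Acl = A - B @ K
    # Value matrix of the current policy: P solves P = Q + K'RK + Acl' P Acl.
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # grad J(K) = 2 E_K Sigma_K with E_K = (R + B'PB) K - B'PA; the natural
    # gradient cancels the state-covariance factor Sigma_K, leaving E_K.
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    K = K - 2 * eta * E
```

In a sample-based scheme, the exact policy evaluation (the Lyapunov solve) would be replaced by an LSTD critic estimated from trajectory data, which is where the finite-time error analysis enters.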