Global Optimality of Single-Timescale Actor-Critic under Continuous State-Action Space: A Study on Linear Quadratic Regulator
- URL: http://arxiv.org/abs/2505.01041v1
- Date: Fri, 02 May 2025 06:30:52 GMT
- Title: Global Optimality of Single-Timescale Actor-Critic under Continuous State-Action Space: A Study on Linear Quadratic Regulator
- Authors: Xuyang Chen, Jingliang Duan, Lin Zhao
- Abstract summary: We show that the popular single-timescale actor-critic can attain an $\epsilon$-optimal solution with a sample complexity of order $\epsilon^{-2}$. Our work provides new insights into the performance of single-timescale actor-critic, which further bridges the gap between theory and practice.
- Score: 9.890337460455902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Actor-critic methods have achieved state-of-the-art performance in various challenging tasks. However, a theoretical understanding of their performance remains elusive. Existing studies mostly focus on practically uncommon variants such as double-loop or two-timescale stepsize actor-critic algorithms for simplicity. These results certify local convergence on finite state or action spaces only. We push the boundary to investigate the classic single-sample single-timescale actor-critic on continuous (infinite) state-action space, where we employ the canonical linear quadratic regulator (LQR) problem as a case study. We show that the popular single-timescale actor-critic can attain an $\epsilon$-optimal solution with a sample complexity of order $\epsilon^{-2}$ for solving LQR on the demanding continuous state-action space. Our work provides new insights into the performance of single-timescale actor-critic, which further bridges the gap between theory and practice.
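To make the setting concrete, here is a minimal, illustrative sketch of a single-sample, single-timescale actor-critic loop on a toy LQR instance. The dynamics (A, B), costs (Q, R), discount, stepsizes, and the TD(0)-plus-Gaussian-policy-gradient updates are all hypothetical choices, not the paper's exact algorithm; the point is only that the actor and critic are updated simultaneously with stepsizes of the same order.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 1-D system x' = A x + B u + w, stage cost x'Qx + u'Ru.
A, B = np.array([[0.9]]), np.array([[0.5]])
Q, R = np.array([[1.0]]), np.array([[1.0]])
K = np.array([[0.0]])    # actor: linear policy u = -K x + exploration noise
P = np.array([[1.0]])    # critic: quadratic value estimate V(x) = x' P x
alpha = beta = 1e-3      # single timescale: same-order actor/critic stepsizes
sigma, gamma = 0.1, 0.95

x = np.array([1.0])
for _ in range(100_000):
    u = -K @ x + sigma * rng.standard_normal(1)
    cost = float(x @ Q @ x + u @ R @ u)
    x_next = A @ x + B @ u + 0.01 * rng.standard_normal(1)
    # Critic: one single-sample TD(0) step on the value matrix P.
    td = cost + gamma * float(x_next @ P @ x_next) - float(x @ P @ x)
    P += beta * td * np.outer(x, x)
    # Actor: policy-gradient step using the TD error as the advantage signal;
    # grad log pi(u|x) w.r.t. K is -(u + Kx) x' / sigma^2 for a Gaussian policy.
    K += alpha * td * np.outer(u + K @ x, x) / sigma**2
    # Reset if the state blows up (toy safeguard, not part of any analysis).
    x = x_next if abs(float(x_next[0])) < 10.0 else np.array([1.0])
```

Because both updates share the same timescale, neither the critic nor the actor is allowed to converge before the other moves, which is exactly the coupling that makes the paper's finite-time analysis challenging.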
Related papers
- Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints [49.76332265680669]
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex. Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs. We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z) - Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
Evaluating constraints on every token can be prohibitively expensive. LCD can distort the global distribution over strings, sampling tokens based only on local information. We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z) - A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence [7.586600116278698]
Deep actor-critic algorithms combine the actor-critic framework with deep neural networks (DNNs).
arXiv Detail & Related papers (2023-06-10T10:04:54Z) - Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition and framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods.
arXiv Detail & Related papers (2022-10-22T22:55:50Z) - Finite-time analysis of single-timescale actor-critic [8.994243376183658]
Actor-critic methods have achieved significant success in many challenging applications.
However, finite-time convergence is still poorly understood in the most practical single-timescale form.
We investigate the more practical online single-timescale actor-critic algorithm on continuous state space.
arXiv Detail & Related papers (2022-10-18T15:03:56Z) - Global Convergence of Two-timescale Actor-Critic for Solving Linear Quadratic Regulator [43.13238243240668]
We develop a new analysis framework that establishes global convergence to an $\epsilon$-optimal solution.
This is the first finite-time convergence analysis of the single-sample two-timescale AC for solving LQR with global optimality.
arXiv Detail & Related papers (2022-08-18T09:57:03Z) - Single Time-scale Actor-critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees [12.884132885360907]
We propose a single time-scale actor-critic algorithm to solve the linear quadratic regulator (LQR) problem.
A least squares temporal difference (LSTD) method is applied to the critic and a natural policy gradient method is used for the actor; a minimal sketch of the natural-gradient step for LQR appears after this list.
arXiv Detail & Related papers (2022-01-31T19:20:40Z) - Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy [122.01837436087516]
We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms.
We establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.
arXiv Detail & Related papers (2020-08-02T14:01:49Z) - Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways, in a double-loop or a two time-scale manner, for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z) - A Finite Time Analysis of Two Time-Scale Actor Critic Methods [87.69128666220016]
We provide a non-asymptotic analysis for two time-scale actor-critic methods under the non-i.i.d. setting.
We prove that the actor-critic method is guaranteed to find a first-order stationary point.
This is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
arXiv Detail & Related papers (2020-05-04T09:45:18Z)
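For the LSTD/natural-policy-gradient entry above, the actor update has a well-known closed form in the LQR setting. The following is a minimal model-based sketch assuming known dynamics; the system matrices here are hypothetical, and the cited paper's method is sample-based, so this only illustrates the idealized natural-gradient step built from the standard LQR policy-gradient identity.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical system and costs; the initial K is chosen to be stabilizing.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[0.5, 0.5]])   # linear policy u = -K x
eta = 0.05

for _ in range(200):
    Acl = A - B @ K
    # Value matrix of the current policy: P solves P = Q + K'RK + Acl' P Acl.
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # grad J(K) = 2 E_K Sigma_K with E_K = (R + B'PB) K - B'PA; the natural
    # gradient cancels the state-covariance factor Sigma_K, leaving E_K.
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    K = K - 2 * eta * E
```

In a sample-based scheme, the exact policy evaluation (the Lyapunov solve) would be replaced by an LSTD critic estimated from trajectory data, which is where the finite-time error analysis enters.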