Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms
- URL: http://arxiv.org/abs/2410.08868v1
- Date: Fri, 11 Oct 2024 14:46:29 GMT
- Title: Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms
- Authors: Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor,
- Abstract summary: We establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $O(epsilon-3)$.
Our findings provide theoretical support for many algorithms that rely on constant step sizes.
- Score: 49.19842488693726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $O(\epsilon^{-3})$, advancing beyond the existing local convergence results. Previous works provide local convergence guarantees with a sample complexity of $O(\epsilon^{-2})$ for bounding the squared gradient of the return, which translates to a global sample complexity of $O(\epsilon^{-4})$ using the gradient domination lemma. In contrast to traditional methods that employ decreasing step sizes for both the actor and critic, we demonstrate that a constant step size for the critic is sufficient to ensure convergence in expectation. This key insight reveals that using a decreasing step size for the actor alone is sufficient to handle the noise for both the actor and critic. Our findings provide theoretical support for the practical success of many algorithms that rely on constant step sizes.
Related papers
- On the Global Convergence of Natural Actor-Critic with Two-layer Neural
Network Parametrization [38.32265770020665]
We study a natural actor-critic algorithm that utilizes neural networks to represent the critic.
Our aim is to establish sample complexity guarantees for this algorithm, achieving a deeper understanding of its performance characteristics.
arXiv Detail & Related papers (2023-06-18T06:22:04Z) - Global Convergence of Two-timescale Actor-Critic for Solving Linear
Quadratic Regulator [43.13238243240668]
We develop a new analysis framework that allows establishing the global convergence to an $epsilon$-optimal solution.
This is the first finite-time convergence analysis for the single sample two-timescale AC for solving LQR with global optimality.
arXiv Detail & Related papers (2022-08-18T09:57:03Z) - Finite-Time Analysis of Fully Decentralized Single-Timescale
Actor-Critic [4.94128206910124]
We introduce a fully decentralized Actor-Critic (AC) algorithm, where actor, critic, and global reward estimator are updated in an alternating manner.
We show that our algorithm has sample complexity of $tildemathcalO(epsilon-2)$ under Markovian sampling.
We also provide a local action privacy-preserving version of our algorithm and its analysis.
arXiv Detail & Related papers (2022-06-12T13:14:14Z) - Towards Sample-Optimal Compressive Phase Retrieval with Sparse and
Generative Priors [59.33977545294148]
We show that $O(k log L)$ samples suffice to guarantee that the signal is close to any vector that minimizes an amplitude-based empirical loss function.
We adapt this result to sparse phase retrieval, and show that $O(s log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional.
arXiv Detail & Related papers (2021-06-29T12:49:54Z) - Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear
Function Approximation [5.543220407902113]
We develop a novel variant of off-policy natural actor-critic algorithm with linear function approximation.
We establish a sample complexity of $mathcalO(epsilon-3)$, outperforming all the previously known convergence bounds of such algorithms.
arXiv Detail & Related papers (2021-05-26T13:35:42Z) - Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDP.
DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Non-asymptotic Convergence Analysis of Two Time-scale (Natural)
Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires the overall sample complexity at the order of $mathcalO(epsilon-2.5log3(epsilon-1))$ to attain an $epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z) - Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms [58.57004511121862]
This paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling.
We show that AC and NAC attain orderwise performance improvement over PG and NPG under infinite horizon due to the incorporation of critic.
arXiv Detail & Related papers (2020-04-27T17:11:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.