Finite-Time Analysis of Fully Decentralized Single-Timescale
Actor-Critic
- URL: http://arxiv.org/abs/2206.05733v1
- Date: Sun, 12 Jun 2022 13:14:14 GMT
- Title: Finite-Time Analysis of Fully Decentralized Single-Timescale
Actor-Critic
- Authors: Qijun Luo, Xiao Li
- Abstract summary: We introduce a fully decentralized Actor-Critic (AC) algorithm, where actor, critic, and global reward estimator are updated in an alternating manner.
We show that our algorithm has sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2})$ under Markovian sampling.
We also provide a local action privacy-preserving version of our algorithm and its analysis.
- Score: 4.94128206910124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized Actor-Critic (AC) algorithms have been widely utilized for
multi-agent reinforcement learning (MARL) and have achieved remarkable success.
Despite this empirical success, the theoretical convergence properties of
decentralized AC algorithms remain largely unexplored. Existing finite-time
convergence results are derived under either a double-loop update or a
two-timescale step-size rule, neither of which is often adopted in real
implementations. In this work, we introduce a fully decentralized AC algorithm,
where actor, critic, and global reward estimator are updated in an alternating
manner with step sizes being of the same order, namely, we adopt the
\emph{single-timescale} update. Theoretically, using linear approximation for
value and reward estimation, we show that our algorithm has sample complexity
of $\tilde{\mathcal{O}}(\epsilon^{-2})$ under Markovian sampling, which matches
the optimal complexity with double-loop implementation (here,
$\tilde{\mathcal{O}}$ hides a log term). The sample complexity can be improved
to ${\mathcal{O}}(\epsilon^{-2})$ under the i.i.d. sampling scheme. Central to
establishing our complexity results is \emph{the hidden smoothness of the
optimal critic variable} that we reveal. We also provide a local action
privacy-preserving version of our algorithm and its analysis. Finally, we
conduct experiments to show the superiority of our algorithm over the existing
decentralized AC algorithms.
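The abstract describes alternating updates of the actor, critic, and global reward estimator, all with step sizes of the same order. Below is a minimal toy sketch of that single-timescale structure, assuming linear function approximation and a doubly stochastic consensus (mixing) matrix; the environment, feature map, rewards, and all variable names are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 4, 5                       # number of agents, feature dimension (assumed)
gamma = 0.95                      # discount factor
W = np.full((N, N), 1.0 / N)      # doubly stochastic mixing matrix (complete graph)

w     = np.zeros((N, d))          # per-agent critic parameters (linear value approx.)
lam   = np.zeros((N, d))          # per-agent global-reward estimator parameters
theta = np.zeros((N, d))          # per-agent actor parameters (Gaussian mean policy)

alpha = beta = eta = 0.05         # all step sizes of the SAME order: single-timescale

def phi(s):
    """Shared bounded linear feature map (illustrative)."""
    return s / (1.0 + np.linalg.norm(s))

s = rng.normal(size=d)
for t in range(200):
    f = phi(s)
    mean = theta @ f                          # each agent's Gaussian policy mean, shape (N,)
    a = mean + rng.normal(size=N)             # scalar action per agent
    r = -np.abs(a)                            # toy local rewards (illustrative)
    s_next = 0.9 * s + 0.1 * rng.normal(size=d)
    f_next = phi(s_next)

    # 1) Reward-estimator step: consensus + local regression toward r_i.
    lam = W @ lam + beta * np.outer(r - lam @ f, f)
    # 2) Critic step: consensus + TD update using the ESTIMATED global reward.
    delta = lam @ f + gamma * (w @ f_next) - w @ f    # per-agent TD errors, shape (N,)
    w = W @ w + alpha * np.outer(delta, f)
    # 3) Actor step: local policy-gradient update; (a - mean) * f is the Gaussian score.
    theta = theta + eta * np.outer(delta * (a - mean), f)

    s = s_next
```

The key point the sketch illustrates is that all three updates happen once per sample with step sizes of the same order, rather than running an inner critic loop to convergence or driving the critic on a faster timescale.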
Related papers
- Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms [49.19842488693726]
We establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $O(\epsilon^{-3})$.
Our findings provide theoretical support for many algorithms that rely on constant step sizes.
arXiv Detail & Related papers (2024-10-11T14:46:29Z) - Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic
Approximation with Markovian Noise [9.82187447690297]
We characterize a tight convergence bound for the iterations of linear two-time-scale approximation (SA) with Markovian noise.
Our results can be applied to establish the convergence behavior of a variety of RL algorithms, such as TD-learning with Polyak averaging, GTD, and GTD2.
arXiv Detail & Related papers (2023-12-31T01:30:14Z) - Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs [60.40452803295326]
We propose a new reward-free algorithm for learning linear Markov decision processes (MDPs).
At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward.
We show that our algorithm only needs to explore $\tilde{O}(d^2 \varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy.
arXiv Detail & Related papers (2023-03-17T17:53:28Z) - A One-Sample Decentralized Proximal Algorithm for Non-Convex Stochastic
Composite Optimization [10.762749887051546]
We propose two algorithms: Prox$\mathcal{DASA}$ and Prox$\mathcal{DASA}$-GT.
Unlike prior work, our algorithms achieve comparable complexity without requiring large batch sizes, more complex per-iteration operations, or stronger assumptions.
arXiv Detail & Related papers (2023-02-20T05:16:18Z) - Sample and Communication-Efficient Decentralized Actor-Critic Algorithms
with Finite-Time Analysis [27.21581944906418]
Actor-critic (AC) algorithms have been widely adopted in decentralized multi-agent systems.
We develop two decentralized AC and natural AC (NAC) algorithms that are private, and sample and communication-efficient.
arXiv Detail & Related papers (2021-09-08T15:02:21Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Non-asymptotic Convergence Analysis of Two Time-scale (Natural)
Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z) - Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms [58.57004511121862]
This paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling.
We show that AC and NAC attain orderwise performance improvement over PG and NPG under infinite horizon due to the incorporation of critic.
arXiv Detail & Related papers (2020-04-27T17:11:06Z)