Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities
- URL: http://arxiv.org/abs/2305.17568v1
- Date: Sat, 27 May 2023 20:08:35 GMT
- Title: Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities
- Authors: Donghao Ying, Yunkai Zhang, Yuhao Ding, Alec Koppel, Javad Lavaei
- Abstract summary: We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints.
Our algorithm converges to a first-order stationary point (FOSP) at the rate of $\mathcal{O}\left(T^{-2/3}\right)$.
In the sample-based setting, we demonstrate that, with high probability, our algorithm requires $\widetilde{\mathcal{O}}\left(\epsilon^{-3.5}\right)$ samples to achieve an $\epsilon$-FOSP.
- Score: 12.104551746465932
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We investigate safe multi-agent reinforcement learning, where agents seek to
collectively maximize an aggregate sum of local objectives while satisfying
their own safety constraints. The objective and constraints are described by
general utilities, i.e., nonlinear functions of the long-term
state-action occupancy measure, which encompass broader decision-making goals
such as risk, exploration, or imitation. The exponential growth of the
state-action space size with the number of agents presents challenges for
global observability, further exacerbated by the global coupling arising from
agents' safety constraints. To tackle this issue, we propose a primal-dual
method utilizing shadow rewards and $\kappa$-hop neighbor truncation under a
form of the correlation decay property, where $\kappa$ is the communication radius.
In the exact setting, our algorithm converges to a first-order stationary point
(FOSP) at the rate of $\mathcal{O}\left(T^{-2/3}\right)$. In the sample-based
setting, we demonstrate that, with high probability, our algorithm requires
$\widetilde{\mathcal{O}}\left(\epsilon^{-3.5}\right)$ samples to achieve an
$\epsilon$-FOSP with an approximation error of $\mathcal{O}(\phi_0^{2\kappa})$,
where $\phi_0\in (0,1)$. Finally, we demonstrate the effectiveness of our method
through extensive numerical experiments.
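The abstract describes objectives and constraints as general utilities, i.e., nonlinear functions of the state-action occupancy measure, handled through shadow rewards. The sketch below illustrates these two notions on a hypothetical two-state, two-action, single-agent MDP; it is not the paper's multi-agent algorithm. It assumes the standard discounted occupancy measure $\lambda(s,a) = (1-\gamma)\sum_t \gamma^t \Pr(s_t=s, a_t=a)$, uses the entropy of the state occupancy $F(\lambda) = -\sum_s d(s)\log d(s)$ with $d(s)=\sum_a \lambda(s,a)$ as an example exploration-style utility, and takes the shadow reward to be the gradient $r(s,a) = \partial F/\partial \lambda(s,a) = -(\log d(s) + 1)$, as is common in the general-utility literature. The function names (occupancy_measure, entropy_utility, entropy_shadow_reward) and the toy MDP are hypothetical.

```python
import math


def occupancy_measure(P, rho, pi, gamma, tol=1e-12, max_iter=10_000):
    """Discounted state-action occupancy measure of policy pi.

    P[s][a][s2] is the transition probability, rho[s] the initial distribution,
    pi[s][a] the policy. Returns lam[s][a], whose entries sum to 1.
    """
    n = len(rho)
    d = list(rho)  # state occupancy; start from the initial distribution
    for _ in range(max_iter):
        # Fixed point: d(s2) = (1 - gamma) rho(s2) + gamma * sum_{s,a} d(s) pi(a|s) P(s2|s,a)
        new_d = [(1 - gamma) * rho[s2] for s2 in range(n)]
        for s in range(n):
            for a, p_a in enumerate(pi[s]):
                for s2, p_next in enumerate(P[s][a]):
                    new_d[s2] += gamma * d[s] * p_a * p_next
        converged = max(abs(x - y) for x, y in zip(new_d, d)) < tol
        d = new_d
        if converged:
            break
    return [[d[s] * p_a for p_a in pi[s]] for s in range(n)]


def entropy_utility(lam):
    """Example general utility: entropy of the state occupancy d(s) = sum_a lam(s, a)."""
    d = [sum(row) for row in lam]
    return -sum(x * math.log(x) for x in d if x > 0)


def entropy_shadow_reward(lam):
    """Shadow reward r(s, a) = dF/dlam(s, a) = -(log d(s) + 1) for the entropy utility."""
    d = [sum(row) for row in lam]
    # The floor avoids log(0) for unreachable states (a choice made only in this sketch).
    return [[-(math.log(max(d[s], 1e-12)) + 1.0) for _ in row] for s, row in enumerate(lam)]


# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
rho = [1.0, 0.0]
pi = [[0.5, 0.5], [0.5, 0.5]]
lam = occupancy_measure(P, rho, pi, gamma=0.9)
r = entropy_shadow_reward(lam)
print([sum(row) for row in lam])  # approximately [0.55, 0.45]
print(r)
```

By the chain rule through $\lambda$, the gradient of $F$ with respect to policy parameters equals, up to the normalization constant, an ordinary policy gradient computed with $r$ as the reward; the code only computes $r$ and does not perform that policy-gradient step.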
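To cope with the exponentially large global state-action space, the abstract relies on $\kappa$-hop neighbor truncation, where $\kappa$ is the communication radius. The sketch below shows only the bookkeeping side of that idea, assuming agents sit on an undirected interaction graph given as an adjacency list: a breadth-first search collects the agents within $\kappa$ hops of agent $i$, and a global state is then restricted to those agents, so a table indexed by the truncated state grows with the neighborhood size rather than with the total number of agents. The function names, the line-graph example, and the per-agent state encoding are hypothetical. The correlation-decay assumption is not modeled here; note only that the abstract's approximation error $\mathcal{O}(\phi_0^{2\kappa})$ shrinks geometrically as $\kappa$ grows, since $\phi_0\in(0,1)$.

```python
from collections import deque


def kappa_hop_neighborhood(adj, i, kappa):
    """Agents within graph distance <= kappa of agent i (including i), via BFS."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == kappa:
            continue  # do not expand beyond the communication radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def truncate(global_state, neighborhood):
    """Restrict a global state (one local state per agent) to the given agents."""
    return tuple(global_state[j] for j in sorted(neighborhood))


# Hypothetical example: 10 agents on a line graph, 2 local states per agent.
n_agents, n_local = 10, 2
adj = {j: [k for k in (j - 1, j + 1) if 0 <= k < n_agents] for j in range(n_agents)}
nbhd = kappa_hop_neighborhood(adj, 5, kappa=2)
print(sorted(nbhd))                                    # [3, 4, 5, 6, 7]
print(n_local ** len(nbhd), n_local ** n_agents)       # 32 1024
print(truncate((0, 1, 0, 1, 1, 0, 0, 1, 0, 1), nbhd))  # (1, 1, 0, 0, 1)
```

In this toy line graph, agent 5's truncated table has $2^5 = 32$ entries instead of $2^{10} = 1024$, which is the kind of saving the truncation is meant to provide.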
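The primal-dual structure named in the abstract can be illustrated, in a deliberately simplified form, by projected gradient ascent-descent on a Lagrangian. Assuming the safety constraints take the form $g_i(\theta) \ge 0$, the Lagrangian is $L(\theta,\mu) = f(\theta) + \sum_i \mu_i g_i(\theta)$ with multipliers $\mu_i \ge 0$: the primal variable ascends $L$, and the multipliers descend it and are projected back onto $\mu_i \ge 0$. The scalar toy problem below (maximize $-(\theta-3)^2$ subject to $1-\theta \ge 0$, whose solution is $\theta^\star = 1$, $\mu^\star = 4$), the function name primal_dual, and the step sizes are hypothetical. Exact gradients stand in for the quantities that, per the abstract, the paper's method obtains from shadow rewards and $\kappa$-hop truncation; this is a generic sketch, not the authors' algorithm.

```python
def primal_dual(grad_f, gs, grad_gs, theta0, eta_theta=0.05, eta_mu=0.05, iters=5000):
    """Projected gradient ascent-descent on L(theta, mu) = f(theta) + sum_i mu_i * g_i(theta).

    Maximizes f subject to g_i(theta) >= 0 for a scalar theta (toy setting).
    Both steps use the current (theta, mu), i.e., simultaneous updates.
    """
    theta = theta0
    mu = [0.0] * len(gs)
    for _ in range(iters):
        g_vals = [g(theta) for g in gs]
        # Primal step: ascend the Lagrangian in theta.
        grad_L = grad_f(theta) + sum(m * dg(theta) for m, dg in zip(mu, grad_gs))
        theta += eta_theta * grad_L
        # Dual step: descend in mu (mu_i grows while g_i < 0), then project onto mu_i >= 0.
        mu = [max(0.0, m - eta_mu * g) for m, g in zip(mu, g_vals)]
    return theta, mu


# Hypothetical toy problem: maximize -(theta - 3)^2 subject to 1 - theta >= 0.
theta, mu = primal_dual(
    grad_f=lambda t: -2.0 * (t - 3.0),
    gs=[lambda t: 1.0 - t],
    grad_gs=[lambda t: -1.0],
    theta0=0.0,
)
print(theta, mu)  # close to 1.0 and [4.0]
```

The multiplier ends near $\mu^\star = 4$, matching the stationarity condition $-2(\theta^\star - 3) - \mu^\star = 0$ at $\theta^\star = 1$; a positive multiplier signals that the constraint is active at the solution.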