Decentralized Policy Optimization
- URL: http://arxiv.org/abs/2211.03032v1
- Date: Sun, 6 Nov 2022 05:38:23 GMT
- Title: Decentralized Policy Optimization
- Authors: Kefan Su and Zongqing Lu
- Abstract summary: We propose *decentralized policy optimization* (DPO), a decentralized actor-critic algorithm with monotonic improvement and convergence guarantees.
Empirically, we compare DPO with IPPO in a variety of cooperative multi-agent tasks, covering discrete and continuous action spaces, and fully and partially observable environments.
- Score: 21.59254848913971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study of decentralized learning or independent learning in cooperative
multi-agent reinforcement learning has a history of decades. Recently, empirical
studies have shown that independent PPO (IPPO) can obtain good performance, close to
or even better than that of methods based on centralized training with decentralized
execution, in several benchmarks. However, a decentralized actor-critic algorithm
with a convergence guarantee remains an open problem. In this paper, we propose
*decentralized policy optimization* (DPO), a decentralized actor-critic algorithm
with monotonic improvement and convergence guarantees. We derive a novel
decentralized surrogate for policy optimization such that monotonic improvement of
the joint policy is guaranteed when each agent *independently* optimizes the
surrogate. In practice, this decentralized surrogate can be realized with two
adaptive coefficients for policy optimization at each agent. Empirically, we compare
DPO with IPPO on a variety of cooperative multi-agent tasks, covering discrete and
continuous action spaces as well as fully and partially observable environments. The
results show that DPO outperforms IPPO in most tasks, which can be seen as evidence
for our theoretical results.
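The abstract only outlines the mechanism, so below is a minimal sketch of what a per-agent DPO-style surrogate might look like: a PPO-style ratio objective penalized by two coefficient-weighted terms that each agent optimizes independently. The function name `dpo_surrogate_loss`, the concrete penalty forms, and the coefficient names `beta1`/`beta2` are assumptions made for illustration; the paper's actual surrogate and coefficient-adaptation rule are defined in the full text.

```python
# Illustrative sketch only: the exact penalty terms and the rule for adapting
# the two coefficients come from the paper's bound, not from this code.
import torch

def dpo_surrogate_loss(logp_new, logp_old, advantages, beta1, beta2):
    """Per-agent surrogate that each agent optimizes independently.

    logp_new / logp_old: log-probabilities of this agent's actions under
    its current and behavior policies.
    advantages: advantage estimates from the agent's local critic.
    beta1, beta2: adaptive coefficients weighting the penalty terms
    (hypothetical forms; in DPO they are chosen so that independent
    improvement of every agent's surrogate implies monotonic improvement
    of the joint policy).
    """
    ratio = torch.exp(logp_new - logp_old)   # importance-sampling ratio
    policy_term = ratio * advantages          # PPO-style policy objective
    approx_kl = logp_old - logp_new           # rough per-sample KL proxy
    penalty = beta1 * approx_kl + beta2 * approx_kl.pow(2)
    # Minimize the negative penalized surrogate.
    return -(policy_term - penalty).mean()

# Toy usage for a single agent with random data.
logp_old = torch.randn(64)
logp_new = logp_old + 0.01 * torch.randn(64)
adv = torch.randn(64)
loss = dpo_surrogate_loss(logp_new, logp_old, adv, beta1=0.5, beta2=0.1)
```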