Decentralized Policy Optimization
- URL: http://arxiv.org/abs/2211.03032v1
- Date: Sun, 6 Nov 2022 05:38:23 GMT
- Title: Decentralized Policy Optimization
- Authors: Kefan Su and Zongqing Lu
- Abstract summary: We propose \textit{decentralized policy optimization} (DPO), a decentralized actor-critic algorithm with monotonic improvement and convergence guarantees.
Empirically, we compare DPO with IPPO in a variety of cooperative multi-agent tasks, covering discrete and continuous action spaces, and fully and partially observable environments.
- Score: 21.59254848913971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study of decentralized learning, or independent learning, in cooperative multi-agent reinforcement learning has a history of decades. Recent empirical studies show that independent PPO (IPPO) can obtain performance close to, or even better than, that of centralized-training-with-decentralized-execution methods in several benchmarks. However, a decentralized actor-critic algorithm with a convergence guarantee remains an open problem. In this paper, we propose \textit{decentralized policy optimization} (DPO), a decentralized actor-critic algorithm with monotonic improvement and convergence guarantees. We derive a novel decentralized surrogate for policy optimization such that monotonic improvement of the joint policy is guaranteed when each agent \textit{independently} optimizes the surrogate. In practice, this decentralized surrogate can be realized by two adaptive coefficients for policy optimization at each agent. Empirically, we compare DPO with IPPO in a variety of cooperative multi-agent tasks, covering discrete and continuous action spaces as well as fully and partially observable environments. The results show that DPO outperforms IPPO in most tasks, which supports our theoretical results.
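The abstract says the decentralized surrogate is realized by two adaptive coefficients per agent but does not spell out their form, so the following is only a minimal sketch of what a per-agent objective with two adaptive coefficients could look like. The coefficient names (beta, nu), the clipping form, and the adaptation rule are hypothetical placeholders, not the surrogate derived in the paper.

```python
# Illustrative sketch only (not the paper's exact objective): a per-agent
# surrogate in which the ratio-weighted advantage is tempered by two adaptive
# coefficients, `beta` (a KL-penalty weight) and `nu` (a clip range).
import numpy as np

def dpo_style_surrogate(ratio, advantage, kl, beta, nu):
    """Per-agent surrogate (to be maximized): clipped ratio-weighted advantage minus a KL penalty."""
    clipped = np.clip(ratio, 1.0 - nu, 1.0 + nu)
    surrogate = np.minimum(ratio * advantage, clipped * advantage)
    return surrogate.mean() - beta * kl

def adapt_coefficients(kl, target_kl, beta, nu):
    """Hypothetical adaptation rule: tighten both coefficients when the agent moved too far."""
    if kl > 1.5 * target_kl:
        beta, nu = beta * 2.0, nu * 0.5
    elif kl < target_kl / 1.5:
        beta, nu = beta * 0.5, min(nu * 2.0, 0.2)
    return beta, nu

# Toy usage for a single agent with random data.
rng = np.random.default_rng(0)
ratio = np.exp(rng.normal(0.0, 0.05, size=256))      # pi_new / pi_old per sample
advantage = rng.normal(0.0, 1.0, size=256)           # advantage estimates
kl = float(np.mean((ratio - 1.0) - np.log(ratio)))   # simple KL estimate
beta, nu = adapt_coefficients(kl, target_kl=0.01, beta=1.0, nu=0.1)
objective = dpo_style_surrogate(ratio, advantage, kl, beta, nu)
```

Each agent would maximize such an objective independently on its own trajectories, which is what makes the scheme fully decentralized.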
Related papers
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
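For context, "the local policy of each agent is updated similarly to vanilla PPO" can be pictured as the standard clipped surrogate applied to each agent's local policy; this is only an illustration of the familiar objective, not a reproduction of the paper's exact objective or regularity conditions:

$$ L_i(\theta_i) \;=\; \hat{\mathbb{E}}\!\left[\min\!\left(r_i(\theta_i)\,\hat{A},\; \operatorname{clip}\!\left(r_i(\theta_i),\,1-\epsilon,\,1+\epsilon\right)\hat{A}\right)\right], \qquad r_i(\theta_i) \;=\; \frac{\pi_{\theta_i}(a_i \mid o_i)}{\pi_{\theta_i^{\text{old}}}(a_i \mid o_i)}. $$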
- More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization [21.10461189367695]
In cooperative multi-agent reinforcement learning (MARL), combining value decomposition with actor-critic enables agents to learn policies.
Agents are commonly assumed to be independent of each other, even in centralized training.
We propose multi-agent conditional policy factorization (MACPF), which adopts more centralized training while still enabling decentralized execution.
arXiv Detail & Related papers (2022-09-26T13:29:22Z)
- Towards Global Optimality in Cooperative MARL with the Transformation and Distillation Framework [26.612749327414335]
Decentralized execution is a core demand in cooperative multi-agent reinforcement learning (MARL).
In this paper, we theoretically analyze two common classes of algorithms with decentralized policies -- multi-agent policy gradient methods and value-decomposition methods.
We show that TAD-PPO can theoretically achieve optimal policy learning in finite multi-agent MDPs, and it empirically shows significant performance gains on a large set of cooperative multi-agent tasks.
arXiv Detail & Related papers (2022-07-12T06:59:13Z)
- Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO [66.5384483339413]
We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative multi-agent reinforcement learning (MARL).
We show that a trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training.
arXiv Detail & Related papers (2022-01-31T20:39:48Z)
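As a back-of-the-envelope illustration of why bounding independent ratios by the number of agents controls the joint update (the paper's actual bound may differ): if each of the $n$ agents keeps its ratio $r_i$ within $[1-\epsilon/n,\, 1+\epsilon/n]$, then

$$ 1-\epsilon \;\le\; \left(1-\tfrac{\epsilon}{n}\right)^{n} \;\le\; \prod_{i=1}^{n} r_i \;\le\; \left(1+\tfrac{\epsilon}{n}\right)^{n} \;\le\; e^{\epsilon}, $$

so the joint policy ratio stays inside an effective trust region even though every agent clips only its own ratio.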
- Coordinated Proximal Policy Optimization [28.780862892562308]
Coordinated Proximal Policy Optimization (CoPPO) is an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting.
We prove the monotonicity of policy improvement when optimizing a theoretically-grounded joint objective.
We then argue that this objective allows CoPPO to achieve dynamic credit assignment among agents, thereby alleviating the high-variance issue during concurrent updates of agent policies.
arXiv Detail & Related papers (2021-11-07T11:14:19Z)
- Dealing with Non-Stationarity in Multi-Agent Reinforcement Learning via Trust Region Decomposition [52.06086375833474]
Non-stationarity is a thorny issue in multi-agent reinforcement learning.
We introduce a $\delta$-stationarity measurement to explicitly model the stationarity of a policy sequence.
We propose a trust region decomposition network based on message passing to estimate the joint policy divergence.
arXiv Detail & Related papers (2021-02-21T14:46:50Z)
- Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? [100.48692829396778]
Independent PPO (IPPO) is a form of independent learning in which each agent simply estimates its local value function.
IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
arXiv Detail & Related papers (2020-11-18T20:29:59Z)
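A minimal structural sketch of the IPPO setup described in the entry above, assuming the usual formulation in which each agent trains its own actor and a critic over its local observation with standard single-agent PPO losses and no centralized critic; the dimensions and hyperparameters below are arbitrary illustrations, not those used in the paper.

```python
# Structural sketch of independent PPO (IPPO): one actor and one *local*
# value function per agent, each trained with the standard PPO losses.
import torch
import torch.nn as nn

class IPPOAgent(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, n_actions))
        # Local value function: conditioned only on this agent's observation.
        self.critic = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                    nn.Linear(hidden, 1))

    def ppo_loss(self, obs, actions, old_logp, advantages, returns, clip=0.2):
        dist = torch.distributions.Categorical(logits=self.actor(obs))
        ratio = torch.exp(dist.log_prob(actions) - old_logp)
        clipped = torch.clamp(ratio, 1 - clip, 1 + clip)
        policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        value_loss = (self.critic(obs).squeeze(-1) - returns).pow(2).mean()
        return policy_loss + 0.5 * value_loss

# One independent agent (and optimizer) per learner; no parameters are shared
# across agents and no centralized critic is used.
agents = [IPPOAgent(obs_dim=16, n_actions=5) for _ in range(3)]
optims = [torch.optim.Adam(a.parameters(), lr=3e-4) for a in agents]
```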
- Multi-Agent Trust Region Policy Optimization [34.91180300856614]
We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for multi-agent cases.
We propose a decentralized MARL algorithm, which we call multi-agent TRPO (MATRPO).
arXiv Detail & Related papers (2020-10-15T17:49:47Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale, general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.