Off-Policy Multi-Agent Decomposed Policy Gradients
- URL: http://arxiv.org/abs/2007.12322v2
- Date: Sun, 4 Oct 2020 08:07:44 GMT
- Title: Off-Policy Multi-Agent Decomposed Policy Gradients
- Authors: Yihan Wang, Beining Han, Tonghan Wang, Heng Dong, Chongjie Zhang
- Abstract summary: We investigate causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP).
DOP supports efficient off-policy learning and addresses the issues of centralized-decentralized mismatch and credit assignment.
In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments demonstrate that DOP significantly outperforms both state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms.
- Score: 30.389041305278045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent policy gradient (MAPG) methods have recently witnessed vigorous
progress. However, there is a significant performance discrepancy between MAPG
methods and state-of-the-art multi-agent value-based approaches. In this paper,
we investigate causes that hinder the performance of MAPG algorithms and
present a multi-agent decomposed policy gradient method (DOP). This method
introduces the idea of value function decomposition into the multi-agent
actor-critic framework. Based on this idea, DOP supports efficient off-policy
learning and addresses the issues of centralized-decentralized mismatch and
credit assignment in both discrete and continuous action spaces. We formally
show that DOP critics have sufficient representational capability to guarantee
convergence. In addition, empirical evaluations on the StarCraft II
micromanagement benchmark and multi-agent particle environments demonstrate
that DOP significantly outperforms both state-of-the-art value-based and
policy-based multi-agent reinforcement learning algorithms. Demonstrative
videos are available at https://sites.google.com/view/dop-mapg/.
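The abstract names the key idea (value function decomposition inside an actor-critic framework) without spelling out the form of the critic. Below is a minimal sketch, in plain Python/NumPy, of the linearly decomposed critic associated with DOP: Q_tot(s, a) written as a state-dependent, non-negatively weighted sum of per-agent utilities plus a bias, which is what lets the policy gradient split into per-agent terms. The toy functions (`agent_utilities`, `mixing_weights`), shapes, and weights are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions, state_dim = 3, 5, 8

def agent_utilities(state):
    """Toy per-agent utilities Q_i(s, .); a real implementation would use networks."""
    return np.tanh(rng.standard_normal((n_agents, n_actions)) + state.mean())

def mixing_weights(state):
    """State-dependent weights k_i(s) >= 0 and bias b(s) (assumed functional form)."""
    k = np.abs(np.sin(state[:n_agents]))   # non-negative weights, as in linear decomposition
    b = float(state.sum()) * 0.01          # state-dependent bias term
    return k, b

def decomposed_q_tot(state, joint_action):
    """Linearly decomposed critic: Q_tot(s, a) = sum_i k_i(s) * Q_i(s, a_i) + b(s)."""
    q_i = agent_utilities(state)                              # (n_agents, n_actions)
    k, b = mixing_weights(state)
    chosen = q_i[np.arange(n_agents), joint_action]           # Q_i(s, a_i) for chosen actions
    return float(np.dot(k, chosen) + b)

state = rng.standard_normal(state_dim)
action = rng.integers(0, n_actions, size=n_agents)
print("Q_tot(s, a) =", decomposed_q_tot(state, action))
```

Because each agent's action enters Q_tot only through its own utility Q_i, per-agent gradient terms of the form k_i(s) * grad log pi_i(a_i | o_i) * Q_i(s, a_i) can be computed without differentiating through the other agents' action choices, which is one way to see how a decomposed critic can help with centralized-decentralized mismatch and credit assignment.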
Related papers
- TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy Gradient [36.83464785085713]
We propose an agent topology framework, which decides whether other agents should be taken into account in an agent's policy.
Agents can use coalition utility as the learning objective, instead of the global utility used by centralized critics or the local utility used by individual critics.
We prove the policy improvement theorem for TAPE and give a theoretical explanation for the improved cooperation among agents. A rough sketch of the coalition-utility idea follows this entry.
arXiv Detail & Related papers (2023-12-25T09:24:33Z)
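As a rough illustration of the coalition-utility idea in the TAPE entry above: if the agent topology is given as an adjacency matrix, an agent's coalition utility can be read off as the sum of the utilities of the agents its topology selects, with the centralized (global-utility) and individual (local-utility) critics recovered as the two extreme topologies. This is a hedged sketch under that assumption; the paper's actual construction and learning rule may differ.

```python
import numpy as np

# Assumed topology: adj[i, j] = 1 means agent j belongs to agent i's coalition.
adj = np.array([[1, 1, 0],
                [1, 1, 0],
                [0, 0, 1]])

# Illustrative per-agent utilities for the current joint action.
local_utility = np.array([0.4, -0.1, 0.7])

# Coalition utility for agent i: sum of utilities over the agents its topology selects.
coalition_utility = adj @ local_utility

# The two extremes recovered as special cases:
global_utility = np.full(3, local_utility.sum())   # fully connected topology (centralized critic)
individual_utility = local_utility                  # identity topology (individual critics)

print("coalition: ", coalition_utility)
print("global:    ", global_utility)
print("individual:", individual_utility)
```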
- Optimistic Multi-Agent Policy Gradient [23.781837938235036]
Relative overgeneralization (RO) occurs when agents converge towards a suboptimal joint policy.
No methods have previously been proposed to address RO in multi-agent policy gradient (MAPG) methods.
We propose a general yet simple framework that enables optimistic updates in MAPG methods and alleviates the RO problem; a minimal sketch of one such optimistic update follows this entry.
arXiv Detail & Related papers (2023-11-03T14:47:54Z)
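One generic way to make a policy-gradient update optimistic, in the spirit of the entry above, is to downweight the contribution of negative advantages so that an agent is not pulled away from actions that look bad only because its teammates are still exploring. The asymmetric weight below is an illustrative assumption, not necessarily the paper's exact update rule.

```python
import numpy as np

def optimistic_advantages(advantages, pessimism_weight=0.2):
    """Leaky, asymmetric transform: keep positive advantages, shrink negative ones."""
    adv = np.asarray(advantages, dtype=float)
    return np.where(adv >= 0.0, adv, pessimism_weight * adv)

# Per-agent policy-gradient weights before and after the optimistic transform.
adv = np.array([1.5, -2.0, 0.3, -0.4])
print("standard:  ", adv)
print("optimistic:", optimistic_advantages(adv))
# The transformed values would then multiply grad log pi_i(a_i | o_i) as usual.
```

Shrinking (rather than discarding) the negative part keeps the update a valid descent direction on a reweighted objective while biasing agents toward the more optimistic interpretation of a low return.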
- DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm [48.60180355291149]
We introduce doubly multi-step off-policy VI (DoMo-VI), a novel oracle algorithm that combines multi-step policy improvements and policy evaluations.
We then propose doubly multi-step off-policy actor-critic (DoMo-AC), a practical instantiation of the DoMo-VI algorithm; a generic multi-step off-policy backup is sketched after this entry for reference.
arXiv Detail & Related papers (2023-05-29T14:36:51Z)
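The DoMo-AC entry above builds on multi-step off-policy evaluation. For orientation only, here is a generic multi-step off-policy backup with truncated importance ratios (a Retrace-style target), which is the standard building block such methods extend; it is not the DoMo-VI operator itself, and the terminal handling below is an assumption made for brevity.

```python
import numpy as np

def multistep_offpolicy_targets(rewards, q_sa, v_next, rho, gamma=0.99, lam=1.0):
    """Retrace-style targets: Q(s_t, a_t) + sum_s gamma^{s-t} (prod c) * td_s.

    rewards[t]: r_t, q_sa[t]: Q(s_t, a_t), v_next[t]: E_pi[Q(s_{t+1}, .)],
    rho[t]: pi(a_t | s_t) / mu(a_t | s_t) under the behaviour policy mu.
    """
    T = len(rewards)
    c = lam * np.minimum(1.0, rho)            # truncated importance traces c_t
    td = rewards + gamma * v_next - q_sa      # one-step TD errors evaluated under pi
    targets = np.empty(T)
    acc = 0.0
    for t in reversed(range(T)):              # accumulate corrections backwards in time
        carry = gamma * c[t + 1] * acc if t + 1 < T else 0.0
        acc = td[t] + carry
        targets[t] = q_sa[t] + acc
    return targets

rng = np.random.default_rng(1)
T = 5
print(multistep_offpolicy_targets(rng.random(T), rng.random(T), rng.random(T), 2 * rng.random(T)))
```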
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that, under standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate; the clipped surrogate objective behind such local updates is sketched after this entry.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
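The convergence result in the entry above concerns local, vanilla-PPO-style updates for each agent. As a concrete reference point, the standard clipped surrogate objective that such a local update maximizes is sketched below; the exact estimator, advantage construction, and regularity conditions are in the paper, and the numbers here are placeholders.

```python
import numpy as np

def ppo_clip_objective(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Clipped surrogate: mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = np.exp(log_prob_new - log_prob_old)                 # r = pi_new / pi_old
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))

# Each agent i maximizes this objective over its own policy parameters,
# using advantages computed from the shared reward signal.
adv = np.array([0.5, -1.0, 0.2])
lp_old = np.log(np.array([0.30, 0.25, 0.40]))
lp_new = np.log(np.array([0.35, 0.20, 0.42]))
print(ppo_clip_objective(lp_new, lp_old, adv))
```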
- Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients [51.749831824106046]
We introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods.
We show that our methods perform as well as or better than state-of-the-art value-based methods on a variety of SMAC tasks; a minimal sketch of the semi-on-policy idea follows this entry.
arXiv Detail & Related papers (2021-04-27T19:37:01Z)
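Semi-on-policy training, as summarized above, relaxes strictly on-policy data collection. One simple way to realize that idea, sketched here as an assumption rather than the paper's specific variants, is a short FIFO buffer that retains only the most recent rollouts so that sampled data stays close to the current policy.

```python
from collections import deque
import random

class RecentRolloutBuffer:
    """Keep only the last `capacity` rollouts, so sampled data stays near on-policy."""

    def __init__(self, capacity=8):
        self.rollouts = deque(maxlen=capacity)   # old rollouts are evicted automatically

    def add(self, rollout):
        self.rollouts.append(rollout)

    def sample_batch(self, k=4):
        k = min(k, len(self.rollouts))
        return random.sample(list(self.rollouts), k)

buffer = RecentRolloutBuffer(capacity=8)
for episode in range(20):
    buffer.add({"episode": episode})             # placeholder rollouts
print([r["episode"] for r in buffer.sample_batch()])  # only recent episodes remain
```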
- Decomposed Soft Actor-Critic Method for Cooperative Multi-Agent Reinforcement Learning [10.64928897082273]
Experimental results demonstrate that mSAC significantly outperforms the policy-based approach COMA.
In addition, mSAC achieves strong results on tasks with large action spaces, such as 2c_vs_64zg and MMM2.
arXiv Detail & Related papers (2021-04-14T07:02:40Z)
- The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games [67.47961797770249]
Multi-Agent PPO (MAPPO) is a PPO variant that adopts a centralized value function.
We show that MAPPO achieves performance comparable to the state-of-the-art in three popular multi-agent testbeds; advantage estimation with a centralized critic is sketched after this entry.
arXiv Detail & Related papers (2021-03-02T18:59:56Z)
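The distinguishing ingredient MAPPO adds in the entry above is a centralized value function: advantages are computed from a critic that conditions on the global state, and each agent's PPO update reuses them. The sketch below shows standard generalized advantage estimation on such a centralized V; treating the episode as ending after the last step is an assumption made for brevity.

```python
import numpy as np

def centralized_gae(rewards, values, gamma=0.99, lam=0.95):
    """Advantages from a centralized V(s); the same advantages feed every agent's PPO update."""
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else 0.0   # assume the episode ends after T steps
        delta = rewards[t] + gamma * next_v - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

# values[t] = V(s_t) predicted by the centralized critic from the global state s_t.
rewards = np.array([0.0, 0.0, 1.0])
values = np.array([0.2, 0.5, 0.8])
print(centralized_gae(rewards, values))
```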
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale, general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks; a sketch of the factored-critic idea follows this entry.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
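FACMAC, per the entry above, factors a centralized critic into per-agent utilities combined by a mixing function and then updates all actors with a single centralized policy gradient. A hedged sketch of the factoring step follows; the two-layer mixing form and all weights are illustrative assumptions, and, unlike monotonic value-factorization methods, the mixing weights here are deliberately left unconstrained in sign.

```python
import numpy as np

rng = np.random.default_rng(3)
n_agents, hidden = 4, 8

# Per-agent utilities Q_i(o_i, a_i) for the currently selected actions (toy values).
agent_qs = rng.standard_normal(n_agents)

# State-conditioned mixing parameters; no non-negativity constraint is imposed here.
w1 = rng.standard_normal((n_agents, hidden))
b1 = rng.standard_normal(hidden)
w2 = rng.standard_normal(hidden)
b2 = rng.standard_normal()

def factored_q_tot(per_agent_qs):
    """Mix per-agent utilities into a single centralized Q_tot."""
    h = np.tanh(per_agent_qs @ w1 + b1)
    return float(h @ w2 + b2)

# All actors would then be updated jointly by ascending this single Q_tot,
# with every agent's action sampled from its current policy.
print("Q_tot =", factored_q_tot(agent_qs))
```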
This list is automatically generated from the titles and abstracts of the papers on this site.