The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games
- URL: http://arxiv.org/abs/2103.01955v1
- Date: Tue, 2 Mar 2021 18:59:56 GMT
- Title: The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games
- Authors: Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu
- Abstract summary: Multi-Agent PPO (MAPPO) is a multi-agent PPO variant which adopts a centralized value function.
We show that MAPPO achieves performance comparable to the state-of-the-art in three popular multi-agent testbeds.
- Score: 67.47961797770249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proximal Policy Optimization (PPO) is a popular on-policy reinforcement
learning algorithm but is significantly less utilized than off-policy learning
algorithms in multi-agent problems. In this work, we investigate Multi-Agent
PPO (MAPPO), a multi-agent PPO variant which adopts a centralized value
function. Using a 1-GPU desktop, we show that MAPPO achieves performance
comparable to the state-of-the-art in three popular multi-agent testbeds: the
Particle World environments, StarCraft II Micromanagement Tasks, and the Hanabi
Challenge, with minimal hyperparameter tuning and without any domain-specific
algorithmic modifications or architectures. In the majority of environments, we
find that compared to off-policy baselines, MAPPO achieves better or comparable
sample complexity as well as substantially faster running time. Finally, through
ablation studies, we present the 5 factors most influential to MAPPO's practical
performance.
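For a concrete picture of the centralized value function described above, the following is a minimal sketch of the MAPPO update in PyTorch. It is not the authors' released implementation: the layer sizes, the construction of the global state, and all hyperparameters are illustrative assumptions; only the overall structure (decentralized actors on local observations, a single critic on the global state, and the standard PPO clipped surrogate) follows the abstract.

```python
# Minimal sketch of the MAPPO objective described above: decentralized actors on
# local observations, one centralized critic on the global state, trained with the
# standard PPO clipped surrogate. NOT the authors' released code; layer sizes,
# the global-state construction, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Decentralized policy: maps one agent's local observation to action logits."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))


class CentralCritic(nn.Module):
    """Centralized value function: maps the global state to a scalar value."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)


def mappo_loss(actor, critic, obs, state, actions, old_log_probs, returns,
               advantages, clip_eps=0.2, value_coef=0.5):
    """PPO clipped surrogate plus a value loss on the centralized critic."""
    dist = actor(obs)
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
    policy_loss = -torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages,
    ).mean()
    value_loss = (critic(state) - returns).pow(2).mean()
    return policy_loss + value_coef * value_loss


if __name__ == "__main__":
    # Toy shapes: batch of 4 transitions, obs_dim=10, and 3 agents' observations
    # concatenated as the global state (a placeholder choice of state representation).
    actor, critic = Actor(obs_dim=10, n_actions=5), CentralCritic(state_dim=30)
    loss = mappo_loss(
        actor, critic,
        obs=torch.randn(4, 10), state=torch.randn(4, 30),
        actions=torch.randint(0, 5, (4,)), old_log_probs=torch.zeros(4),
        returns=torch.randn(4), advantages=torch.randn(4),
    )
    loss.backward()
```

In a full training loop, each agent would collect trajectories with its actor, advantages would be estimated from the centralized critic (for example with GAE), and the loss above would be minimized over several epochs of minibatch updates.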
Related papers
- Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization [22.148299126441966]
We propose a multi-agent reinforcement learning algorithm that adapts recent developments in credit assignment to improve upon MAPPO.
Our approach, PRD-MAPPO, decouples agents from teammates that do not influence their expected future reward, thereby streamlining credit assignment.
We show that PRD-MAPPO yields significantly higher data efficiency and performance compared to both MAPPO and other state-of-the-art methods.
arXiv Detail & Related papers (2024-08-08T08:18:05Z)
- Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding [18.06081009550052]
Multi-Agent Reinforcement Learning (MARL) based Multi-Agent Path Finding (MAPF) has recently gained attention due to its efficiency and scalability.
Several MARL-MAPF methods choose to use communication to enrich the information one agent can perceive.
We propose a new method, Ensembling Prioritized Hybrid Policies (EPH).
arXiv Detail & Related papers (2024-03-12T11:47:12Z)
- Surpassing legacy approaches to PWR core reload optimization with single-objective Reinforcement learning [0.0]
We have developed methods based on Deep Reinforcement Learning (DRL) for both single- and multi-objective optimization.
In this paper, we demonstrate the advantage of our RL-based approach, specifically using Proximal Policy Optimization (PPO).
PPO adapts its search capability via a policy with learnable weights, allowing it to function as both a global and local search method.
arXiv Detail & Related papers (2024-02-16T19:35:58Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients [51.749831824106046]
We introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods.
We show that our methods perform as well as or better than state-of-the-art value-based methods on a variety of SMAC tasks.
arXiv Detail & Related papers (2021-04-27T19:37:01Z)
- Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? [100.48692829396778]
Independent PPO (IPPO) is a form of independent learning in which each agent simply estimates its local value function.
IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
arXiv Detail & Related papers (2020-11-18T20:29:59Z)
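To make the contrast with the centralized value function above concrete, here is a minimal sketch of the per-agent local value function described in the IPPO entry. It is an illustrative assumption about the setup rather than code from that paper; the agent count, observation size, and layer width are placeholders.

```python
# Minimal sketch (an illustrative assumption, not code from the IPPO paper) of the
# "local value function" in the entry above: unlike a centralized critic, each
# agent here trains its own critic on its own observation only.
import torch
import torch.nn as nn


class LocalCritic(nn.Module):
    """Per-agent value function conditioned only on that agent's local observation."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)


# One independent critic per agent; each is trained with the usual PPO value loss
# against its own returns and never sees other agents' observations or a global state.
n_agents, obs_dim = 3, 10  # placeholder values
critics = [LocalCritic(obs_dim) for _ in range(n_agents)]
```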
- Off-Policy Multi-Agent Decomposed Policy Gradients [30.389041305278045]
We investigate the causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP).
DOP supports efficient off-policy learning and addresses the issues of centralized-decentralized mismatch and credit assignment.
In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments demonstrate that DOP significantly outperforms both state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms.
arXiv Detail & Related papers (2020-07-24T02:21:55Z)
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO [90.90009491366273]
We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms.
Specifically, we investigate the consequences of "code-level optimizations": algorithm augmentations found only in implementations or described as auxiliary details to the core algorithm.
Our results show that they (a) are responsible for most of PPO's gain in cumulative reward over TRPO, and (b) fundamentally change how RL methods function.
arXiv Detail & Related papers (2020-05-25T16:24:59Z)
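As a concrete example of what a "code-level optimization" can look like in practice, the sketch below shows value-function clipping, a commonly cited optimization of this kind in PPO implementations. It is an illustrative, hedged example rather than code from that paper, and the clip range is a placeholder.

```python
# Hedged illustration of one widely used "code-level optimization": clipping the
# value-function update in PPO implementations, even though the core algorithm
# description does not require it. The clip range is a placeholder.
import torch


def clipped_value_loss(values, old_values, returns, clip_eps=0.2):
    """Limit how far the new value prediction may move from the value recorded at
    data-collection time, mirroring the clipping applied to the policy ratio."""
    clipped = old_values + torch.clamp(values - old_values, -clip_eps, clip_eps)
    return torch.max((values - returns) ** 2, (clipped - returns) ** 2).mean()
```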
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.