The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games
- URL: http://arxiv.org/abs/2103.01955v1
- Date: Tue, 2 Mar 2021 18:59:56 GMT
- Title: The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games
- Authors: Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu
- Abstract summary: Multi-Agent PPO (MAPPO) is a multi-agent PPO variant which adopts a centralized value function.
We show that MAPPO achieves performance comparable to the state-of-the-art in three popular multi-agent testbeds.
- Score: 67.47961797770249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proximal Policy Optimization (PPO) is a popular on-policy reinforcement
learning algorithm but is significantly less utilized than off-policy learning
algorithms in multi-agent problems. In this work, we investigate Multi-Agent
PPO (MAPPO), a multi-agent PPO variant which adopts a centralized value
function. Using a 1-GPU desktop, we show that MAPPO achieves performance
comparable to the state-of-the-art in three popular multi-agent testbeds: the
Particle World environments, StarCraft II Micromanagement Tasks, and the Hanabi
Challenge, with minimal hyperparameter tuning and without any domain-specific
algorithmic modifications or architectures. In the majority of environments, we
find that compared to off-policy baselines, MAPPO achieves better or comparable
sample complexity as well as substantially faster running time. Finally, through
ablation studies, we present the 5 factors most influential to MAPPO's practical
performance.
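For a concrete picture of the centralized value function described above, the following is a minimal sketch of the MAPPO update in PyTorch. It is not the authors' released implementation: the layer sizes, the construction of the global state, and all hyperparameters are illustrative assumptions; only the overall structure (decentralized actors on local observations, a single critic on the global state, and the standard PPO clipped surrogate) follows the abstract.

```python
# Minimal sketch of the MAPPO objective described above: decentralized actors on
# local observations, one centralized critic on the global state, trained with the
# standard PPO clipped surrogate. NOT the authors' released code; layer sizes,
# the global-state construction, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Decentralized policy: maps one agent's local observation to action logits."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))


class CentralCritic(nn.Module):
    """Centralized value function: maps the global state to a scalar value."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)


def mappo_loss(actor, critic, obs, state, actions, old_log_probs, returns,
               advantages, clip_eps=0.2, value_coef=0.5):
    """PPO clipped surrogate plus a value loss on the centralized critic."""
    dist = actor(obs)
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
    policy_loss = -torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages,
    ).mean()
    value_loss = (critic(state) - returns).pow(2).mean()
    return policy_loss + value_coef * value_loss


if __name__ == "__main__":
    # Toy shapes: batch of 4 transitions, obs_dim=10, and 3 agents' observations
    # concatenated as the global state (a placeholder choice of state representation).
    actor, critic = Actor(obs_dim=10, n_actions=5), CentralCritic(state_dim=30)
    loss = mappo_loss(
        actor, critic,
        obs=torch.randn(4, 10), state=torch.randn(4, 30),
        actions=torch.randint(0, 5, (4,)), old_log_probs=torch.zeros(4),
        returns=torch.randn(4), advantages=torch.randn(4),
    )
    loss.backward()
```

In a full training loop, each agent would collect trajectories with its actor, advantages would be estimated from the centralized critic (for example with GAE), and the loss above would be minimized over several epochs of minibatch updates.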
Related papers
- Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization [22.148299126441966]
We propose a multi-agent reinforcement learning algorithm that adapts recent developments in credit assignment to improve upon MAPPO.
Our approach, PRD-MAPPO, decouples agents from teammates that do not influence their expected future reward, thereby streamlining credit assignment.
We show that PRD-MAPPO yields significantly higher data efficiency and performance compared to both MAPPO and other state-of-the-art methods.
arXiv Detail & Related papers (2024-08-08T08:18:05Z)
- Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding [18.06081009550052]
Multi-Agent Reinforcement Learning (MARL) based Multi-Agent Path Finding (MAPF) has recently gained attention due to its efficiency and scalability.
Several MARL-MAPF methods choose to use communication to enrich the information one agent can perceive.
We propose a new method, Ensembling Prioritized Hybrid Policies (EPH).
arXiv Detail & Related papers (2024-03-12T11:47:12Z)
- Surpassing legacy approaches to PWR core reload optimization with single-objective Reinforcement learning [0.0]
We have developed methods based on Deep Reinforcement Learning (DRL) for both single- and multi-objective optimization.
In this paper, we demonstrate the advantage of our RL-based approach, specifically using Proximal Policy Optimization (PPO).
PPO adapts its search capability via a policy with learnable weights, allowing it to function as both a global and local search method.
arXiv Detail & Related papers (2024-02-16T19:35:58Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients [51.749831824106046]
We introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods.
We show that our methods perform as well as or better than state-of-the-art value-based methods on a variety of SMAC tasks.
arXiv Detail & Related papers (2021-04-27T19:37:01Z)
- Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? [100.48692829396778]
Independent PPO (IPPO) is a form of independent learning in which each agent simply estimates its local value function.
IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
arXiv Detail & Related papers (2020-11-18T20:29:59Z)
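To make the contrast with the centralized value function above concrete, here is a minimal sketch of the per-agent local value function described in the IPPO entry. It is an illustrative assumption about the setup rather than code from that paper; the agent count, observation size, and layer width are placeholders.

```python
# Minimal sketch (an illustrative assumption, not code from the IPPO paper) of the
# "local value function" in the entry above: unlike a centralized critic, each
# agent here trains its own critic on its own observation only.
import torch
import torch.nn as nn


class LocalCritic(nn.Module):
    """Per-agent value function conditioned only on that agent's local observation."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)


# One independent critic per agent; each is trained with the usual PPO value loss
# against its own returns and never sees other agents' observations or a global state.
n_agents, obs_dim = 3, 10  # placeholder values
critics = [LocalCritic(obs_dim) for _ in range(n_agents)]
```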
- Off-Policy Multi-Agent Decomposed Policy Gradients [30.389041305278045]
We investigate the causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP).
DOP supports efficient off-policy learning and addresses the issues of centralized-decentralized mismatch and credit assignment.
In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments demonstrate that DOP significantly outperforms both state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms.
arXiv Detail & Related papers (2020-07-24T02:21:55Z)
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO [90.90009491366273]
We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms.
Specifically, we investigate the consequences of "code-level optimizations": algorithm augmentations found only in implementations or described as auxiliary details to the core algorithm.
Our results show that they (a) are responsible for most of PPO's gain in cumulative reward over TRPO, and (b) fundamentally change how RL methods function.
arXiv Detail & Related papers (2020-05-25T16:24:59Z)
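As a concrete example of what a "code-level optimization" can look like in practice, the sketch below shows value-function clipping, a commonly cited optimization of this kind in PPO implementations. It is an illustrative, hedged example rather than code from that paper, and the clip range is a placeholder.

```python
# Hedged illustration of one widely used "code-level optimization": clipping the
# value-function update in PPO implementations, even though the core algorithm
# description does not require it. The clip range is a placeholder.
import torch


def clipped_value_loss(values, old_values, returns, clip_eps=0.2):
    """Limit how far the new value prediction may move from the value recorded at
    data-collection time, mirroring the clipping applied to the policy ratio."""
    clipped = old_values + torch.clamp(values - old_values, -clip_eps, clip_eps)
    return torch.max((values - returns) ** 2, (clipped - returns) ** 2).mean()
```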
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.