MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement
Learning
- URL: http://arxiv.org/abs/2303.03376v1
- Date: Mon, 6 Mar 2023 18:57:41 GMT
- Title: MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement
Learning
- Authors: Mikayel Samvelyan, Akbir Khan, Michael Dennis, Minqi Jiang, Jack
Parker-Holder, Jakob Foerster, Roberta Raileanu, Tim Rocktäschel
- Abstract summary: We introduce Multi-Agent Environment Design Strategist for Open-Ended Learning (MAESTRO).
MAESTRO is the first multi-agent UED approach for two-player zero-sum settings.
Our experiments show that MAESTRO outperforms a number of strong baselines on competitive two-player games.
- Score: 22.28076947612619
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-ended learning methods that automatically generate a curriculum of
increasingly challenging tasks serve as a promising avenue toward generally
capable reinforcement learning agents. Existing methods adapt curricula
independently over either environment parameters (in single-agent settings) or
co-player policies (in multi-agent settings). However, the strengths and
weaknesses of co-players can manifest themselves differently depending on
environmental features. It is thus crucial to consider the dependency between
the environment and co-player when shaping a curriculum in multi-agent domains.
In this work, we use this insight and extend Unsupervised Environment Design
(UED) to multi-agent environments. We then introduce Multi-Agent Environment
Design Strategist for Open-Ended Learning (MAESTRO), the first multi-agent UED
approach for two-player zero-sum settings. MAESTRO efficiently produces
adversarial, joint curricula over both environments and co-players and attains
minimax-regret guarantees at Nash equilibrium. Our experiments show that
MAESTRO outperforms a number of strong baselines on competitive two-player
games, spanning discrete and continuous control settings.
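The core loop is simple to sketch: maintain a population of co-players, give each its own regret-prioritized buffer of environments (as in Prioritized Level Replay), and sample joint (co-player, environment) pairs for the student to train against. Below is a minimal Python sketch of that idea; `regret_fn` and `random_env_fn` are hypothetical helpers and all constants are illustrative, not the authors' implementation.

```python
import random
from collections import defaultdict

class JointCurriculum:
    """Regret-prioritized joint (co-player, environment) curriculum,
    sketched in the spirit of MAESTRO. `regret_fn` and `random_env_fn`
    are hypothetical helpers; constants are illustrative."""

    def __init__(self, co_players, regret_fn, buffer_size=256, p_replay=0.5):
        self.co_players = co_players      # fixed population of opponents
        self.regret_fn = regret_fn        # estimates student regret on (env, co-player)
        self.buffers = defaultdict(list)  # co-player index -> [(regret, env), ...]
        self.buffer_size = buffer_size
        self.p_replay = p_replay          # PLR-style replay probability

    def sample(self, random_env_fn):
        """Pick a co-player, then an environment prioritized by regret."""
        # Uniform co-player choice for brevity; the paper also shapes
        # the curriculum over co-players, not just environments.
        cid = random.randrange(len(self.co_players))
        buf = self.buffers[cid]
        if buf and random.random() < self.p_replay:
            # Replay the highest-regret environment seen for this co-player.
            _, env = max(buf, key=lambda x: x[0])
        else:
            # Explore a freshly generated environment and score it.
            env = random_env_fn()
            buf.append((self.regret_fn(env, self.co_players[cid]), env))
            buf.sort(key=lambda x: x[0], reverse=True)
            del buf[self.buffer_size:]    # keep only the top-regret levels
        return self.co_players[cid], env
```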
Related papers
- Multi-Agent Diagnostics for Robustness via Illuminated Diversity [37.38316542660311]
We present Multi-Agent Diagnostics for Robustness via Illuminated Diversity (MADRID).
MADRID generates diverse adversarial scenarios that expose strategic vulnerabilities in pre-trained multi-agent policies.
We evaluate the effectiveness of MADRID on the 11vs11 version of Google Research Football.
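MADRID frames this search as a quality-diversity problem: an archive of scenarios is keyed by behavioural descriptors, and each cell retains the scenario with the highest regret for the target policy. A minimal MAP-Elites-style sketch of that loop follows; `init_fn`, `mutate_fn`, `regret_fn`, and `descriptor_fn` are hypothetical helpers, not the paper's interfaces.

```python
import random

def madrid_style_search(init_fn, mutate_fn, regret_fn, descriptor_fn,
                        iters=1000):
    """MAP-Elites-style search for adversarial scenarios (illustrative)."""
    archive = {}  # descriptor key -> (regret score, scenario)
    for _ in range(iters):
        # Mutate an existing elite, or bootstrap from a fresh scenario.
        parent = (random.choice(list(archive.values()))[1]
                  if archive else init_fn())
        child = mutate_fn(parent)
        key, score = descriptor_fn(child), regret_fn(child)
        # Keep the child only if it beats the current elite in its cell.
        if key not in archive or score > archive[key][0]:
            archive[key] = (score, child)
    return archive
```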
arXiv Detail & Related papers (2024-01-24T14:02:09Z)
- Leading the Pack: N-player Opponent Shaping [52.682734939786464]
We extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents.
We find that when playing with a large number of co-players, the relative performance of OS methods declines, suggesting that they may not perform well in the limit.
arXiv Detail & Related papers (2023-12-19T20:01:42Z)
- Stackelberg Games for Learning Emergent Behaviors During Competitive Autocurricula [35.88217121803472]
This paper proposes a novel game-theoretic algorithm, Stackelberg Multi-Agent Deep Deterministic Policy Gradient (ST-MADDPG).
It formulates a two-player MARL problem as a Stackelberg game with one player as the 'leader' and the other as the 'follower' in a hierarchical interaction structure wherein the leader has an advantage.
By exploiting the leader's advantage, ST-MADDPG improves the quality of a co-evolution process and results in more sophisticated and complex strategies that work well even against an unseen strong opponent.
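One common way to realize the leader's advantage is to unroll the follower's learning step and let the leader differentiate through it. The PyTorch sketch below shows that mechanism for a two-player zero-sum objective; it illustrates the Stackelberg-gradient idea only, not ST-MADDPG's actual MADDPG-based implementation.

```python
import torch

def stackelberg_update(theta_l, theta_f, loss_fn, lr=0.05):
    """One unrolled leader-follower step; loss_fn(theta_l, theta_f) is the
    follower's loss, and the leader (zero-sum) receives its negation."""
    # Follower gradient, kept in the graph so the leader can see through it.
    g_f = torch.autograd.grad(loss_fn(theta_l, theta_f), theta_f,
                              create_graph=True)[0]
    theta_f_next = theta_f - lr * g_f   # follower's anticipated response

    # Leader differentiates through that response: the first-mover advantage.
    g_l = torch.autograd.grad(-loss_fn(theta_l, theta_f_next), theta_l)[0]

    theta_l_next = (theta_l - lr * g_l).detach().requires_grad_()
    return theta_l_next, theta_f_next.detach().requires_grad_()
```

Both parameters must be leaf tensors with `requires_grad=True`; a naive simultaneous-gradient update would instead compute the leader's gradient at the follower's current parameters.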
arXiv Detail & Related papers (2023-05-04T19:27:35Z)
- Multi-Agent Interplay in a Competitive Survival Environment [0.0]
This work is part of the author's thesis, "Multi-Agent Interplay in a Competitive Survival Environment", for the Master's Degree in Artificial Intelligence and Robotics at Sapienza University of Rome, 2022.
arXiv Detail & Related papers (2023-01-19T12:04:03Z)
- Decentralized Cooperative Multi-Agent Reinforcement Learning with Exploration [35.75029940279768]
We study multi-agent reinforcement learning in the most basic cooperative setting -- Markov teams.
We propose an algorithm in which each agent independently runs a stage-based V-learning style algorithm.
We show that the agents can learn an $\epsilon$-approximate Nash equilibrium policy in at most $\propto\widetilde{O}(1/\epsilon^4)$ episodes.
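Concretely, a V-learning-style update keeps only a value table and visit counts per agent, mixing in an optimism bonus to drive exploration. A minimal single-backup sketch follows; the learning-rate schedule and bonus constant are illustrative choices from the V-learning literature, not necessarily the paper's exact ones.

```python
import math
from collections import defaultdict

V = defaultdict(float)     # per-agent value table, V[s]
counts = defaultdict(int)  # visit counts, N[s]

def v_learning_update(s, r, s_next, H=10, c=1.0):
    """One optimistic V-learning backup, run independently by each agent
    from its own trajectory (no access to other agents' observations)."""
    counts[s] += 1
    t = counts[s]
    alpha = (H + 1) / (H + t)        # stage-style learning-rate schedule
    bonus = c * math.sqrt(H**3 / t)  # optimism bonus for exploration
    target = r + V[s_next] + bonus
    V[s] = (1 - alpha) * V[s] + alpha * min(target, H)  # clip at horizon H
```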
arXiv Detail & Related papers (2021-10-12T02:45:12Z)
- Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning [126.57680291438128]
We study whether scalability can be achieved via a disentangled representation.
We evaluate 'semantic tracklets' on the visual multi-agent particle environment (VMPE) and on the challenging visual multi-agent GFootball environment.
Notably, this method is the first to successfully learn a strategy for five players in the GFootball environment using only visual data.
arXiv Detail & Related papers (2021-08-06T22:19:09Z)
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
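The regret signal at the heart of PAIRED is the gap between an antagonist's return and the protagonist's return on each proposed environment; the environment-generating adversary is trained to maximize it. A one-function sketch, with a hypothetical `rollout_fn` that returns an episode return:

```python
def paired_regret(env_params, protagonist, antagonist, rollout_fn):
    # The adversary proposes env_params to maximize this gap, which favours
    # environments the antagonist can solve but the protagonist cannot yet:
    # a curriculum of solvable-but-challenging levels.
    return (rollout_fn(env_params, antagonist)
            - rollout_fn(env_params, protagonist))
```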
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
- Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? [100.48692829396778]
Independent PPO (IPPO) is a form of independent learning in which each agent simply estimates its local value function.
IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
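The defining choice is that each agent's critic conditions only on that agent's local observation, never on a global state or other agents. A minimal per-agent clipped-PPO loss sketch in PyTorch; the `agent.policy.log_prob` and `agent.value` interfaces are illustrative, not a real library's API.

```python
import torch

def ippo_loss(agent, obs, actions, old_logp, advantages, returns, eps=0.2):
    """Clipped PPO loss computed from one agent's *local* observations only."""
    # Ratio between current and behaviour policy.
    logp = agent.policy.log_prob(obs, actions)
    ratio = torch.exp(logp - old_logp)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    # Local value function V(obs_i): the choice that makes this independent
    # learning, as opposed to a centralised critic over the joint state.
    value_loss = (agent.value(obs) - returns).pow(2).mean()
    return policy_loss + 0.5 * value_loss
```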
arXiv Detail & Related papers (2020-11-18T20:29:59Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
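A centralised-but-factored critic can be pictured as per-agent utilities combined by a state-conditioned mixing network. The module below is a rough PyTorch sketch of that structure; layer sizes and the mixer's form are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FactoredCritic(nn.Module):
    """Per-agent utilities Q_i(obs_i, a_i) mixed into a joint value,
    conditioned on the global state (illustrative sketch)."""

    def __init__(self, n_agents, obs_dim, act_dim, state_dim, hidden=64):
        super().__init__()
        self.utilities = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_agents))
        # State-conditioned mixer over the n per-agent utilities.
        self.mixer = nn.Sequential(nn.Linear(n_agents + state_dim, hidden),
                                   nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, obs, acts, state):
        # obs, acts: lists of per-agent tensors; state: global state tensor.
        qs = torch.cat([u(torch.cat([o, a], dim=-1))
                        for u, o, a in zip(self.utilities, obs, acts)],
                       dim=-1)
        return self.mixer(torch.cat([qs, state], dim=-1))  # joint Q-value
```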
arXiv Detail & Related papers (2020-03-14T21:29:09Z)