SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement
Learning
- URL: http://arxiv.org/abs/2212.07489v2
- Date: Tue, 17 Oct 2023 14:05:58 GMT
- Title: SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement
Learning
- Authors: Benjamin Ellis, Jonathan Cook, Skander Moalla, Mikayel Samvelyan,
Mingfei Sun, Anuj Mahajan, Jakob N. Foerster and Shimon Whiteson
- Abstract summary: The StarCraft Multi-Agent Challenge (SMAC) is a popular testbed for centralised training with decentralised execution.
We show that SMAC lacks the stochasticity and partial observability needed to require complex *closed-loop* policies.
We introduce SMACv2, a new version of the benchmark where scenarios are procedurally generated and require agents to generalise to previously unseen settings.
- Score: 45.98103968842858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The availability of challenging benchmarks has played a key role in the
recent progress of machine learning. In cooperative multi-agent reinforcement
learning, the StarCraft Multi-Agent Challenge (SMAC) has become a popular
testbed for centralised training with decentralised execution. However, after
years of sustained improvement on SMAC, algorithms now achieve near-perfect
performance. In this work, we conduct new analysis demonstrating that SMAC
lacks the stochasticity and partial observability to require complex
*closed-loop* policies. In particular, we show that an *open-loop* policy
conditioned only on the timestep can achieve non-trivial win rates for many
SMAC scenarios. To address this limitation, we introduce SMACv2, a new version
of the benchmark where scenarios are procedurally generated and require agents
to generalise to previously unseen settings (from the same distribution) during
evaluation. We also introduce the extended partial observability challenge
(EPO), which augments SMACv2 to ensure meaningful partial observability. We
show that these changes ensure the benchmark requires the use of *closed-loop*
policies. We evaluate state-of-the-art algorithms on SMACv2 and show that it
presents significant challenges not present in the original benchmark. Our
analysis illustrates that SMACv2 addresses the discovered deficiencies of SMAC
and can help benchmark the next generation of MARL methods. Videos of training
are available at https://sites.google.com/view/smacv2.
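
The open-loop probe described in the abstract is easy to picture in code. The sketch below contrasts a policy conditioned only on the timestep with an ordinary observation-conditioned one; the class names and the random per-timestep action table (standing in for whatever search procedure actually produced the open-loop plan) are illustrative assumptions, not the authors' code:

```python
import numpy as np

class OpenLoopPolicy:
    """Conditions only on the timestep t, ignoring all observations."""
    def __init__(self, n_agents, horizon, n_actions, rng):
        # One fixed action per agent per timestep, e.g. found offline by search.
        self.plan = rng.integers(n_actions, size=(horizon, n_agents))

    def act(self, t, observations):
        return self.plan[t]          # observations are deliberately unused

class ClosedLoopPolicy:
    """Conditions on each agent's (partial) observation."""
    def __init__(self, networks):
        self.networks = networks     # one policy network per agent

    def act(self, t, observations):
        return [net(obs) for net, obs in zip(self.networks, observations)]
```

If the open-loop policy already wins often, the scenario cannot be forcing agents to react to what they observe, which is exactly the deficiency the paper measures.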
Related papers
- A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models [8.457552813123597]
The StarCraft Multi-Agent Challenge (SMAC) is one of the most commonly used experimental environments in multi-agent reinforcement learning (MARL).
Traditional MARL algorithms often require interacting with the environment for up to 1 million steps to train a model.
In this paper, we propose a novel approach to solving SMAC tasks called LLM-SMAC.
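
To make the idea concrete, here is a hand-written example of the kind of decision-tree policy code an LLM might emit for a single SMAC unit; the observation fields and action ids are hypothetical, not LLM-SMAC's actual interface:

```python
def decision_tree_policy(obs):
    """obs: dict of scalar features for one unit (hypothetical schema)."""
    MOVE_AWAY, ATTACK_WEAKEST, MOVE_TO_ALLIES = 0, 1, 2  # hypothetical action ids
    if obs["own_health"] < 0.3:       # retreat when badly hurt
        return MOVE_AWAY
    if obs["enemies_in_range"]:       # otherwise focus-fire the weakest target
        return ATTACK_WEAKEST
    return MOVE_TO_ALLIES             # regroup when nothing is in range
```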
arXiv Detail & Related papers (2024-10-21T13:58:38Z)
- SMAClite: A Lightweight Environment for Multi-Agent Reinforcement Learning [11.292086312664383]
The StarCraft Multi-Agent Challenge (SMAC) has been widely used in MARL research, but is built on top of a heavy, closed-source computer game, StarCraft II.
We introduce SMAClite -- a challenge based on SMAC that is both decoupled from StarCraft II and open-source, along with a framework that makes it possible to create new content for SMAClite without any special knowledge.
We conduct experiments to show that SMAClite is equivalent to SMAC by training MARL algorithms on SMAClite and reproducing SMAC results.
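
Such an equivalence check boils down to running the same decentralised-execution loop on both environments. The sketch below uses a generic multi-agent API; the method names and the battle_won info key are assumptions, not SMAClite's documented interface:

```python
def evaluate(env, policies, n_episodes=32):
    """Run decentralised execution and return the empirical win rate."""
    wins = 0
    for _ in range(n_episodes):
        observations = env.reset()
        done = False
        while not done:
            # Each agent acts on its own (partial) observation.
            actions = [pi(obs) for pi, obs in zip(policies, observations)]
            observations, reward, done, info = env.step(actions)
        wins += int(info.get("battle_won", False))
    return wins / n_episodes
```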
arXiv Detail & Related papers (2023-05-09T15:55:19Z)
- Extending Compositional Attention Networks for Social Reasoning in Videos [84.12658971655253]
We propose a novel deep architecture for the task of reasoning about social interactions in videos.
We leverage the multi-step reasoning capabilities of Compositional Attention Networks (MAC) and propose a multimodal extension (MAC-X).
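
The multimodal idea can be sketched as a control state attending over several knowledge sources (e.g. visual and textual) instead of one; the dimensions and the fuse-by-summation step below are illustrative assumptions, not the MAC-X architecture:

```python
import torch
import torch.nn.functional as F

def multimodal_read(control, memories):
    """control: (batch, d); memories: list of (batch, n_i, d) tensors."""
    readouts = []
    for kb in memories:
        scores = torch.einsum("bd,bnd->bn", control, kb)   # attention logits
        attn = F.softmax(scores, dim=-1)
        readouts.append(torch.einsum("bn,bnd->bd", attn, kb))
    return torch.stack(readouts).sum(dim=0)               # fuse the modalities
```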
arXiv Detail & Related papers (2022-10-03T19:03:01Z)
- Transformer-based Value Function Decomposition for Cooperative Multi-agent Reinforcement Learning in StarCraft [1.160208922584163]
The StarCraft II Multi-Agent Challenge (SMAC) was created to be a benchmark problem for cooperative multi-agent reinforcement learning (MARL).
This paper introduces a new architecture TransMix, a transformer-based joint action-value mixing network.
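
A minimal version of a transformer-based mixer treats per-agent utilities as tokens, attends over them together with the global state, and reads out a joint value. The layer sizes and the use of a state token below are assumptions for illustration, not TransMix's exact design:

```python
import torch
import torch.nn as nn

class TransformerMixer(nn.Module):
    def __init__(self, n_agents, state_dim, d_model=32, n_heads=4):
        super().__init__()
        self.q_embed = nn.Linear(1, d_model)          # embed each agent's Q
        self.s_embed = nn.Linear(state_dim, d_model)  # global state as a token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        tokens = self.q_embed(agent_qs.unsqueeze(-1))             # (B, N, d)
        tokens = torch.cat([self.s_embed(state).unsqueeze(1), tokens], dim=1)
        mixed = self.encoder(tokens)                  # self-attention over agents
        return self.out(mixed[:, 0])                  # joint action-value Q_tot
```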
arXiv Detail & Related papers (2022-08-15T16:13:16Z)
- MQBench: Towards Reproducible and Deployable Model Quantization Benchmark [53.12623958951738]
MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms.
We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate an extensive set of state-of-the-art quantization algorithms.
We conduct a comprehensive analysis and find considerable intuitive and counter-intuitive insights.
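
The common object of study in such a benchmark is uniform affine (fake) quantization. The sketch below shows the standard textbook scheme, not MQBench's API:

```python
import torch

def fake_quantize(x, num_bits=8):
    """Quantize-dequantize a tensor with a uniform affine scheme."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = (qmin - x.min() / scale).round().clamp(qmin, qmax)
    q = (x / scale + zero_point).round().clamp(qmin, qmax)   # quantize
    return (q - zero_point) * scale                          # dequantize
```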
arXiv Detail & Related papers (2021-11-05T23:38:44Z)
- Divergence-Regularized Multi-Agent Actor-Critic [17.995905582226467]
We propose a novel off-policy cooperative MARL framework, divergence-regularized multi-agent actor-critic (DMAC).
DMAC is a flexible framework and can be combined with many existing MARL algorithms.
We empirically show that DMAC substantially improves the performance of existing MARL algorithms.
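
The core idea, divergence regularisation, fits in a few lines: penalise the KL divergence between the updated policy and a reference policy inside the actor loss. The coefficient and the exact form of the objective below are illustrative assumptions, not DMAC's precise formulation:

```python
import torch
import torch.nn.functional as F

def divergence_regularized_loss(logits, ref_logits, q_values, beta=0.1):
    # logits, ref_logits, q_values: (batch, n_actions)
    log_pi = F.log_softmax(logits, dim=-1)
    pi = log_pi.exp()
    expected_q = (pi * q_values.detach()).sum(dim=-1)          # policy objective
    # KL(pi || pi_ref), computed explicitly from the two log-policies.
    kl = (pi * (log_pi - F.log_softmax(ref_logits, dim=-1))).sum(dim=-1)
    return (-expected_q + beta * kl).mean()                    # minimise
```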
arXiv Detail & Related papers (2021-10-01T10:27:42Z)
- QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning [70.382101956278]
QTRAN is a reinforcement learning algorithm capable of learning the largest class of joint-action value functions.
Despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments.
We propose a substantially improved version, coined QTRAN++.
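
The QTRAN family is built around the Individual-Global-Max (IGM) condition: the greedy per-agent actions must jointly maximise the joint action-value. A toy check, with illustrative data structures rather than anything from the paper:

```python
import itertools
import numpy as np

def satisfies_igm(q_agents, q_joint):
    # q_agents: list of per-agent utility vectors;
    # q_joint: dict mapping joint actions (tuples) to values.
    greedy_individual = tuple(int(np.argmax(q)) for q in q_agents)
    greedy_joint = max(q_joint, key=q_joint.get)
    return greedy_individual == greedy_joint

q_agents = [np.array([0.2, 0.9]), np.array([0.5, 0.1])]
q_joint = {a: float(q_agents[0][a[0]] + q_agents[1][a[1]])
           for a in itertools.product(range(2), repeat=2)}
print(satisfies_igm(q_agents, q_joint))  # True for this additive toy example
```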
arXiv Detail & Related papers (2020-06-22T05:08:36Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
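
A factored centralised policy gradient can be sketched as decentralised actors, per-agent critics mixed into a joint value, and one gradient step through all actors at once. The shapes and the linear stand-in for a state-conditioned mixer below are assumptions, not FACMAC's exact networks:

```python
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim = 3, 8, 2
actors = nn.ModuleList(nn.Linear(obs_dim, act_dim) for _ in range(n_agents))
per_agent_critics = nn.ModuleList(
    nn.Linear(obs_dim + act_dim, 1) for _ in range(n_agents))
mixer = nn.Linear(n_agents, 1)   # stand-in for a state-conditioned mixer

obs = torch.randn(16, n_agents, obs_dim)
# Decentralised actors produce continuous actions from local observations.
actions = torch.stack([torch.tanh(a(obs[:, i])) for i, a in enumerate(actors)],
                      dim=1)
# Factored critic: per-agent utilities, mixed into a centralised joint value.
utils = torch.cat([c(torch.cat([obs[:, i], actions[:, i]], dim=-1))
                   for i, c in enumerate(per_agent_critics)], dim=-1)
q_tot = mixer(utils)
actor_loss = -q_tot.mean()       # ascend the joint critic through every actor
actor_loss.backward()
```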
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
- MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius [133.47492985863136]
Adversarial training is one of the most popular ways to learn robust models, but it is usually attack-dependent and time-consuming.
We propose the MACER algorithm, which learns robust models without using adversarial training but performs better than all existing provable l2-defenses.
For all tasks, MACER spends less training time than state-of-the-art adversarial training algorithms, and the learned models achieve a larger average certified radius.
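
The certified radius being maximised comes from randomized smoothing: with Gaussian noise level sigma and top-two smoothed class probabilities pA >= pB, the certified l2 radius is sigma/2 * (Phi^-1(pA) - Phi^-1(pB)). A small illustration (the probability values below are made up):

```python
from statistics import NormalDist

def certified_radius(p_a, p_b, sigma):
    """Certified l2 radius from top-two smoothed class probabilities."""
    ppf = NormalDist().inv_cdf   # Phi^-1, the standard normal quantile
    return 0.5 * sigma * (ppf(p_a) - ppf(p_b))

print(certified_radius(0.9, 0.05, sigma=0.25))  # ~0.366
```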
arXiv Detail & Related papers (2020-01-08T05:08:56Z)