SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement
Learning
- URL: http://arxiv.org/abs/2212.07489v2
- Date: Tue, 17 Oct 2023 14:05:58 GMT
- Title: SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement
Learning
- Authors: Benjamin Ellis, Jonathan Cook, Skander Moalla, Mikayel Samvelyan,
Mingfei Sun, Anuj Mahajan, Jakob N. Foerster and Shimon Whiteson
- Abstract summary: The StarCraft Multi-Agent Challenge (SMAC) is a popular testbed for centralised training with decentralised execution.
We show that SMAC lacks the stochasticity and partial observability needed to require complex *closed-loop* policies.
We introduce SMACv2, a new version of the benchmark where scenarios are procedurally generated and require agents to generalise to previously unseen settings.
- Score: 45.98103968842858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The availability of challenging benchmarks has played a key role in the
recent progress of machine learning. In cooperative multi-agent reinforcement
learning, the StarCraft Multi-Agent Challenge (SMAC) has become a popular
testbed for centralised training with decentralised execution. However, after
years of sustained improvement on SMAC, algorithms now achieve near-perfect
performance. In this work, we conduct new analysis demonstrating that SMAC
lacks the stochasticity and partial observability to require complex
*closed-loop* policies. In particular, we show that an *open-loop* policy
conditioned only on the timestep can achieve non-trivial win rates for many
SMAC scenarios. To address this limitation, we introduce SMACv2, a new version
of the benchmark where scenarios are procedurally generated and require agents
to generalise to previously unseen settings (from the same distribution) during
evaluation. We also introduce the extended partial observability challenge
(EPO), which augments SMACv2 to ensure meaningful partial observability. We
show that these changes ensure the benchmark requires the use of *closed-loop*
policies. We evaluate state-of-the-art algorithms on SMACv2 and show that it
presents significant challenges not present in the original benchmark. Our
analysis illustrates that SMACv2 addresses the discovered deficiencies of SMAC
and can help benchmark the next generation of MARL methods. Videos of training
are available at https://sites.google.com/view/smacv2.
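
The open-loop probe described in the abstract is easy to picture in code. The sketch below contrasts a policy conditioned only on the timestep with an ordinary observation-conditioned one; the class names and the random per-timestep action table (standing in for whatever search procedure actually produced the open-loop plan) are illustrative assumptions, not the authors' code:

```python
import numpy as np

class OpenLoopPolicy:
    """Conditions only on the timestep t, ignoring all observations."""
    def __init__(self, n_agents, horizon, n_actions, rng):
        # One fixed action per agent per timestep, e.g. found offline by search.
        self.plan = rng.integers(n_actions, size=(horizon, n_agents))

    def act(self, t, observations):
        return self.plan[t]          # observations are deliberately unused

class ClosedLoopPolicy:
    """Conditions on each agent's (partial) observation."""
    def __init__(self, networks):
        self.networks = networks     # one policy network per agent

    def act(self, t, observations):
        return [net(obs) for net, obs in zip(self.networks, observations)]
```

If the open-loop policy already wins often, the scenario cannot be forcing agents to react to what they observe, which is exactly the deficiency the paper measures.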
Related papers
- A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models [8.457552813123597]
The StarCraft Multi-Agent Challenge (SMAC) is one of the most commonly used experimental environments in multi-agent reinforcement learning (MARL).
Traditional MARL algorithms often require interacting with the environment for up to 1 million steps to train a model.
In this paper, we propose a novel approach to solving SMAC tasks called LLM-SMAC.
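
To make the idea concrete, here is a hand-written example of the kind of decision-tree policy code an LLM might emit for a single SMAC unit; the observation fields and action ids are hypothetical, not LLM-SMAC's actual interface:

```python
def decision_tree_policy(obs):
    """obs: dict of scalar features for one unit (hypothetical schema)."""
    MOVE_AWAY, ATTACK_WEAKEST, MOVE_TO_ALLIES = 0, 1, 2  # hypothetical action ids
    if obs["own_health"] < 0.3:       # retreat when badly hurt
        return MOVE_AWAY
    if obs["enemies_in_range"]:       # otherwise focus-fire the weakest target
        return ATTACK_WEAKEST
    return MOVE_TO_ALLIES             # regroup when nothing is in range
```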
arXiv Detail & Related papers (2024-10-21T13:58:38Z)
- SMAClite: A Lightweight Environment for Multi-Agent Reinforcement Learning [11.292086312664383]
The StarCraft Multi-Agent Challenge (SMAC) has been widely used in MARL research, but is built on top of a heavy, closed-source computer game, StarCraft II.
We introduce SMAClite -- a challenge based on SMAC that is both decoupled from StarCraft II and open-source, along with a framework that makes it possible to create new content for SMAClite without any special knowledge.
We conduct experiments to show that SMAClite is equivalent to SMAC by training MARL algorithms on SMAClite and reproducing SMAC results.
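
Such an equivalence check boils down to running the same decentralised-execution loop on both environments. The sketch below uses a generic multi-agent API; the method names and the battle_won info key are assumptions, not SMAClite's documented interface:

```python
def evaluate(env, policies, n_episodes=32):
    """Run decentralised execution and return the empirical win rate."""
    wins = 0
    for _ in range(n_episodes):
        observations = env.reset()
        done = False
        while not done:
            # Each agent acts on its own (partial) observation.
            actions = [pi(obs) for pi, obs in zip(policies, observations)]
            observations, reward, done, info = env.step(actions)
        wins += int(info.get("battle_won", False))
    return wins / n_episodes
```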
arXiv Detail & Related papers (2023-05-09T15:55:19Z)
- Extending Compositional Attention Networks for Social Reasoning in Videos [84.12658971655253]
We propose a novel deep architecture for the task of reasoning about social interactions in videos.
We leverage the multi-step reasoning capabilities of Compositional Attention Networks (MAC) and propose a multimodal extension (MAC-X).
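
The multimodal idea can be sketched as a control state attending over several knowledge sources (e.g. visual and textual) instead of one; the dimensions and the fuse-by-summation step below are illustrative assumptions, not the MAC-X architecture:

```python
import torch
import torch.nn.functional as F

def multimodal_read(control, memories):
    """control: (batch, d); memories: list of (batch, n_i, d) tensors."""
    readouts = []
    for kb in memories:
        scores = torch.einsum("bd,bnd->bn", control, kb)   # attention logits
        attn = F.softmax(scores, dim=-1)
        readouts.append(torch.einsum("bn,bnd->bd", attn, kb))
    return torch.stack(readouts).sum(dim=0)               # fuse the modalities
```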
arXiv Detail & Related papers (2022-10-03T19:03:01Z)
- Transformer-based Value Function Decomposition for Cooperative Multi-agent Reinforcement Learning in StarCraft [1.160208922584163]
The StarCraft II Multi-Agent Challenge (SMAC) was created to be a benchmark problem for cooperative multi-agent reinforcement learning (MARL).
This paper introduces a new architecture TransMix, a transformer-based joint action-value mixing network.
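
A minimal version of a transformer-based mixer treats per-agent utilities as tokens, attends over them together with the global state, and reads out a joint value. The layer sizes and the use of a state token below are assumptions for illustration, not TransMix's exact design:

```python
import torch
import torch.nn as nn

class TransformerMixer(nn.Module):
    def __init__(self, n_agents, state_dim, d_model=32, n_heads=4):
        super().__init__()
        self.q_embed = nn.Linear(1, d_model)          # embed each agent's Q
        self.s_embed = nn.Linear(state_dim, d_model)  # global state as a token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        tokens = self.q_embed(agent_qs.unsqueeze(-1))             # (B, N, d)
        tokens = torch.cat([self.s_embed(state).unsqueeze(1), tokens], dim=1)
        mixed = self.encoder(tokens)                  # self-attention over agents
        return self.out(mixed[:, 0])                  # joint action-value Q_tot
```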
arXiv Detail & Related papers (2022-08-15T16:13:16Z)
- MQBench: Towards Reproducible and Deployable Model Quantization Benchmark [53.12623958951738]
MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms.
We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate an extensive set of state-of-the-art quantization algorithms.
We conduct a comprehensive analysis and find considerable intuitive and counter-intuitive insights.
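
The common object of study in such a benchmark is uniform affine (fake) quantization. The sketch below shows the standard textbook scheme, not MQBench's API:

```python
import torch

def fake_quantize(x, num_bits=8):
    """Quantize-dequantize a tensor with a uniform affine scheme."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = (qmin - x.min() / scale).round().clamp(qmin, qmax)
    q = (x / scale + zero_point).round().clamp(qmin, qmax)   # quantize
    return (q - zero_point) * scale                          # dequantize
```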
arXiv Detail & Related papers (2021-11-05T23:38:44Z)
- Divergence-Regularized Multi-Agent Actor-Critic [17.995905582226467]
We propose a novel off-policy cooperative MARL framework, divergence-regularized multi-agent actor-critic (DMAC).
DMAC is a flexible framework and can be combined with many existing MARL algorithms.
We empirically show that DMAC substantially improves the performance of existing MARL algorithms.
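
The core idea, divergence regularisation, fits in a few lines: penalise the KL divergence between the updated policy and a reference policy inside the actor loss. The coefficient and the exact form of the objective below are illustrative assumptions, not DMAC's precise formulation:

```python
import torch
import torch.nn.functional as F

def divergence_regularized_loss(logits, ref_logits, q_values, beta=0.1):
    # logits, ref_logits, q_values: (batch, n_actions)
    log_pi = F.log_softmax(logits, dim=-1)
    pi = log_pi.exp()
    expected_q = (pi * q_values.detach()).sum(dim=-1)          # policy objective
    # KL(pi || pi_ref), computed explicitly from the two log-policies.
    kl = (pi * (log_pi - F.log_softmax(ref_logits, dim=-1))).sum(dim=-1)
    return (-expected_q + beta * kl).mean()                    # minimise
```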
arXiv Detail & Related papers (2021-10-01T10:27:42Z)
- QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning [70.382101956278]
QTRAN is a reinforcement learning algorithm capable of learning the largest class of joint-action value functions.
Despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments.
We propose a substantially improved version, coined QTRAN++.
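
The QTRAN family is built around the Individual-Global-Max (IGM) condition: the greedy per-agent actions must jointly maximise the joint action-value. A toy check, with illustrative data structures rather than anything from the paper:

```python
import itertools
import numpy as np

def satisfies_igm(q_agents, q_joint):
    # q_agents: list of per-agent utility vectors;
    # q_joint: dict mapping joint actions (tuples) to values.
    greedy_individual = tuple(int(np.argmax(q)) for q in q_agents)
    greedy_joint = max(q_joint, key=q_joint.get)
    return greedy_individual == greedy_joint

q_agents = [np.array([0.2, 0.9]), np.array([0.5, 0.1])]
q_joint = {a: float(q_agents[0][a[0]] + q_agents[1][a[1]])
           for a in itertools.product(range(2), repeat=2)}
print(satisfies_igm(q_agents, q_joint))  # True for this additive toy example
```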
arXiv Detail & Related papers (2020-06-22T05:08:36Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
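
A factored centralised policy gradient can be sketched as decentralised actors, per-agent critics mixed into a joint value, and one gradient step through all actors at once. The shapes and the linear stand-in for a state-conditioned mixer below are assumptions, not FACMAC's exact networks:

```python
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim = 3, 8, 2
actors = nn.ModuleList(nn.Linear(obs_dim, act_dim) for _ in range(n_agents))
per_agent_critics = nn.ModuleList(
    nn.Linear(obs_dim + act_dim, 1) for _ in range(n_agents))
mixer = nn.Linear(n_agents, 1)   # stand-in for a state-conditioned mixer

obs = torch.randn(16, n_agents, obs_dim)
# Decentralised actors produce continuous actions from local observations.
actions = torch.stack([torch.tanh(a(obs[:, i])) for i, a in enumerate(actors)],
                      dim=1)
# Factored critic: per-agent utilities, mixed into a centralised joint value.
utils = torch.cat([c(torch.cat([obs[:, i], actions[:, i]], dim=-1))
                   for i, c in enumerate(per_agent_critics)], dim=-1)
q_tot = mixer(utils)
actor_loss = -q_tot.mean()       # ascend the joint critic through every actor
actor_loss.backward()
```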
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
- MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius [133.47492985863136]
Adversarial training is one of the most popular ways to learn robust models, but it is usually attack-dependent and time-consuming.
We propose the MACER algorithm, which learns robust models without using adversarial training but performs better than all existing provable l2-defenses.
For all tasks, MACER spends less training time than state-of-the-art adversarial training algorithms, and the learned models achieve a larger average certified radius.
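
The certified radius being maximised comes from randomized smoothing: with Gaussian noise level sigma and top-two smoothed class probabilities pA >= pB, the certified l2 radius is sigma/2 * (Phi^-1(pA) - Phi^-1(pB)). A small illustration (the probability values below are made up):

```python
from statistics import NormalDist

def certified_radius(p_a, p_b, sigma):
    """Certified l2 radius from top-two smoothed class probabilities."""
    ppf = NormalDist().inv_cdf   # Phi^-1, the standard normal quantile
    return 0.5 * sigma * (ppf(p_a) - ppf(p_b))

print(certified_radius(0.9, 0.05, sigma=0.25))  # ~0.366
```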
arXiv Detail & Related papers (2020-01-08T05:08:56Z)