SMAC-Hard: Enabling Mixed Opponent Strategy Script and Self-play on SMAC
- URL: http://arxiv.org/abs/2412.17707v2
- Date: Tue, 24 Dec 2024 16:16:34 GMT
- Title: SMAC-Hard: Enabling Mixed Opponent Strategy Script and Self-play on SMAC
- Authors: Yue Deng, Yan Yu, Weiyu Ma, Zirui Wang, Wenhui Zhu, Jian Zhao, Yin Zhang
- Abstract summary: We present SMAC-HARD, a novel benchmark to enhance training robustness and evaluation comprehensiveness.
SMAC-HARD supports customizable opponent strategies, randomization of adversarial policies, and interfaces for MARL self-play.
We conduct extensive evaluations of widely used and state-of-the-art algorithms on SMAC-HARD, revealing the substantial challenges posed by edited and mixed strategy opponents.
- Score: 19.897956357070697
- Abstract: The availability of challenging simulation environments is pivotal for advancing the field of Multi-Agent Reinforcement Learning (MARL). In cooperative MARL settings, the StarCraft Multi-Agent Challenge (SMAC) has gained prominence as a benchmark for algorithms following the centralized training with decentralized execution paradigm. However, with continual advancements in SMAC, many algorithms now exhibit near-optimal performance, complicating the evaluation of their true effectiveness. To alleviate this problem, in this work, we highlight a critical issue: the default opponent policy in these environments lacks sufficient diversity, leading MARL algorithms to overfit and exploit unintended vulnerabilities rather than learning robust strategies. To overcome these limitations, we propose SMAC-HARD, a novel benchmark designed to enhance training robustness and evaluation comprehensiveness. SMAC-HARD supports customizable opponent strategies, randomization of adversarial policies, and interfaces for MARL self-play, enabling agents to generalize to varying opponent behaviors and improve model stability. Furthermore, we introduce a black-box testing framework wherein agents are trained without exposure to the edited opponent scripts but are tested against these scripts to evaluate the policy coverage and adaptability of MARL algorithms. We conduct extensive evaluations of widely used and state-of-the-art algorithms on SMAC-HARD, revealing the substantial challenges posed by edited and mixed strategy opponents. Additionally, the black-box strategy tests illustrate the difficulty of transferring learned policies to unseen adversaries. We envision SMAC-HARD as a critical step toward benchmarking the next generation of MARL algorithms, fostering progress in self-play methods for multi-agent systems. Our code is available at https://github.com/devindeng94/smac-hard.
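As a rough illustration of what such an interface enables, the sketch below wraps a SMAC-style environment so that a scripted opponent is redrawn at every reset. The `MixedOpponentEnv` class and its `set_opponent` hook are assumptions for illustration, not the published SMAC-HARD API; see the linked repository for the real interface.

```python
import random

class MixedOpponentEnv:
    """Wrap a SMAC-style environment and draw a new scripted opponent at
    each reset, mimicking SMAC-HARD's mixed opponent strategies."""

    def __init__(self, base_env, opponent_scripts):
        self.base_env = base_env
        self.opponent_scripts = opponent_scripts  # list of opponent policies

    def reset(self):
        # Randomizing the adversarial policy every episode discourages
        # agents from overfitting a single default opponent.
        script = random.choice(self.opponent_scripts)
        self.base_env.set_opponent(script)  # assumed hook, not the real API
        return self.base_env.reset()

    def step(self, actions):
        return self.base_env.step(actions)
```

For the black-box test described above, one would hold a subset of `opponent_scripts` out of training and evaluate against it only at test time.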
Related papers
- A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models [8.457552813123597]
The StarCraft Multi-Agent Challenge (SMAC) is one of the most commonly used experimental environments in multi-agent reinforcement learning (MARL).
Traditional MARL algorithms often require interacting with the environment for up to 1 million steps to train a model.
In this paper, we propose a novel approach to solving SMAC tasks called LLM-SMAC.
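As a hint of what LLM-SMAC-style output might look like, the snippet below shows the kind of interpretable decision-tree policy code an LLM could emit for a micromanagement task; `agent.health_ratio`, `state.enemies_in_range`, and the action names are hypothetical helpers, not an API from the paper.

```python
def decision_tree_policy(agent, state):
    """Pick a high-level action for one unit from simple if/else rules."""
    if agent.health_ratio < 0.3 and state.enemies_in_range(agent):
        return "retreat"                    # kite away when badly hurt
    if state.enemies_in_range(agent):
        return "attack_weakest_in_range"    # focus fire the weakest target
    return "move_toward_nearest_enemy"      # otherwise close the distance
```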
arXiv Detail & Related papers (2024-10-21T13:58:38Z)
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses: one makes the model robust to continuous embedding-space attacks, and one preserves utility on clean data.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
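The core idea, as summarized above, is to attack in the continuous embedding space rather than over discrete tokens. A minimal sketch, assuming a HuggingFace-style model that accepts `inputs_embeds` and returns a `.loss`:

```python
import torch

def continuous_attack(model, embeds, labels, eps=0.1, steps=5, lr=0.02):
    """Gradient-ascend a small perturbation of the token embeddings."""
    delta = torch.zeros_like(embeds, requires_grad=True)
    for _ in range(steps):
        loss = model(inputs_embeds=embeds + delta, labels=labels).loss
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()  # step to increase the loss
            delta.clamp_(-eps, eps)          # keep the perturbation small
            delta.grad.zero_()
    return delta.detach()
```

Adversarial training then minimizes the model loss at `embeds + delta`; the step sizes and bounds here are illustrative, not the paper's settings.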
arXiv Detail & Related papers (2024-05-24T14:20:09Z)
- Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms [79.61176746380718]
Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains.
MARL policies often lack robustness and are sensitive to small changes in their environment.
We show that we can gain robustness by controlling a policy's Lipschitz constant.
We propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies.
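A minimal sketch of the regularization idea: penalize how much the policy output moves under a small observation perturbation, which bounds a local Lipschitz constant. ERNIE uses adversarially chosen perturbations; random noise is shown here for brevity, and all names are illustrative.

```python
import torch

def smoothness_penalty(policy_net, obs, eps=0.05):
    """Ratio of policy-output change to input change (local smoothness)."""
    noise = eps * torch.randn_like(obs)       # random, not worst-case
    logits = policy_net(obs)
    logits_perturbed = policy_net(obs + noise)
    return (logits - logits_perturbed).norm() / noise.norm()

# Added to the usual actor objective as:
#   loss = pg_loss + lam * smoothness_penalty(policy_net, obs)
```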
arXiv Detail & Related papers (2023-10-16T20:14:06Z)
- SMAClite: A Lightweight Environment for Multi-Agent Reinforcement Learning [11.292086312664383]
The StarCraft Multi-Agent Challenge (SMAC) has been widely used in MARL research, but is built on top of a heavy, closed-source computer game, StarCraft II.
We introduce SMAClite -- a challenge based on SMAC that is both decoupled from StarCraft II and open-source, along with a framework which makes it possible to create new content for SMAClite without any special knowledge.
We conduct experiments to show that SMAClite is equivalent to SMAC, by training MARL algorithms on SMAClite and reproducing SMAC results.
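A rollout against a SMAClite-style environment might look like the sketch below; the `step` signature follows the original SMAC convention of returning `(reward, terminated, info)`, and `env.n_actions` is an assumed attribute (real code should also restrict sampling to available actions).

```python
import numpy as np

def random_rollout(env, n_agents):
    """Run one episode with uniformly random actions."""
    env.reset()
    terminated, total_reward = False, 0.0
    while not terminated:
        actions = [np.random.randint(env.n_actions) for _ in range(n_agents)]
        reward, terminated, info = env.step(actions)
        total_reward += reward
    return total_reward
```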
arXiv Detail & Related papers (2023-05-09T15:55:19Z)
- SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning [45.98103968842858]
The StarCraft Multi-Agent Challenge (SMAC) is a popular testbed for centralised training with decentralised execution.
We show that SMAC lacks the stochasticity and partial observability to require complex *closed-loop* policies.
We introduce SMACv2, a new version of the benchmark where scenarios are procedurally generated and require agents to generalise to previously unseen settings.
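The procedural-generation idea can be sketched in a few lines: team composition and start positions are resampled every episode, so a fixed open-loop script cannot memorize the scenario. The unit types and distributions below are illustrative, not SMACv2's actual generator.

```python
import random

UNIT_TYPES = ["marine", "marauder", "medivac"]  # example Terran pool

def sample_scenario(n_agents, map_size=32.0):
    """Draw a fresh team composition and spawn layout for one episode."""
    units = [random.choice(UNIT_TYPES) for _ in range(n_agents)]
    positions = [(random.uniform(0, map_size), random.uniform(0, map_size))
                 for _ in range(n_agents)]
    return units, positions
```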
arXiv Detail & Related papers (2022-12-14T20:15:19Z)
- Towards Comprehensive Testing on the Robustness of Cooperative Multi-agent Reinforcement Learning [10.132303690998523]
It is crucial to test the robustness of c-MARL algorithms before they are deployed in the real world.
Existing adversarial attacks for MARL could be used for testing, but each is limited to a single robustness aspect.
We propose MARLSafe, the first robustness testing framework for c-MARL algorithms.
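Testing across multiple robustness aspects might be orchestrated as in the hedged sketch below, where each aspect (e.g. state, action, or reward perturbation) is a caller-supplied function; the structure is illustrative, not MARLSafe's actual interface.

```python
def test_robustness(run_episode, perturbations, episodes=100):
    """run_episode(perturb) should return 1 if the team wins, else 0."""
    results = {}
    for name, perturb in perturbations.items():
        wins = sum(run_episode(perturb) for _ in range(episodes))
        results[name] = wins / episodes  # win rate under this perturbation
    return results
```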
arXiv Detail & Related papers (2022-04-17T05:15:51Z)
- Divergence-Regularized Multi-Agent Actor-Critic [17.995905582226467]
We propose a novel off-policy cooperative MARL framework, divergence-regularized multi-agent actor-critic (DMAC).
DMAC is a flexible framework and can be combined with many existing MARL algorithms.
We empirically show that DMAC substantially improves the performance of existing MARL algorithms.
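A minimal sketch of a divergence-regularized actor update in the spirit of DMAC: the policy-gradient term is augmented with a divergence penalty toward a reference (e.g. previous) policy. Tensor shapes and the `alpha` coefficient are illustrative.

```python
import torch
import torch.nn.functional as F

def divergence_regularized_loss(logits, old_logits, actions, advantages,
                                alpha=0.1):
    """Policy-gradient loss plus a KL penalty toward a reference policy."""
    logp = F.log_softmax(logits, dim=-1)
    chosen = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    kl = F.kl_div(logp, F.softmax(old_logits, dim=-1), reduction="batchmean")
    return -(chosen * advantages).mean() + alpha * kl
```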
arXiv Detail & Related papers (2021-10-01T10:27:42Z)
- Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients [51.749831824106046]
We introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods.
We show that our methods perform as well or better than state-of-the-art value-based methods on a variety of SMAC tasks.
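The gist of semi-on-policy training can be sketched as mixing the freshest rollout data with recent, mildly off-policy replay samples; the 50/50 split and buffer size below are illustrative choices, not the paper's.

```python
import random
from collections import deque

replay = deque(maxlen=10_000)  # recent, mildly off-policy transitions

def semi_on_policy_batch(fresh, batch_size=64):
    """Combine new on-policy transitions with recent replay data."""
    n_fresh = min(len(fresh), batch_size // 2)
    n_old = min(len(replay), batch_size - n_fresh)
    old = random.sample(list(replay), n_old) if n_old else []
    replay.extend(fresh)  # store the new data for later reuse
    return fresh[:n_fresh] + old
```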
arXiv Detail & Related papers (2021-04-27T19:37:01Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
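One way to realize such a scheme is sketched below: joint action-values are pulled toward a softmax-weighted baseline over candidate values, damping overestimation. The `beta` and `lam` coefficients and tensor shapes are illustrative.

```python
import torch

def regularized_td_loss(q_chosen, td_target, q_all, beta=1.0, lam=0.1):
    """TD loss plus a penalty on deviation from a softmax baseline."""
    weights = torch.softmax(beta * q_all, dim=-1)  # soft action weighting
    baseline = (weights * q_all).sum(dim=-1)       # softmax-weighted value
    td_loss = (q_chosen - td_target).pow(2).mean()
    penalty = (q_chosen - baseline).pow(2).mean()  # discourage outliers
    return td_loss + lam * penalty
```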
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
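A rough sketch of a FACMAC-style actor update: per-agent actors propose actions, per-agent utilities are mixed into one centralised joint value, and all actors are trained through that single value. The mixing network and shapes are illustrative, not the paper's implementation.

```python
import torch

def facmac_actor_loss(actors, utilities, mixer, obs_list, state):
    """Centralised policy gradient through a factored joint critic."""
    actions = [actor(obs) for actor, obs in zip(actors, obs_list)]
    agent_qs = torch.stack(
        [q(obs, act) for q, obs, act in zip(utilities, obs_list, actions)],
        dim=-1)                       # one utility per agent
    q_joint = mixer(agent_qs, state)  # monotonic mixing, QMIX-style
    return -q_joint.mean()            # ascend the joint value
```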
arXiv Detail & Related papers (2020-03-14T21:29:09Z)