Transformer-based Value Function Decomposition for Cooperative
Multi-agent Reinforcement Learning in StarCraft
- URL: http://arxiv.org/abs/2208.07298v1
- Date: Mon, 15 Aug 2022 16:13:16 GMT
- Title: Transformer-based Value Function Decomposition for Cooperative
Multi-agent Reinforcement Learning in StarCraft
- Authors: Muhammad Junaid Khan, Syed Hammad Ahmed, Gita Sukthankar
- Abstract summary: The StarCraft II Multi-Agent Challenge (SMAC) was created to be a benchmark problem for cooperative multi-agent reinforcement learning (MARL).
This paper introduces a new architecture, TransMix, a transformer-based joint action-value mixing network.
- Score: 1.160208922584163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The StarCraft II Multi-Agent Challenge (SMAC) was created to be a challenging
benchmark problem for cooperative multi-agent reinforcement learning (MARL).
SMAC focuses exclusively on the problem of StarCraft micromanagement and
assumes that each unit is controlled individually by a learning agent that acts
independently and only possesses local information; centralized training is
assumed to occur with decentralized execution (CTDE). To perform well in SMAC,
MARL algorithms must handle the dual problems of multi-agent credit assignment
and joint action evaluation.
This paper introduces a new architecture, TransMix, a transformer-based joint
action-value mixing network which we show to be efficient and scalable
compared to other state-of-the-art cooperative MARL solutions. TransMix
leverages the ability of transformers to learn a richer mixing function for
combining the agents' individual value functions. It achieves comparable
performance to previous work on easy SMAC scenarios and outperforms other
techniques on hard scenarios, as well as scenarios that are corrupted with
Gaussian noise to simulate fog of war.
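The abstract describes the mixing network only at a high level, so the following is a rough, non-authoritative sketch of what a transformer-based joint action-value mixer of this kind can look like. The token layout, module names, and layer sizes are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a transformer-based mixing network in the spirit of TransMix.
# Per-agent utilities Q_i and the global state are embedded as tokens, a
# transformer encoder attends across them, and a linear head produces Q_tot.
# All sizes and the pooling choice are illustrative assumptions.
import torch
import torch.nn as nn


class TransformerMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32, n_heads: int = 4):
        super().__init__()
        self.q_embed = nn.Linear(1, embed_dim)               # embed each scalar Q_i as a token
        self.state_embed = nn.Linear(state_dim, embed_dim)   # embed the global state as a token
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(embed_dim, 1)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        q_tokens = self.q_embed(agent_qs.unsqueeze(-1))       # (batch, n_agents, embed_dim)
        s_token = self.state_embed(state).unsqueeze(1)        # (batch, 1, embed_dim)
        tokens = torch.cat([q_tokens, s_token], dim=1)        # (batch, n_agents + 1, embed_dim)
        mixed = self.encoder(tokens)                          # self-attention mixes agent utilities
        return self.out(mixed.mean(dim=1)).squeeze(-1)        # Q_tot: (batch,)


if __name__ == "__main__":
    mixer = TransformerMixer(n_agents=5, state_dim=48)
    q_tot = mixer(torch.randn(8, 5), torch.randn(8, 48))
    print(q_tot.shape)  # torch.Size([8])
```

The point the sketch tries to capture is that self-attention lets the mixer weigh each agent's utility against the others and against the global state, rather than combining them through a fixed feed-forward mixing function.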
Related papers
- Cooperative and Asynchronous Transformer-based Mission Planning for Heterogeneous Teams of Mobile Robots [1.1049608786515839]
This paper presents the Cooperative and Asynchronous Transformer-based Mission Planning (CATMiP) framework.
CATMiP uses multi-agent reinforcement learning to coordinate agents with heterogeneous sensing, motion, and actuation capabilities.
It easily adapts to mission complexities and communication constraints, and scales to varying environment sizes and team compositions.
arXiv Detail & Related papers (2024-10-08T21:14:09Z)
- MAIDCRL: Semi-centralized Multi-Agent Influence Dense-CNN Reinforcement Learning [0.7366405857677227]
We present a semi-centralized Dense Reinforcement Learning algorithm enhanced by agent influence maps (AIMs) for learning effective multi-agent control on StarCraft Multi-Agent Challenge (SMAC) scenarios.
The results show that the CNN-enabled MAIDCRL significantly improved the learning performance and achieved a faster learning rate compared to the existing MAIDRL.
arXiv Detail & Related papers (2024-02-12T18:53:20Z)
- MaskMA: Towards Zero-Shot Multi-Agent Decision Making with Mask-Based Collaborative Learning [56.00558959816801]
We propose a Mask-Based collaborative learning framework for Multi-Agent decision making (MaskMA).
We show MaskMA can achieve an impressive 77.8% average zero-shot win rate on 60 unseen test maps by decentralized execution.
arXiv Detail & Related papers (2023-10-18T09:53:27Z)
- Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation Learning [13.060023718506917]
Imitation learning (IL) is the problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems.
We introduce a novel multi-agent IL algorithm designed to address these challenges.
Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q functions.
arXiv Detail & Related papers (2023-10-10T17:11:20Z)
- MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
arXiv Detail & Related papers (2021-06-22T10:21:00Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework can achieve scalability and stability in large-scale environments and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion; a minimal sketch of the monotonic mixing idea behind it follows this list.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
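For contrast with the transformer-based mixer sketched above, the QMIX entry in the list refers to monotonic value function factorisation. Below is a minimal, hedged sketch of that idea; the hypernetwork layout and sizes are illustrative assumptions, not the original implementation.

```python
# Hedged sketch of QMIX-style monotonic mixing. Hypernetworks conditioned on the
# global state produce non-negative mixing weights, so dQ_tot/dQ_i >= 0 for every
# agent and per-agent argmax actions stay consistent with the joint argmax.
import torch
import torch.nn as nn


class MonotonicMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, hidden: int = 32):
        super().__init__()
        self.n_agents, self.hidden = n_agents, hidden
        # Hypernetworks: map the global state to the mixing network's weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * hidden)
        self.hyper_b1 = nn.Linear(state_dim, hidden)
        self.hyper_w2 = nn.Linear(state_dim, hidden)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.hidden)  # non-negative
        b1 = self.hyper_b1(state).view(b, 1, self.hidden)
        h = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)                 # (batch, 1, hidden)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.hidden, 1)              # non-negative
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(h, w2) + b2).view(b)                                    # Q_tot: (batch,)
```

Taking the absolute value of the hypernetwork outputs keeps every mixing weight non-negative, so Q_tot is monotone in each agent's utility; the abstract above contrasts this kind of fixed mixing structure with the richer mixing function a transformer-based mixer can learn.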
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.