Transformer-based Value Function Decomposition for Cooperative
Multi-agent Reinforcement Learning in StarCraft
- URL: http://arxiv.org/abs/2208.07298v1
- Date: Mon, 15 Aug 2022 16:13:16 GMT
- Title: Transformer-based Value Function Decomposition for Cooperative
Multi-agent Reinforcement Learning in StarCraft
- Authors: Muhammad Junaid Khan, Syed Hammad Ahmed, Gita Sukthankar
- Abstract summary: The StarCraft II Multi-Agent Challenge (SMAC) was created to be a benchmark problem for cooperative multi-agent reinforcement learning (MARL).
This paper introduces a new architecture, TransMix, a transformer-based joint action-value mixing network.
- Score: 1.160208922584163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The StarCraft II Multi-Agent Challenge (SMAC) was created to be a challenging
benchmark problem for cooperative multi-agent reinforcement learning (MARL).
SMAC focuses exclusively on the problem of StarCraft micromanagement and
assumes that each unit is controlled individually by a learning agent that acts
independently and only possesses local information; centralized training is
assumed to occur with decentralized execution (CTDE). To perform well in SMAC,
MARL algorithms must handle the dual problems of multi-agent credit assignment
and joint action evaluation.
This paper introduces a new architecture, TransMix, a transformer-based joint
action-value mixing network which we show to be efficient and scalable
compared to other state-of-the-art cooperative MARL solutions. TransMix
leverages the ability of transformers to learn a richer mixing function for
combining the agents' individual value functions. It achieves comparable
performance to previous work on easy SMAC scenarios and outperforms other
techniques on hard scenarios, as well as scenarios that are corrupted with
Gaussian noise to simulate fog of war.
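The abstract describes the mixing network only at a high level, so the following is a rough, non-authoritative sketch of what a transformer-based joint action-value mixer of this kind can look like. The token layout, module names, and layer sizes are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a transformer-based mixing network in the spirit of TransMix.
# Per-agent utilities Q_i and the global state are embedded as tokens, a
# transformer encoder attends across them, and a linear head produces Q_tot.
# All sizes and the pooling choice are illustrative assumptions.
import torch
import torch.nn as nn


class TransformerMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32, n_heads: int = 4):
        super().__init__()
        self.q_embed = nn.Linear(1, embed_dim)               # embed each scalar Q_i as a token
        self.state_embed = nn.Linear(state_dim, embed_dim)   # embed the global state as a token
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(embed_dim, 1)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        q_tokens = self.q_embed(agent_qs.unsqueeze(-1))       # (batch, n_agents, embed_dim)
        s_token = self.state_embed(state).unsqueeze(1)        # (batch, 1, embed_dim)
        tokens = torch.cat([q_tokens, s_token], dim=1)        # (batch, n_agents + 1, embed_dim)
        mixed = self.encoder(tokens)                          # self-attention mixes agent utilities
        return self.out(mixed.mean(dim=1)).squeeze(-1)        # Q_tot: (batch,)


if __name__ == "__main__":
    mixer = TransformerMixer(n_agents=5, state_dim=48)
    q_tot = mixer(torch.randn(8, 5), torch.randn(8, 48))
    print(q_tot.shape)  # torch.Size([8])
```

The point the sketch tries to capture is that self-attention lets the mixer weigh each agent's utility against the others and against the global state, rather than combining them through a fixed feed-forward mixing function.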
Related papers
- Cooperative and Asynchronous Transformer-based Mission Planning for Heterogeneous Teams of Mobile Robots [1.1049608786515839]
This paper presents the Cooperative and Asynchronous Transformer-based Mission Planning (CATMiP) framework.
CATMiP uses multi-agent reinforcement learning to coordinate agents with heterogeneous sensing, motion, and actuation capabilities.
It easily adapts to mission complexities and communication constraints, and scales to varying environment sizes and team compositions.
arXiv Detail & Related papers (2024-10-08T21:14:09Z)
- MAIDCRL: Semi-centralized Multi-Agent Influence Dense-CNN Reinforcement Learning [0.7366405857677227]
We present a semi-centralized Dense Reinforcement Learning algorithm enhanced by agent influence maps (AIMs) for learning effective multi-agent control on StarCraft Multi-Agent Challenge (SMAC) scenarios.
The results show that the CNN-enabled MAIDCRL significantly improved the learning performance and achieved a faster learning rate compared to the existing MAIDRL.
arXiv Detail & Related papers (2024-02-12T18:53:20Z)
- MaskMA: Towards Zero-Shot Multi-Agent Decision Making with Mask-Based Collaborative Learning [56.00558959816801]
We propose a Mask-Based collaborative learning framework for Multi-Agent decision making (MaskMA).
We show MaskMA can achieve an impressive 77.8% average zero-shot win rate on 60 unseen test maps by decentralized execution.
arXiv Detail & Related papers (2023-10-18T09:53:27Z)
- Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation Learning [13.060023718506917]
Imitation learning (IL) is the problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems.
We introduce a novel multi-agent IL algorithm designed to address these challenges.
Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q functions.
arXiv Detail & Related papers (2023-10-10T17:11:20Z)
- MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
arXiv Detail & Related papers (2021-06-22T10:21:00Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework can achieve scalability and stability in large-scale environments and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion; a minimal sketch of the monotonic mixing idea behind it follows this list.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
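For contrast with the transformer-based mixer sketched above, the QMIX entry in the list refers to monotonic value function factorisation. Below is a minimal, hedged sketch of that idea; the hypernetwork layout and sizes are illustrative assumptions, not the original implementation.

```python
# Hedged sketch of QMIX-style monotonic mixing. Hypernetworks conditioned on the
# global state produce non-negative mixing weights, so dQ_tot/dQ_i >= 0 for every
# agent and per-agent argmax actions stay consistent with the joint argmax.
import torch
import torch.nn as nn


class MonotonicMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, hidden: int = 32):
        super().__init__()
        self.n_agents, self.hidden = n_agents, hidden
        # Hypernetworks: map the global state to the mixing network's weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * hidden)
        self.hyper_b1 = nn.Linear(state_dim, hidden)
        self.hyper_w2 = nn.Linear(state_dim, hidden)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.hidden)  # non-negative
        b1 = self.hyper_b1(state).view(b, 1, self.hidden)
        h = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)                 # (batch, 1, hidden)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.hidden, 1)              # non-negative
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(h, w2) + b2).view(b)                                    # Q_tot: (batch,)
```

Taking the absolute value of the hypernetwork outputs keeps every mixing weight non-negative, so Q_tot is monotone in each agent's utility; the abstract above contrasts this kind of fixed mixing structure with the richer mixing function a transformer-based mixer can learn.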
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.