Value Function Factorisation with Hypergraph Convolution for Cooperative
Multi-agent Reinforcement Learning
- URL: http://arxiv.org/abs/2112.06771v1
- Date: Thu, 9 Dec 2021 08:40:38 GMT
- Title: Value Function Factorisation with Hypergraph Convolution for Cooperative
Multi-agent Reinforcement Learning
- Authors: Yunpeng Bai, Chen Gong, Bin Zhang, Guoliang Fan, Xinwen Hou
- Abstract summary: We propose a method that combines hypergraph convolution with value decomposition.
By treating action values as signals, HGCN-MIX aims to explore the relationship between these signals via a self-learning hypergraph.
Experimental results show that HGCN-MIX matches or surpasses state-of-the-art techniques on the StarCraft II multi-agent challenge (SMAC) benchmark.
- Score: 32.768661516953344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperation between agents in a multi-agent system (MAS) has become a hot
topic in recent years, and many algorithms based on centralized training with
decentralized execution (CTDE), such as VDN and QMIX, have been proposed.
However, these methods disregard the information hidden in the individual
action values. In this paper, we propose HyperGraph CoNvolution MIX (HGCN-MIX),
a method that combines hypergraph convolution with value decomposition. By
treating action values as signals, HGCN-MIX aims to explore the relationship
between these signals via a self-learning hypergraph. Experimental results show
that HGCN-MIX matches or surpasses state-of-the-art techniques on the StarCraft
II multi-agent challenge (SMAC) benchmark in various scenarios, notably those
with a large number of agents.
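The abstract only names the ingredients, so the following is a rough sketch under stated assumptions rather than the authors' implementation: per-agent action values are treated as node signals, passed through one hypergraph convolution layer with a learnable ("self-learning") incidence matrix, and summed into a team value. The softmax normalisation of the incidence matrix, the layer sizes, and the final summation are all illustrative choices.
```python
# Sketch only (assumption, not the paper's code): one hypergraph convolution layer
# over per-agent Q-values, followed by a sum to form a team value Q_tot.
import torch
import torch.nn as nn


class HypergraphMixer(nn.Module):
    def __init__(self, n_agents: int, n_edges: int):
        super().__init__()
        # Learnable ("self-learning") incidence matrix H: agents x hyperedges.
        self.H_logits = nn.Parameter(torch.randn(n_agents, n_edges))
        self.edge_weight = nn.Parameter(torch.ones(n_edges))   # diagonal W
        self.theta = nn.Linear(1, 1, bias=False)                # per-signal transform

    def forward(self, q_values: torch.Tensor) -> torch.Tensor:
        # q_values: (batch, n_agents) chosen-action values, treated as node signals.
        H = torch.softmax(self.H_logits, dim=0)                 # soft incidence matrix
        d_v = (H @ self.edge_weight).clamp(min=1e-6)            # node degrees
        d_e = H.sum(dim=0).clamp(min=1e-6)                      # hyperedge degrees
        # Hypergraph convolution operator: D_v^-1/2 H W D_e^-1 H^T D_v^-1/2
        A = (torch.diag(d_v.rsqrt()) @ H @ torch.diag(self.edge_weight)
             @ torch.diag(1.0 / d_e) @ H.t() @ torch.diag(d_v.rsqrt()))
        x = q_values.unsqueeze(-1)                               # (batch, n_agents, 1)
        mixed = torch.relu(A @ self.theta(x))                    # convolved signals
        return mixed.sum(dim=(1, 2))                             # one Q_tot per sample


q_tot = HypergraphMixer(n_agents=8, n_edges=4)(torch.randn(32, 8))  # shape (32,)
```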
Related papers
- QTypeMix: Enhancing Multi-Agent Cooperative Strategies through Heterogeneous and Homogeneous Value Decomposition [11.170571181947274]
We propose QTypeMix, which divides the value decomposition process into homogeneous and heterogeneous stages.
The results of testing the proposed method on 14 maps from SMAC and SMACv2 show that QTypeMix achieves state-of-the-art performance in tasks of varying difficulty.
arXiv Detail & Related papers (2024-08-12T12:27:58Z) - Mastering Complex Coordination through Attention-based Dynamic Graph [14.855793715829954]
We present DAGmix, a novel graph-based value factorization method.
Instead of a complete graph, DAGmix generates a dynamic graph at each time step during training.
Experiments show that DAGmix significantly outperforms previous SOTA methods in large-scale scenarios.
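The entry does not describe DAGmix's architecture, so the snippet below is purely illustrative: one common way to build a per-time-step soft adjacency over agents is scaled dot-product attention on their current embeddings, which a graph-based mixer could then consume. The layer sizes and the softmax normalisation are assumptions.
```python
# Illustrative sketch (assumption, not DAGmix itself): a dynamic soft adjacency
# over agents computed from their embeddings at the current time step.
import math
import torch
import torch.nn as nn


class DynamicGraph(nn.Module):
    def __init__(self, feat_dim: int, key_dim: int = 32):
        super().__init__()
        self.query = nn.Linear(feat_dim, key_dim)
        self.key = nn.Linear(feat_dim, key_dim)
        self.key_dim = key_dim

    def forward(self, agent_feats: torch.Tensor) -> torch.Tensor:
        # agent_feats: (batch, n_agents, feat_dim) at the current time step.
        scores = self.query(agent_feats) @ self.key(agent_feats).transpose(1, 2)
        return torch.softmax(scores / math.sqrt(self.key_dim), dim=-1)  # (batch, n, n)


adj = DynamicGraph(feat_dim=16)(torch.randn(8, 5, 16))  # 8 transitions, 5 agents
```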
arXiv Detail & Related papers (2023-12-07T12:02:14Z) - On the Equivalence of Graph Convolution and Mixup [70.0121263465133]
This paper investigates the relationship between graph convolution and Mixup techniques.
Under two mild conditions, graph convolution can be viewed as a specialized form of Mixup.
We establish this equivalence mathematically by demonstrating that graph convolutional networks (GCNs) and simplified graph convolution (SGC) can be expressed as a form of Mixup.
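A compact way to see the flavour of the claim (my paraphrase of the general idea, not the paper's formal statement or its two conditions): one aggregation step with a row-stochastic adjacency writes a node's representation as a convex combination of neighbour features, which is exactly the form of a Mixup sample when those neighbours share the node's label.
```latex
% Intuition sketch (not the paper's exact conditions): with row-stochastic \tilde{A},
h_i = \sum_{j \in \mathcal{N}(i)} \tilde{A}_{ij}\, x_j,
\qquad \tilde{A}_{ij} \ge 0,\ \ \sum_j \tilde{A}_{ij} = 1,
% has the same form as a Mixup sample
\tilde{x} = \sum_j \lambda_j x_j, \qquad \tilde{y} = \sum_j \lambda_j y_j,
% provided the neighbours j share node i's label, so that \tilde{y} = y_i.
```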
arXiv Detail & Related papers (2023-09-29T23:09:54Z) - Efficient Cooperation Strategy Generation in Multi-Agent Video Games via
Hypergraph Neural Network [16.226702761758595]
Deep reinforcement learning achieves impressive performance in single-agent video games.
However, multi-agent video game environments pose additional difficulties.
We propose a novel actor-critic algorithm that adapts the hypergraph structure over agents and employs hypergraph convolution for feature extraction and representation across agents.
arXiv Detail & Related papers (2022-03-07T10:34:40Z) - Gated recurrent units and temporal convolutional network for multilabel
classification [122.84638446560663]
This work proposes a new ensemble method for multilabel classification.
The core of the approach combines a set of gated recurrent units and temporal convolutional neural networks trained with variants of the Adam optimizer.
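A loose sketch of the kind of ensemble described (layer sizes, the dilated-convolution settings, and averaging the two branches are assumptions, not the paper's architecture): a GRU branch and a TCN-style dilated Conv1d branch read the same sequence and each produce sigmoid multilabel scores.
```python
# Loose sketch (assumption): GRU branch + TCN-style dilated Conv1d branch whose
# sigmoid multilabel predictions are averaged; trainable with an Adam-family optimizer.
import torch
import torch.nn as nn


class GruTcnEnsemble(nn.Module):
    def __init__(self, in_dim: int, n_labels: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.tcn = nn.Conv1d(in_dim, hidden, kernel_size=3, padding=2, dilation=2)
        self.head_gru = nn.Linear(hidden, n_labels)
        self.head_tcn = nn.Linear(hidden, n_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, in_dim)
        _, h = self.gru(x)                                   # h: (1, batch, hidden)
        g = torch.sigmoid(self.head_gru(h[-1]))
        t = torch.sigmoid(self.head_tcn(self.tcn(x.transpose(1, 2)).mean(dim=-1)))
        return (g + t) / 2                                    # averaged multilabel scores
```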
arXiv Detail & Related papers (2021-10-09T00:00:16Z) - MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for
Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
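The summary only names the ingredients; as a hedged illustration of the Maximum Mean Discrepancy part (the Gaussian kernel, the bandwidth, and how the distance enters training are assumptions), here is a plain MMD^2 estimate between two sets of sampled returns.
```python
# Sketch only (assumption): a Gaussian-kernel biased MMD^2 estimate between two
# sets of return samples, the kind of distributional distance the MMD in the name refers to.
import torch


def mmd2(x: torch.Tensor, y: torch.Tensor, bandwidth: float = 1.0) -> torch.Tensor:
    # x, y: (n,) and (m,) samples of two return distributions.
    def k(a, b):
        return torch.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()


pred = torch.randn(32)            # e.g. predicted return samples
target = torch.randn(32) + 0.5    # e.g. target return samples
loss = mmd2(pred, target)         # scalar distance to minimise
```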
arXiv Detail & Related papers (2021-06-22T10:21:00Z) - Graph Convolutional Value Decomposition in Multi-Agent Reinforcement
Learning [9.774412108791218]
We propose a novel framework for value function factorization in deep reinforcement learning.
In particular, we consider the team of agents as the set of nodes of a complete directed graph.
We introduce a mixing GNN module, which is responsible for i) factorizing the team state-action value function into individual per-agent observation-action value functions, and ii) explicit credit assignment to each agent in terms of fractions of the global team reward.
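A hedged sketch of point ii) only (the mixing GNN of point i) is omitted, and the softmax readout, the inputs, and the layer sizes are assumptions, not the paper's module): a state-conditioned network outputs per-agent fractions that sum to one, which can then split the global team reward across agents.
```python
# Sketch only (assumption): per-agent credit fractions that sum to one,
# used to distribute the global team reward across agents.
import torch
import torch.nn as nn


class CreditAssigner(nn.Module):
    def __init__(self, state_dim: int, n_agents: int):
        super().__init__()
        self.score = nn.Linear(state_dim, n_agents)

    def forward(self, state: torch.Tensor, team_reward: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim), team_reward: (batch,)
        fractions = torch.softmax(self.score(state), dim=-1)  # rows sum to 1
        return fractions * team_reward.unsqueeze(-1)          # per-agent reward shares
```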
arXiv Detail & Related papers (2020-10-09T18:01:01Z) - UneVEn: Universal Value Exploration for Multi-Agent Reinforcement
Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
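The "linear decomposition of universal successor features" refers to the standard successor-feature identity; writing it in generic notation (not necessarily the paper's) makes the sentence concrete: when the reward is linear in features, a single set of successor features yields Q-values for every task vector $w$.
```latex
% Standard successor-feature identity (generic notation, not the paper's):
% if r(s,a) = \phi(s,a)^{\top} w, then
Q^{\pi}(s, a; w)
  = \mathbb{E}_{\pi}\!\Big[\sum_{t \ge 0} \gamma^{t} \phi(s_t, a_t) \,\Big|\, s_0 = s,\ a_0 = a\Big]^{\!\top} w
  = \psi^{\pi}(s, a)^{\top} w,
% so one set of successor features \psi^{\pi} gives Q-values for every task w.
```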
arXiv Detail & Related papers (2020-10-06T19:08:47Z) - Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep
Multi-Agent Reinforcement Learning [66.94149388181343]
We present a weighted extension of QMIX, a popular $Q$-learning algorithm for MARL.
We show that QMIX's monotonic projection can fail to recover the optimal policy even with access to $Q^*$, and that a suitably weighted projection avoids this failure.
We also demonstrate improved performance on predator-prey and challenging multi-agent StarCraft benchmark tasks.
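As a hedged sketch of what a "weighted" factorisation loss can look like (this follows my recollection of one of the paper's two weighting schemes; the weight $\alpha$ and the condition are illustrative): joint actions whose mixed value is below the TD target keep full weight, while the rest are down-weighted, biasing the monotonic projection away from underestimating good joint actions.
```python
# Sketch only (assumption, not the authors' code): an optimistically weighted TD loss
# on top of a QMIX-style mixed value Q_tot.
import torch


def weighted_td_loss(q_tot: torch.Tensor, target: torch.Tensor, alpha: float = 0.1) -> torch.Tensor:
    # q_tot, target: (batch,) mixed value and TD target for the chosen joint action.
    # Keep full weight where Q_tot underestimates the target; down-weight the rest.
    w = torch.where(q_tot < target, torch.ones_like(q_tot), torch.full_like(q_tot, alpha))
    return (w * (q_tot - target.detach()) ** 2).mean()
```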
arXiv Detail & Related papers (2020-06-18T18:34:50Z) - Monotonic Value Function Factorisation for Deep Multi-Agent
Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
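Since QMIX is the baseline the main paper builds on, here is a minimal sketch of its monotonic mixing constraint (layer sizes are illustrative, and the original also uses a small state-value network for the final bias): state-conditioned hypernetworks generate the mixing weights, and taking their absolute value keeps them non-negative, so Q_tot is monotone in every agent's Q-value and the joint argmax factorises over per-agent argmaxes.
```python
# Minimal sketch (not the authors' code) of QMIX-style monotonic mixing.
import torch
import torch.nn as nn


class MonotonicMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed: int = 32):
        super().__init__()
        self.w1 = nn.Linear(state_dim, n_agents * embed)  # hypernet: layer-1 weights
        self.b1 = nn.Linear(state_dim, embed)
        self.w2 = nn.Linear(state_dim, embed)              # hypernet: layer-2 weights
        self.b2 = nn.Linear(state_dim, 1)
        self.n_agents, self.embed = n_agents, embed

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.embed)  # >= 0
        w2 = torch.abs(self.w2(state)).view(-1, self.embed, 1)              # >= 0
        hidden = torch.relu(agent_qs.unsqueeze(1) @ w1 + self.b1(state).unsqueeze(1))
        q_tot = hidden @ w2 + self.b2(state).unsqueeze(1)
        return q_tot.view(-1)                               # (batch,)


q_tot = MonotonicMixer(n_agents=5, state_dim=48)(torch.randn(32, 5), torch.randn(32, 48))
```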
arXiv Detail & Related papers (2020-03-19T16:51:51Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)