Self-Clustering Hierarchical Multi-Agent Reinforcement Learning with Extensible Cooperation Graph
- URL: http://arxiv.org/abs/2403.18056v1
- Date: Tue, 26 Mar 2024 19:19:16 GMT
- Title: Self-Clustering Hierarchical Multi-Agent Reinforcement Learning with Extensible Cooperation Graph
- Authors: Qingxu Fu, Tenghai Qiu, Jianqiang Yi, Zhiqiang Pu, Xiaolin Ai,
- Abstract summary: This paper proposes a novel hierarchical MARL model called Hierarchical Cooperation Graph Learning (HCGL).
HCGL has three components: a dynamic Extensible Cooperation Graph (ECG) for achieving self-clustering cooperation; a group of graph operators for adjusting the topology of ECG; and an MARL optimizer for training these graph operators.
In our experiments, the HCGL model has shown outstanding performance in multi-agent benchmarks with sparse rewards.
- Score: 9.303181273699417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Agent Reinforcement Learning (MARL) has been successful in solving many cooperative challenges. However, classic non-hierarchical MARL algorithms still cannot address various complex multi-agent problems that require hierarchical cooperative behaviors. The cooperative knowledge and policies learned in non-hierarchical algorithms are implicit and not interpretable, thereby restricting the integration of existing knowledge. This paper proposes a novel hierarchical MARL model called Hierarchical Cooperation Graph Learning (HCGL) for solving general multi-agent problems. HCGL has three components: a dynamic Extensible Cooperation Graph (ECG) for achieving self-clustering cooperation; a group of graph operators for adjusting the topology of ECG; and an MARL optimizer for training these graph operators. HCGL's key distinction from other MARL models is that the behaviors of agents are guided by the topology of ECG instead of policy neural networks. ECG is a three-layer graph consisting of an agent node layer, a cluster node layer, and a target node layer. To manipulate the ECG topology in response to changing environmental conditions, four graph operators are trained to adjust the edge connections of ECG dynamically. The hierarchical feature of ECG provides a unique approach to merge primitive actions (actions executed by the agents) and cooperative actions (actions executed by the clusters) into a unified action space, allowing us to integrate fundamental cooperative knowledge into an extensible interface. In our experiments, the HCGL model has shown outstanding performance in multi-agent benchmarks with sparse rewards. We also verify that HCGL can easily be transferred to large-scale scenarios with high zero-shot transfer success rates.
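The abstract says ECG is a three-layer graph whose edge connections, rather than a policy network, determine what each agent does. The following is a minimal Python sketch of that idea. It assumes that each agent links to exactly one cluster, that each cluster links to exactly one target node, and that target nodes are named primitive or cooperative actions. The two rewiring methods (`reassign_agent`, `retarget_cluster`) are hypothetical stand-ins called by hand. They are not the paper's four learned graph operators, and the MARL optimizer that trains those operators is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ExtensibleCooperationGraph:
    """Toy three-layer graph: agent nodes -> cluster nodes -> target nodes (hypothetical sketch)."""

    n_agents: int
    n_clusters: int
    # Target node labels; here they stand for primitive or cooperative actions (assumption).
    targets: list[str]
    # agent_to_cluster[i] is agent i's cluster; cluster_to_target[c] is cluster c's target index.
    agent_to_cluster: list[int] = field(default_factory=list)
    cluster_to_target: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Default topology: agents spread round-robin over clusters, every cluster on target 0.
        if not self.agent_to_cluster:
            self.agent_to_cluster = [i % self.n_clusters for i in range(self.n_agents)]
        if not self.cluster_to_target:
            self.cluster_to_target = [0] * self.n_clusters

    def reassign_agent(self, agent: int, cluster: int) -> None:
        """Hypothetical operator: move one agent's edge to another cluster."""
        if not 0 <= cluster < self.n_clusters:
            raise ValueError(f"unknown cluster {cluster}")
        self.agent_to_cluster[agent] = cluster

    def retarget_cluster(self, cluster: int, target: int) -> None:
        """Hypothetical operator: point a cluster at a different target node."""
        if not 0 <= target < len(self.targets):
            raise ValueError(f"unknown target {target}")
        self.cluster_to_target[cluster] = target

    def actions(self) -> list[str]:
        """Each agent executes the target reached by following agent -> cluster -> target."""
        return [self.targets[self.cluster_to_target[c]] for c in self.agent_to_cluster]
```

For example, consider four agents, two clusters, and targets `["hold", "move_north", "surround_enemy"]`. Calling `retarget_cluster(1, 2)` makes agents 1 and 3 return `"surround_enemy"`, while agents 0 and 2 keep `"hold"`. The topology alone decides the joint action, which is the property the abstract highlights.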
Related papers
- Dynamic and Textual Graph Generation Via Large-Scale LLM-based Agent Simulation [70.60461609393779]
GraphAgent-Generator (GAG) is a novel simulation-based framework for dynamic graph generation.
Our framework effectively replicates seven macro-level structural characteristics in established network science theories.
It supports generating graphs with nearly 100,000 nodes or 10 million edges, with a minimum speed-up of 90.4%.
Date: 2024-10-13T12:57:08Z
- Causality is all you need [63.10680366545293]
Causal Graph Routing (CGR) is an integrated causal scheme relying entirely on the intervention mechanisms to reveal the cause-effect forces hidden in data.
CGR can surpass the current state-of-the-art methods on both Visual Question Answer and Long Document Classification tasks.
Date: 2023-11-21T02:53:40Z
- Generative and Contrastive Paradigms Are Complementary for Graph Self-Supervised Learning [56.45977379288308]
Masked autoencoder (MAE) learns to reconstruct masked graph edges or node features.
Contrastive Learning (CL) maximizes the similarity between augmented views of the same graph.
We propose graph contrastive masked autoencoder (GCMAE) framework to unify MAE and CL.
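The two objectives named above can be sketched side by side in numpy. The first is a reconstruction error computed only on masked node features (the MAE side). The second is an InfoNCE-style loss that treats row i of two embedding matrices as two views of the same graph (the CL side). The function names, the temperature value, and the choice of InfoNCE are assumptions made for illustration. How GCMAE actually combines the two objectives is not shown.

```python
import numpy as np


def masked_mse(x: np.ndarray, x_hat: np.ndarray, mask: np.ndarray) -> float:
    """MAE-style loss: mean squared reconstruction error over masked nodes only."""
    diff = (x_hat - x)[mask]  # keep only rows (nodes) whose features were masked
    return float(np.mean(diff ** 2))


def info_nce(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """CL-style loss: row i of z1 and row i of z2 are positives; other rows are negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # maximize similarity of matching views
```

Aligned views should give a lower `info_nce` value than the same views paired in a shuffled order. `masked_mse` ignores nodes whose features were not masked.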
Date: 2023-10-24T05:06:06Z
- Non-Linear Coordination Graphs [22.29517436920317]
Coordination graphs (CGs) represent a higher-order decomposition by incorporating pairwise payoff functions.
We propose the first non-linear coordination graph by extending CG value decomposition beyond the linear case.
We find that our method can achieve superior performance on challenging multi-agent coordination tasks like MACO.
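A coordination graph scores a joint action as the sum of per-agent utilities plus pairwise payoffs over the graph's edges. This linear form is what the non-linear method above extends. The following is a minimal pure-Python sketch of the linear case only. It assumes tabular utilities and payoffs (the names `utilities` and `payoffs` are hypothetical) and uses brute-force maximization in place of the message-passing schemes usually applied to larger graphs. The non-linear extension itself is not shown.

```python
import itertools


def cg_value(actions, utilities, payoffs):
    """Linear CG value: sum of per-agent utilities plus pairwise payoffs on each edge."""
    q = sum(utilities[i][a] for i, a in enumerate(actions))
    q += sum(table[actions[i]][actions[j]] for (i, j), table in payoffs.items())
    return q


def greedy_joint_action(n_actions, utilities, payoffs):
    """Brute-force argmax over all joint actions (exponential; fine for toy sizes)."""
    n_agents = len(utilities)
    return max(itertools.product(range(n_actions), repeat=n_agents),
               key=lambda a: cg_value(a, utilities, payoffs))


# Toy chain graph 0-1-2: matching neighbours earn 1; agent 0 mildly prefers action 1.
utilities = [[0.0, 0.5], [0.0, 0.0], [0.0, 0.0]]
match = [[1, 0], [0, 1]]
payoffs = {(0, 1): match, (1, 2): match}
```

On this toy chain, agent 0's small preference propagates through the pairwise payoffs, so the best joint action is for all three agents to pick action 1. That choice scores 2.5, versus 2.0 for all agents picking action 0.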
Date: 2022-10-26T18:11:31Z
- A Cooperation Graph Approach for Multiagent Sparse Reward Reinforcement Learning [7.2972297703292135]
Multiagent reinforcement learning (MARL) can solve complex cooperative tasks.
In this paper, we design a graph network called Cooperation Graph (CG).
We propose a Cooperation Graph Multiagent Reinforcement Learning (CG-MARL) algorithm, which can efficiently deal with the sparse reward problem in multiagent tasks.
Date: 2022-08-05T06:32:16Z
- Graph Representation Learning via Contrasting Cluster Assignments [57.87743170674533]
We propose a novel unsupervised graph representation model based on contrasting cluster assignments, called GRCCA.
It aims to make good use of both local and global information by combining clustering algorithms with contrastive learning.
GRCCA is highly competitive on most tasks.
Date: 2021-12-15T07:28:58Z
- Deep Attention-guided Graph Clustering with Dual Self-supervision [49.040136530379094]
We propose a novel method, namely deep attention-guided graph clustering with dual self-supervision (DAGC).
We develop a dual self-supervision solution consisting of a soft self-supervision strategy with a triplet Kullback-Leibler divergence loss and a hard self-supervision strategy with a pseudo supervision loss.
Our method consistently outperforms state-of-the-art methods on six benchmark datasets.
Date: 2021-11-10T06:53:03Z
- Soft Hierarchical Graph Recurrent Networks for Many-Agent Partially Observable Environments [9.067091068256747]
We propose a novel network structure called hierarchical graph recurrent network (HGRN) for multi-agent cooperation under partial observability.
Based on these techniques, we propose a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant named SAC-HGRN.
Date: 2021-09-05T09:51:25Z
- Cooperative Policy Learning with Pre-trained Heterogeneous Observation Representations [51.8796674904734]
We propose a new cooperative learning framework with pre-trained heterogeneous observation representations.
We employ an encoder-decoder based graph attention to learn the intricate interactions and heterogeneous representations.
Date: 2020-12-24T04:52:29Z
- Graph Convolutional Value Decomposition in Multi-Agent Reinforcement Learning [9.774412108791218]
We propose a novel framework for value function factorization in deep reinforcement learning.
In particular, we consider the team of agents as the set of nodes of a complete directed graph.
We introduce a mixing GNN module, which is responsible for i) factorizing the team state-action value function into individual per-agent observation-action value functions, and ii) explicit credit assignment to each agent in terms of fractions of the global team reward.
Date: 2020-10-09T18:01:01Z
- Deep Implicit Coordination Graphs for Multi-agent Reinforcement Learning [36.844163371495995]
This paper introduces the deep implicit coordination graph (DICG) architecture for such scenarios.
DICG consists of a module for inferring the dynamic coordination graph structure which is then used by a graph neural network based module to learn to implicitly reason about the joint actions or values.
We demonstrate that DICG solves the relative overgeneralization pathology in predator-prey tasks and outperforms various MARL baselines on the challenging StarCraft II Multi-agent Challenge (SMAC) and traffic junction environments.
Date: 2020-06-19T23:41:49Z
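DICG first infers a dynamic coordination graph and then runs a graph neural network over it. The following is a minimal numpy sketch of that two-stage idea. It assumes that a bilinear attention score is turned into a soft, row-normalized adjacency matrix, followed by a single tanh message-passing layer with a residual connection. The weight names `w_att` and `w`, as well as these layer choices, are illustrative assumptions rather than DICG's exact architecture.

```python
import numpy as np


def soft_adjacency(h: np.ndarray, w_att: np.ndarray) -> np.ndarray:
    """Infer a soft coordination graph: row i holds agent i's attention weights over all agents."""
    scores = h @ w_att @ h.T  # (n_agents, n_agents) bilinear attention scores
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(scores)
    return a / a.sum(axis=1, keepdims=True)  # row-wise softmax -> row-stochastic matrix


def gnn_step(h: np.ndarray, adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One message-passing step over the soft graph, with a residual connection."""
    return h + np.tanh(adj @ h @ w)  # aggregate neighbours, transform, add back input


rng = np.random.default_rng(1)
h = rng.normal(size=(4, 8))  # 4 agents, 8-dim observation embeddings (toy values)
adj = soft_adjacency(h, 0.1 * rng.normal(size=(8, 8)))
h_next = gnn_step(h, adj, 0.1 * rng.normal(size=(8, 8)))
```

Because the graph is a dense soft adjacency rather than a fixed set of edges, every agent can attend to every other agent. The attention weights decide how strongly agents coordinate at each step.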