Cooperative Policy Learning with Pre-trained Heterogeneous Observation
Representations
- URL: http://arxiv.org/abs/2012.13099v1
- Date: Thu, 24 Dec 2020 04:52:29 GMT
- Title: Cooperative Policy Learning with Pre-trained Heterogeneous Observation
Representations
- Authors: Wenlei Shi, Xinran Wei, Jia Zhang, Xiaoyuan Ni, Arthur Jiang, Jiang
Bian, Tie-Yan Liu
- Abstract summary: We propose a new cooperative learning framework with pre-trained heterogeneous observation representations.
We employ an encoder-decoder based graph attention network to learn the intricate interactions and heterogeneous representations.
- Score: 51.8796674904734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent reinforcement learning (MARL) has been increasingly explored to learn cooperative policies that maximize a global reward. Many existing studies take advantage of graph neural networks (GNN) in MARL to propagate critical collaborative information over the interaction graph built upon inter-connected agents. Nevertheless, the vanilla GNN approach exhibits substantial shortcomings in complex real-world scenarios, since the generic message passing mechanism is ineffective between heterogeneous vertices and, moreover, simple message aggregation functions cannot accurately model the combinatorial interactions from multiple neighbors. While adopting complex GNN models with more informative message passing and aggregation mechanisms can clearly benefit heterogeneous vertex representations and cooperative policy learning, it could, on the other hand, increase the training difficulty of MARL and demand denser, more direct reward signals than the original global reward. To address these challenges, we propose a new cooperative learning framework with pre-trained heterogeneous observation representations. In particular, we employ an encoder-decoder based graph attention network to learn the intricate interactions and heterogeneous representations that can be more easily leveraged by MARL. Moreover, we design a pre-training algorithm with a local actor-critic to ease the difficulty of cooperative policy learning. Extensive experiments on real-world scenarios demonstrate that our approach significantly outperforms both existing MARL baselines and operational research solutions that are widely used in industry.
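No code accompanies this abstract, so the following is a minimal sketch of what an encoder-decoder graph attention module over heterogeneous vertices could look like, assuming one input projection per vertex type, a single attention head, and a reconstruction-style decoder as the pre-training signal. All class names, shapes, and the vertex types are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HeteroGraphAttentionEncoder(nn.Module):
    """Per-type input projections handle heterogeneous vertices; one
    attention head then propagates messages over the interaction graph."""
    def __init__(self, type_dims, hidden_dim):
        super().__init__()
        self.proj = nn.ModuleDict({t: nn.Linear(d, hidden_dim)
                                   for t, d in type_dims.items()})
        self.q = nn.Linear(hidden_dim, hidden_dim)
        self.k = nn.Linear(hidden_dim, hidden_dim)
        self.v = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, feats, types, adj):
        # feats: per-vertex feature tensors, types: matching type names,
        # adj: (N, N) 0/1 adjacency that should include self-loops.
        h = torch.stack([self.proj[t](x) for x, t in zip(feats, types)])
        scores = self.q(h) @ self.k(h).t() / h.size(-1) ** 0.5
        attn = torch.softmax(scores.masked_fill(adj == 0, -1e9), dim=-1)
        return attn @ self.v(h)

class ObservationDecoder(nn.Module):
    """Reconstructs raw observations from the embeddings, giving a
    self-supervised pre-training signal before MARL training starts."""
    def __init__(self, hidden_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden_dim, hidden_dim),
                                 nn.ReLU(), nn.Linear(hidden_dim, out_dim))

    def forward(self, z):
        return self.net(z)

enc = HeteroGraphAttentionEncoder({"agent": 6, "landmark": 3}, hidden_dim=16)
feats = [torch.randn(6), torch.randn(6), torch.randn(3)]
adj = torch.ones(3, 3)
z = enc(feats, ["agent", "agent", "landmark"], adj)  # (3, 16) embeddings
```

In a pipeline of this shape, the local actor-critic pre-training stage mentioned in the abstract would then consume the frozen (or fine-tuned) embeddings z in place of raw observations.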
Related papers
- Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank [52.831993899183416]
We introduce a structural assumption -- the interaction rank -- and establish that functions with low interaction rank are significantly more robust to distribution shift compared to general ones.
We demonstrate that utilizing function classes with low interaction rank, when combined with regularization and no-regret learning, admits decentralized, computationally and statistically efficient learning in offline MARL.
arXiv Detail & Related papers (2024-10-01T22:16:22Z)
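The summary above does not define interaction rank formally; one common reading of "low interaction rank" is that the joint value function decomposes into terms that each couple only a few agents. A NumPy sketch under that assumed reading, with a rank-2-style function built from per-agent and pairwise terms only (all names invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions = 4, 3

# Per-agent terms and pairwise terms; no higher-order couplings appear,
# which is what keeps the interaction rank low in this illustration.
singles = rng.normal(size=(n_agents, n_actions))
pairs = rng.normal(size=(n_agents, n_agents, n_actions, n_actions))

def low_rank_value(joint_action):
    v = sum(singles[i, a] for i, a in enumerate(joint_action))
    v += sum(pairs[i, j, joint_action[i], joint_action[j]]
             for i in range(n_agents) for j in range(i + 1, n_agents))
    return v

print(low_rank_value((0, 2, 1, 0)))
```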
- Coordination Failure in Cooperative Offline MARL [3.623224034411137]
We focus on coordination failure and investigate the role of joint actions in multi-agent policy gradients with offline data.
By using two-player games as an analytical tool, we demonstrate a simple yet overlooked failure mode of BRUD-based algorithms.
We propose an approach to mitigate such failure by prioritising samples from the dataset based on joint-action similarity.
arXiv Detail & Related papers (2024-07-01T14:51:29Z)
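A rough sketch of the prioritisation idea from the entry above, assuming Hamming distance between joint actions as a stand-in similarity measure; the paper's actual measure and weighting scheme are not given in this summary:

```python
import numpy as np

rng = np.random.default_rng(1)

def prioritised_batch(dataset_actions, current_joint_action,
                      batch_size, temp=1.0):
    """Sample dataset indices with probability increasing in the
    similarity of the stored joint action to the current one."""
    # Similarity: negative Hamming distance between joint actions.
    sim = -(dataset_actions != current_joint_action).sum(axis=1)
    probs = np.exp(sim / temp)
    probs /= probs.sum()
    return rng.choice(len(dataset_actions), size=batch_size, p=probs)

actions = rng.integers(0, 3, size=(1000, 4))  # 1000 transitions, 4 agents
idx = prioritised_batch(actions, np.array([0, 1, 2, 0]), batch_size=32)
```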
- Scaling Large-Language-Model-based Multi-Agent Collaboration [75.5241464256688]
Pioneering advancements in large language model-powered agents have underscored the design pattern of multi-agent collaboration.
Inspired by the neural scaling law, this study investigates whether a similar principle applies as the number of agents in multi-agent collaboration increases.
arXiv Detail & Related papers (2024-06-11T11:02:04Z)
- Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning [7.784991832712813]
We introduce a Bayesian network to capture correlations between agents' action selections in their joint policy.
We develop practical algorithms to learn the context-aware Bayesian network policies.
Empirical results on a range of MARL benchmarks show the benefits of our approach.
arXiv Detail & Related papers (2023-06-02T21:22:27Z)
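A minimal sketch of the correlated joint policy described above, assuming a fixed, hand-specified DAG over agents (the paper makes the network context-aware and learns it, which is omitted here). Agents act in topological order, each conditioning on its parents' sampled actions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesNetJointPolicy(nn.Module):
    """Each agent conditions on its own observation and on the actions
    already sampled by its DAG parents, correlating the joint action."""
    def __init__(self, n_agents, obs_dim, n_actions, parents):
        super().__init__()
        self.parents, self.n_actions = parents, n_actions
        self.heads = nn.ModuleList(
            nn.Linear(obs_dim + len(parents[i]) * n_actions, n_actions)
            for i in range(n_agents))

    def forward(self, obs):  # obs: (n_agents, obs_dim)
        actions, logps = [], []
        for i, head in enumerate(self.heads):
            par = [F.one_hot(actions[p], self.n_actions).float()
                   for p in self.parents[i]]
            dist = torch.distributions.Categorical(
                logits=head(torch.cat([obs[i], *par])))
            a = dist.sample()
            actions.append(a)
            logps.append(dist.log_prob(a))
        return torch.stack(actions), torch.stack(logps).sum()

policy = BayesNetJointPolicy(3, 4, 2, parents={0: [], 1: [0], 2: [0, 1]})
joint_action, logp = policy(torch.randn(3, 4))
```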
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
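A reward machine is a finite automaton over high-level events whose transitions emit reward, which is what lets it encode the non-Markovian sub-task structure mentioned above. A toy example with invented events and states:

```python
# States track sub-task progress ("press button A, then B, then reach
# the goal"); transitions emit reward. Events and names are invented.
RM = {
    ("u0", "button_a"): ("u1", 0.0),
    ("u1", "button_b"): ("u2", 0.0),
    ("u2", "goal"):     ("u_acc", 1.0),
}

def rm_step(state, event):
    """Advance the reward machine; unknown events leave it unchanged."""
    return RM.get((state, event), (state, 0.0))

state = "u0"
for event in ["button_a", "noise", "button_b", "goal"]:
    state, reward = rm_step(state, event)
    print(state, reward)   # ends in "u_acc" with reward 1.0
```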
- Personalized Decentralized Multi-Task Learning Over Dynamic Communication Graphs [59.96266198512243]
We propose a decentralized and federated learning algorithm for tasks that are positively and negatively correlated.
Our algorithm uses gradients to calculate the correlations among tasks automatically, and dynamically adjusts the communication graph to connect mutually beneficial tasks and isolate those that may negatively impact each other.
We conduct experiments on a synthetic Gaussian dataset and a large-scale celebrity attributes (CelebA) dataset.
arXiv Detail & Related papers (2022-12-21T18:58:24Z)
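A small sketch of the gradient-correlation idea from the entry above, assuming cosine similarity between nodes' task gradients decides which communication edges to keep; the paper's exact rule and any soft weighting are assumptions here:

```python
import numpy as np

def correlation_graph(grads, threshold=0.0):
    """Connect nodes whose task gradients point in similar directions
    (positive cosine similarity); isolate negatively correlated ones."""
    g = grads / np.linalg.norm(grads, axis=1, keepdims=True)
    cos = g @ g.T
    adj = (cos > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)   # no self-edges in the comm graph
    return adj, cos

rng = np.random.default_rng(2)
grads = rng.normal(size=(5, 16))   # 5 nodes, 16-dim parameter gradients
adj, cos = correlation_graph(grads)
print(adj)
```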
- Social Recommendation with Self-Supervised Metagraph Informax Network [21.41026069530997]
We propose a Self-Supervised Metagraph Infomax Network (SMIN) which investigates the potential of incorporating social- and knowledge-aware relational structures into the user preference representation for recommendation.
To inject high-order collaborative signals, we generalize the mutual information learning paradigm under a self-supervised graph-based collaborative filtering framework.
Experimental results on several real-world datasets demonstrate the effectiveness of our SMIN model over various state-of-the-art recommendation methods.
arXiv Detail & Related papers (2021-10-08T08:18:37Z)
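SMIN's metagraph specifics cannot be reconstructed from this summary, but the mutual-information paradigm it generalizes can be sketched in the DGI style: a discriminator scores real node embeddings against a global summary higher than embeddings from a corrupted graph.

```python
import torch
import torch.nn as nn

class GraphInfomaxLoss(nn.Module):
    """DGI-style objective: embeddings from the real graph should score
    high against a global summary; corrupted-graph embeddings, low."""
    def __init__(self, dim):
        super().__init__()
        self.discriminator = nn.Bilinear(dim, dim, 1)
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, h_real, h_corrupt):
        summary = torch.sigmoid(h_real.mean(dim=0, keepdim=True))
        s = summary.expand_as(h_real)
        logits = torch.cat([self.discriminator(h_real, s),
                            self.discriminator(h_corrupt, s)]).squeeze(-1)
        labels = torch.cat([torch.ones(len(h_real)),
                            torch.zeros(len(h_corrupt))])
        return self.bce(logits, labels)

loss_fn = GraphInfomaxLoss(dim=16)
h = torch.randn(10, 16)
loss = loss_fn(h, h[torch.randperm(10)])   # corruption: shuffled nodes
```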
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
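A stripped-down sketch of training per-agent value heads against local rewards, in the spirit of the entry above; LOMAQ's actual graph partitioning and the mixing of local values into the global one are omitted, and all shapes are illustrative:

```python
import torch
import torch.nn as nn

n_agents, obs_dim, n_actions = 3, 8, 4
# One Q-head per agent, each trained against its own local reward
# rather than the single global reward.
q_nets = nn.ModuleList(nn.Linear(obs_dim, n_actions)
                       for _ in range(n_agents))
opt = torch.optim.Adam(q_nets.parameters(), lr=1e-3)

def td_loss(obs, acts, local_rews, next_obs, gamma=0.99):
    loss = 0.0
    for i, q in enumerate(q_nets):
        q_sa = q(obs[i])[acts[i]]
        with torch.no_grad():
            target = local_rews[i] + gamma * q(next_obs[i]).max()
        loss = loss + (q_sa - target) ** 2
    return loss

obs = torch.randn(n_agents, obs_dim)
acts = torch.randint(0, n_actions, (n_agents,))
rews = torch.randn(n_agents)
loss = td_loss(obs, acts, rews, torch.randn(n_agents, obs_dim))
loss.backward(); opt.step()
```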
- Soft Hierarchical Graph Recurrent Networks for Many-Agent Partially Observable Environments [9.067091068256747]
We propose a novel network structure called the hierarchical graph recurrent network (HGRN) for multi-agent cooperation under partial observability.
Based on the above techniques, we propose a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant named SAC-HGRN.
arXiv Detail & Related papers (2021-09-05T09:51:25Z)
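A minimal sketch of one graph-recurrent step in the spirit of HGRN, with plain mean pooling standing in for the paper's hierarchical attention (an assumption), plus a GRU memory to cope with partial observability:

```python
import torch
import torch.nn as nn

class GraphRecurrentCell(nn.Module):
    """One step: each agent aggregates its neighbours' hidden states
    (mean pooling here), then updates its own memory with a GRU cell."""
    def __init__(self, obs_dim, hidden_dim):
        super().__init__()
        self.msg = nn.Linear(hidden_dim, hidden_dim)
        self.gru = nn.GRUCell(obs_dim + hidden_dim, hidden_dim)

    def forward(self, obs, h, adj):
        # adj: (N, N) 0/1 communication graph including self-loops.
        neigh = adj @ self.msg(h) / adj.sum(dim=1, keepdim=True)
        return self.gru(torch.cat([obs, neigh], dim=-1), h)

n_agents, obs_dim, hidden = 5, 10, 32
cell = GraphRecurrentCell(obs_dim, hidden)
h = torch.zeros(n_agents, hidden)
adj = torch.eye(n_agents)
h = cell(torch.randn(n_agents, obs_dim), h, adj)  # updated memories
```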