TransfQMix: Transformers for Leveraging the Graph Structure of
Multi-Agent Reinforcement Learning Problems
- URL: http://arxiv.org/abs/2301.05334v1
- Date: Fri, 13 Jan 2023 00:07:08 GMT
- Title: TransfQMix: Transformers for Leveraging the Graph Structure of
Multi-Agent Reinforcement Learning Problems
- Authors: Matteo Gallici, Mario Martin, Ivan Masmitja
- Abstract summary: We present TransfQMix, a new approach that uses transformers to leverage a latent graph structure and learn better coordination policies.
Our transformer Q-mixer learns a monotonic mixing-function from a larger graph that includes the internal and external states of the agents.
We report TransfQMix's performance in the Spread and StarCraft II environments.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Coordination is one of the most difficult aspects of multi-agent
reinforcement learning (MARL). One reason is that agents normally choose their
actions independently of one another. In order to see coordination strategies
emerge from the combination of independent policies, recent research has
focused on the use of a centralized function (CF) that learns each agent's
contribution to the team reward. However, the structure in which the
environment is presented to the agents and to the CF is typically overlooked.
We have observed that the features used to describe the coordination problem
can be represented as vertex features of a latent graph structure. Here, we
present TransfQMix, a new approach that uses transformers to leverage this
latent structure and learn better coordination policies. Our transformer agents
perform a graph reasoning over the state of the observable entities. Our
transformer Q-mixer learns a monotonic mixing-function from a larger graph that
includes the internal and external states of the agents. TransfQMix is designed
to be entirely transferable, meaning that the same parameters can be used to
control and train larger or smaller teams of agents. This makes it possible to
deploy promising approaches for saving training time and deriving general
policies in MARL, such as transfer learning, zero-shot transfer, and curriculum
learning. We report TransfQMix's performance in the Spread and StarCraft II
environments.
In both settings, it outperforms state-of-the-art Q-Learning models and
demonstrates effectiveness in solving problems that other methods cannot
solve.
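
To make the two architectural ideas above concrete, the sketch below shows (i) a per-agent transformer that treats every observed entity as a vertex of the latent graph and reads Q-values off the agent's own token, and (ii) a transformer mixer that emits the non-negative weights of a QMIX-style monotonic mixing network from the larger graph of agent and state entities. This is a minimal PyTorch sketch under assumed dimensions and module names, not the authors' implementation; details such as how the mixing weights are read from the transformer output are illustrative guesses.

```python
# Minimal PyTorch sketch (illustrative; not the authors' code). Dimensions,
# module names, and how weights are read off the transformer are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransformerAgent(nn.Module):
    """Per-agent Q-network: every observed entity is a vertex token, and
    self-attention performs the 'graph reasoning' over those tokens."""

    def __init__(self, entity_dim: int, n_actions: int, d_model: int = 32):
        super().__init__()
        self.embed = nn.Linear(entity_dim, d_model)  # vertex features -> tokens
        layer = nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=64, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.q_head = nn.Linear(d_model, n_actions)

    def forward(self, entities: torch.Tensor) -> torch.Tensor:
        # entities: (batch, n_entities, entity_dim); token 0 is assumed to be
        # the agent itself, the rest are observed entities (teammates, landmarks, ...).
        h = self.encoder(self.embed(entities))
        return self.q_head(h[:, 0])  # Q-values read from the agent's own token


class TransformerMixer(nn.Module):
    """Transformer Q-mixer: attends over the 'larger graph' of agent/state
    entities and emits the weights of a QMIX-style monotonic mixing network.
    abs() keeps the weights non-negative, so Q_tot is monotone in each agent's Q."""

    def __init__(self, entity_dim: int, n_agents: int, d_model: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed = nn.Linear(entity_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=64, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.hyper_w1 = nn.Linear(d_model, d_model)  # one row of W1 per agent token
        self.hyper_b1 = nn.Linear(d_model, d_model)
        self.hyper_w2 = nn.Linear(d_model, d_model)  # W2 column from the pooled token
        self.hyper_b2 = nn.Linear(d_model, 1)

    def forward(self, agent_qs: torch.Tensor,
                state_entities: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) chosen-action Q-values from the agents.
        # state_entities: (batch, n_entities, entity_dim); the first n_agents
        # tokens are assumed to carry the agents' internal/external states.
        h = self.encoder(self.embed(state_entities))
        agent_h, pooled = h[:, : self.n_agents], h.mean(dim=1)
        w1 = torch.abs(self.hyper_w1(agent_h))               # (batch, n_agents, d_model)
        b1 = self.hyper_b1(pooled).unsqueeze(1)              # (batch, 1, d_model)
        w2 = torch.abs(self.hyper_w2(pooled)).unsqueeze(-1)  # (batch, d_model, 1)
        b2 = self.hyper_b2(pooled).unsqueeze(-1)             # (batch, 1, 1)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)  # (batch, 1, d_model)
        q_tot = torch.bmm(hidden, w2) + b2                   # (batch, 1, 1)
        return q_tot.squeeze(-1).squeeze(-1)                 # (batch,)


# Because both modules consume a *set* of entity tokens rather than a fixed-size
# vector, the same parameters can be applied to smaller or larger teams/scenes,
# which is what enables zero-shot transfer and curriculum learning:
agent = TransformerAgent(entity_dim=8, n_actions=5)
q_small = agent(torch.randn(1, 4, 8))   # 4 observed entities
q_large = agent(torch.randn(1, 10, 8))  # 10 observed entities, same weights
```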
Related papers
- MFC-EQ: Mean-Field Control with Envelope Q-Learning for Moving Decentralized Agents in Formation [1.770056709115081]
Moving Agents in Formation (MAiF) is a variant of Multi-Agent Path Finding.
MFC-EQ is a scalable and adaptable learning framework for this bi-objective multi-agent problem.
arXiv Detail & Related papers (2024-10-15T20:59:47Z) - Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation
Learning [13.060023718506917]
Imitation learning (IL) is the problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems.
We introduce a novel multi-agent IL algorithm designed to address these challenges.
Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q-functions (a reference formulation of this kind of value mixing appears after this list).
arXiv Detail & Related papers (2023-10-10T17:11:20Z) - FedYolo: Augmenting Federated Learning with Pretrained Transformers [61.56476056444933]
In this work, we investigate pretrained transformers (PTF) to achieve on-device learning goals.
We show that larger scale shrinks the accuracy gaps between alternative approaches and improves robustness.
Finally, this approach enables clients to solve multiple unrelated tasks simultaneously using a single PTF.
arXiv Detail & Related papers (2023-07-10T21:08:52Z) - MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
Diffusion models (DM) have recently achieved huge success in various scenarios, including offline reinforcement learning.
We propose MADiff, a novel generative multi-agent learning framework to tackle this problem.
Our experiments show the superior performance of MADiff compared to baseline algorithms in a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z) - MA-Dreamer: Coordination and communication through shared imagination [5.253168177256072]
We present MA-Dreamer, a model-based method that uses both agent-centric and global differentiable models of the environment.
Our experiments show that in long-term speaker-listener tasks and in cooperative games with strong partial-observability, MA-Dreamer finds a solution that makes effective use of coordination.
arXiv Detail & Related papers (2022-04-10T13:54:26Z) - Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z) - UPDeT: Universal Multi-agent Reinforcement Learning via Policy
Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes action restrictions and makes the decision process of multi-agent tasks more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z) - Graph Convolutional Value Decomposition in Multi-Agent Reinforcement
Learning [9.774412108791218]
We propose a novel framework for value function factorization in deep reinforcement learning.
In particular, we consider the team of agents as the set of nodes of a complete directed graph.
We introduce a mixing GNN module, which is responsible for i) factorizing the team state-action value function into individual per-agent observation-action value functions, and ii) explicit credit assignment to each agent in terms of fractions of the global team reward.
arXiv Detail & Related papers (2020-10-09T18:01:01Z) - F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability for large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z) - Hierarchically Decoupled Imitation for Morphological Transfer [95.19299356298876]
We show that transferring learned information from a morphologically simpler agent can massively improve the sample efficiency of a more complex one.
First, we show that incentivizing a complex agent's low-level to imitate a simpler agent's low-level significantly improves zero-shot high-level transfer.
Second, we show that KL-regularized training of the high level stabilizes learning and prevents mode-collapse.
arXiv Detail & Related papers (2020-03-03T18:56:49Z)
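
Several of the entries above (e.g., the inverse factorized Q-learning and graph-convolutional value-decomposition papers), like TransfQMix itself, factorize a joint action-value through a centralized mixing module. As a reference point, and not as a claim about any single paper's exact formulation, the monotonic factorization popularized by the QMIX family can be written as:

```latex
% Generic QMIX-style monotonic value factorization (reference formulation,
% not specific to any one paper above).
Q_{tot}(s,\mathbf{u}) = f_s\bigl(Q_1(\tau_1,u_1),\ldots,Q_n(\tau_n,u_n)\bigr),
\qquad
\frac{\partial Q_{tot}}{\partial Q_a} \ge 0 \quad \forall a
% Monotonicity implies that decentralized greedy action selection recovers
% the joint greedy action:
\Longrightarrow\quad
\arg\max_{\mathbf{u}} Q_{tot}(s,\mathbf{u})
  = \Bigl(\arg\max_{u_1} Q_1(\tau_1,u_1),\ldots,\arg\max_{u_n} Q_n(\tau_n,u_n)\Bigr)
```

This is the property that mixers with non-negative mixing weights, including the transformer mixer sketched after the abstract above, are designed to preserve.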