Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
- URL: http://arxiv.org/abs/2205.14953v1
- Date: Mon, 30 May 2022 09:39:45 GMT
- Title: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
- Authors: Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen,
Jun Wang and Yaodong Yang
- Abstract summary: We introduce a novel architecture named Multi-Agent Transformer (MAT).
MAT casts cooperative multi-agent reinforcement learning (MARL) into sequence modeling (SM) problems.
Central to MAT is an encoder-decoder architecture which transforms the joint policy search problem into a sequential decision making process.
- Score: 33.679936867612525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large sequence models (SMs) such as the GPT series and BERT have displayed
outstanding performance and generalization capabilities on vision, language,
and recently reinforcement learning tasks. A natural follow-up question is how
to abstract multi-agent decision making into an SM problem and benefit from the
prosperous development of SMs. In this paper, we introduce a novel architecture
named Multi-Agent Transformer (MAT) that effectively casts cooperative
multi-agent reinforcement learning (MARL) into SM problems wherein the task is
to map agents' observation sequence to agents' optimal action sequence. Our
goal is to build the bridge between MARL and SMs so that the modeling power of
modern sequence models can be unleashed for MARL. Central to our MAT is an
encoder-decoder architecture which leverages the multi-agent advantage
decomposition theorem to transform the joint policy search problem into a
sequential decision making process; this renders only linear time complexity
for multi-agent problems and, most importantly, endows MAT with monotonic
performance improvement guarantee. Unlike prior arts such as Decision
Transformer, which fits only pre-collected offline data, MAT is trained by
online trial and error in the environment in an on-policy fashion. To validate
MAT, we conduct extensive experiments on StarCraft II, Multi-Agent MuJoCo,
Dexterous Hands Manipulation, and Google Research Football benchmarks. Results
demonstrate that MAT achieves superior performance and data efficiency compared
to strong baselines including MAPPO and HAPPO. Furthermore, we demonstrate that
MAT is an excellent few-shot learner on unseen tasks regardless of changes in
the number of agents. See our project page at
https://sites.google.com/view/multi-agent-transformer.
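The encoder-decoder claim rests on the multi-agent advantage decomposition theorem, which can be stated as follows (notation adapted here for reference; see the paper for the formal statement and proof):

```latex
% Multi-agent advantage decomposition: for any ordering i_{1:n} of the
% n agents, joint observation o, and joint action a^{i_{1:n}}, the joint
% advantage is a sum of per-agent advantages, each conditioned on the
% actions already chosen by the preceding agents.
\[
A_{\pi}^{i_{1:n}}\left(o, a^{i_{1:n}}\right)
  = \sum_{m=1}^{n} A_{\pi}^{i_m}\left(o, a^{i_{1:m-1}}, a^{i_m}\right)
\]
```

Because each summand conditions only on the predecessors' actions, each agent can pick a positive-advantage action given the choices made so far and the joint action is guaranteed to improve; this is what turns the exponential joint-action search into n sequential per-agent steps, i.e., linear time in the number of agents.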
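To make the observation-sequence-to-action-sequence mapping concrete, here is a minimal sketch of MAT-style agent-by-agent decoding. Everything below (module names, layer sizes, greedy decoding) is an illustrative assumption, not the authors' implementation:

```python
import torch
import torch.nn as nn

class MATSketch(nn.Module):
    """Minimal sketch of MAT-style decoding (illustrative, not the paper's
    code): an encoder embeds all agents' observations jointly, and a
    decoder emits actions autoregressively, one agent at a time."""

    def __init__(self, obs_dim: int, n_actions: int, d_model: int = 64):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.act_embed = nn.Embedding(n_actions + 1, d_model)  # +1: start token
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim) -> actions: (batch, n_agents)
        batch, n_agents, _ = obs.shape
        memory = self.encoder(self.obs_embed(obs))  # joint-observation encoding
        start = self.act_embed.num_embeddings - 1   # index of the start token
        actions = torch.full((batch, 1), start, dtype=torch.long,
                             device=obs.device)
        for _ in range(n_agents):                   # one decoding step per agent
            out = self.decoder(self.act_embed(actions), memory)
            next_act = self.head(out[:, -1]).argmax(-1, keepdim=True)
            actions = torch.cat([actions, next_act], dim=1)
        return actions[:, 1:]                       # drop the start token
```

For on-policy training one would sample each agent's action from a Categorical distribution over the head's logits rather than taking the greedy argmax; argmax is shown only to keep the sketch short.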
Related papers
- Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks [39.084974125007165]
We introduce Magentic-One, a high-performing open-source agentic system for solving complex tasks.
Magentic-One uses a multi-agent architecture in which a lead agent, the Orchestrator, tracks progress and re-plans to recover from errors.
We show that Magentic-One achieves performance statistically competitive with the state of the art on three diverse and challenging agentic benchmarks.
arXiv Detail & Related papers (2024-11-07T06:36:19Z)
- Multi-Agent Reinforcement Learning with Selective State-Space Models [3.8177843038388892]
State-Space Models (SSMs) have gained attention due to their computational efficiency.
In this work, we investigate the use of Mamba, a recent SSM, in Multi-Agent Reinforcement Learning (MARL).
We introduce a modified version of MAT that incorporates standard and bi-directional Mamba blocks, as well as a novel "cross-attention" Mamba block.
arXiv Detail & Related papers (2024-10-25T08:32:21Z)
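For context on the entry above: the efficiency claim comes from the linear-time recurrence of selective SSMs. Below is a naive, readability-first sketch of a selective scan of the kind Mamba builds on; the shapes and parameterization are simplified assumptions, not the paper's implementation:

```python
import torch

def selective_scan(x, A, B, C, dt):
    """Naive selective state-space scan (the core of Mamba-style blocks),
    written for clarity rather than speed; illustrative only, not the
    paper's implementation.

    x:  (T, d)      input sequence
    A:  (d, n)      state-transition parameters (negative for stability)
    B:  (T, d, n)   input-dependent input matrix
    C:  (T, d, n)   input-dependent output matrix
    dt: (T, d)      input-dependent step sizes
    """
    T, d = x.shape
    h = torch.zeros(d, A.shape[1])                  # hidden state, (d, n)
    ys = []
    for t in range(T):
        # Discretize with the input-dependent step size at time t.
        dA = torch.exp(dt[t].unsqueeze(-1) * A)     # (d, n)
        dB = dt[t].unsqueeze(-1) * B[t]             # (d, n)
        h = dA * h + dB * x[t].unsqueeze(-1)        # recurrent state update
        ys.append((C[t] * h).sum(-1))               # per-channel readout, (d,)
    return torch.stack(ys)                          # (T, d)
```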
- Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges [5.934258790280767]
Multimodal Large Language Models (MLLMs) harness comprehensive knowledge spanning text, images, and audio to adeptly tackle complex problems.
This study explores the ability of MLLMs to visually solve the Traveling Salesman Problem (TSP) and the Multiple Traveling Salesman Problem (mTSP).
We introduce a novel approach employing multiple specialized agents within the MLLM framework, each dedicated to optimizing solutions for these challenges.
arXiv Detail & Related papers (2024-06-26T07:12:06Z)
- Towards Robust Multi-Modal Reasoning via Model Selection [7.6621866737827045]
The LLM serves as the "brain" of the agent, orchestrating multiple tools for collaborative multi-step task solving.
We propose the M3 framework as a plug-in with negligible runtime overhead at test time.
Our experiments reveal that our framework enables dynamic model selection, considering both user inputs and subtask dependencies.
arXiv Detail & Related papers (2023-10-12T16:06:18Z)
- Low-Rank Multitask Learning based on Tensorized SVMs and LSSVMs [65.42104819071444]
Multitask learning (MTL) leverages task-relatedness to enhance performance.
We employ high-order tensors, with each mode corresponding to a task index, to naturally represent tasks referenced by multiple indices.
We propose a general framework of low-rank MTL methods with tensorized support vector machines (SVMs) and least squares support vector machines (LSSVMs).
arXiv Detail & Related papers (2023-08-30T14:28:26Z)
- MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
Diffusion models (DMs) have recently achieved huge success in various scenarios, including offline reinforcement learning.
We propose MADiff, a novel generative multi-agent learning framework, to bring diffusion models to the multi-agent setting.
Our experiments show the superior performance of MADiff compared to baseline algorithms in a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
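For context on the MADiff entry above: the generative machinery is a denoising diffusion model. A minimal DDPM-style reverse sampling loop is sketched below under standard DDPM assumptions; the eps_model callable and noise schedule are placeholders, not the paper's code:

```python
import torch

@torch.no_grad()
def ddpm_sample(eps_model, shape, betas):
    """Minimal DDPM-style reverse sampling loop, the generative mechanism
    MADiff-style methods apply to (joint) trajectories; an illustrative
    sketch under standard DDPM assumptions, not the paper's code."""
    alphas = 1.0 - betas                      # per-step retention, (T,)
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                    # start from pure noise
    for t in reversed(range(len(betas))):
        z = torch.randn(shape) if t > 0 else torch.zeros(shape)
        eps = eps_model(x, t)                 # predicted noise at step t
        x = (x - (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t]) * eps) \
            / torch.sqrt(alphas[t]) + torch.sqrt(betas[t]) * z
    return x
```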
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
arXiv Detail & Related papers (2022-12-15T13:57:07Z)
- Off-Policy Correction For Multi-Agent Reinforcement Learning [9.599347559588216]
Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents.
Despite apparent similarity to the single-agent case, multi-agent problems are often harder to train and analyze theoretically.
We propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting.
arXiv Detail & Related papers (2021-11-22T14:23:13Z)
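For context on the MA-Trace entry above: V-Trace is the off-policy correction introduced with IMPALA, and MA-Trace extends it to multiple agents. Below is a minimal single-agent sketch of computing V-Trace value targets; the function name, shapes, and default clipping constants are assumptions for illustration, not the paper's code:

```python
import torch

def vtrace_targets(rewards, values, bootstrap, rhos,
                   gamma=0.99, rho_clip=1.0, c_clip=1.0):
    """Single-agent V-Trace value targets (IMPALA-style); an illustrative
    sketch of the mechanism MA-Trace builds on, not the paper's code.

    rewards, values, rhos: shape (T,) tensors; rhos are importance ratios
    pi(a_t|s_t) / mu(a_t|s_t) of the target over the behavior policy;
    bootstrap is the scalar V(s_T) that closes the recursion.
    """
    T = rewards.shape[0]
    next_values = torch.cat([values[1:], bootstrap.view(1)])
    rho = torch.clamp(rhos, max=rho_clip)  # clipped IS weight for the TD term
    c = torch.clamp(rhos, max=c_clip)      # clipped trace-cutting coefficient
    deltas = rho * (rewards + gamma * next_values - values)
    # Backward recursion:
    #   v_t - V(s_t) = delta_t + gamma * c_t * (v_{t+1} - V(s_{t+1}))
    acc = torch.zeros(())
    targets = torch.empty_like(values)
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * c[t] * acc
        targets[t] = values[t] + acc
    return targets
```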
- MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning [61.28547338576706]
Population-based multi-agent reinforcement learning (PB-MARL) refers to the family of methods that nest reinforcement learning (RL) algorithms within population-based training.
We present MALib, a scalable and efficient computing framework for PB-MARL.
arXiv Detail & Related papers (2021-06-05T03:27:08Z)
- Transfer Learning for Sequence Generation: from Single-source to Multi-source [50.34044254589968]
We propose a two-stage finetuning method to alleviate the pretrain-finetune discrepancy and introduce a novel multi-source sequence generation (MSG) model with a fine encoder to learn better representations in MSG tasks.
Our approach achieves new state-of-the-art results on the WMT17 APE task and multi-source translation task using the WMT14 test set.
arXiv Detail & Related papers (2021-05-31T09:12:38Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)