Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
- URL: http://arxiv.org/abs/2205.14953v1
- Date: Mon, 30 May 2022 09:39:45 GMT
- Title: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
- Authors: Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen,
Jun Wang and Yaodong Yang
- Abstract summary: We introduce a novel architecture named Multi-Agent Transformer (MAT).
MAT casts cooperative multi-agent reinforcement learning (MARL) into sequence modeling (SM) problems.
Central to MAT is an encoder-decoder architecture which transforms the joint policy search problem into a sequential decision making process.
- Score: 33.679936867612525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large sequence models (SMs) such as the GPT series and BERT have displayed
outstanding performance and generalization capabilities on vision, language,
and recently reinforcement learning tasks. A natural follow-up question is how
to abstract multi-agent decision making into an SM problem and benefit from the
prosperous development of SMs. In this paper, we introduce a novel architecture
named Multi-Agent Transformer (MAT) that effectively casts cooperative
multi-agent reinforcement learning (MARL) into SM problems wherein the task is
to map agents' observation sequence to agents' optimal action sequence. Our
goal is to build the bridge between MARL and SMs so that the modeling power of
modern sequence models can be unleashed for MARL. Central to our MAT is an
encoder-decoder architecture which leverages the multi-agent advantage
decomposition theorem to transform the joint policy search problem into a
sequential decision making process; this renders only linear time complexity
for multi-agent problems and, most importantly, endows MAT with monotonic
performance improvement guarantee. Unlike prior arts such as Decision
Transformer, which fit only pre-collected offline data, MAT is trained by
online trial and error from the environment in an on-policy fashion. To validate
MAT, we conduct extensive experiments on StarCraft II, Multi-Agent MuJoCo,
Dexterous Hands Manipulation, and Google Research Football benchmarks. Results
demonstrate that MAT achieves superior performance and data efficiency compared
to strong baselines including MAPPO and HAPPO. Furthermore, we demonstrate that
MAT is an excellent few-shot learner on unseen tasks regardless of changes in
the number of agents. See our project page at
https://sites.google.com/view/multi-agent-transformer.
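The decoding scheme the abstract describes rests on the multi-agent advantage decomposition theorem, which writes the joint advantage as a telescoping sum of per-agent advantages, A(s, a^{1:n}) = sum_{m=1}^{n} A(s, a^{1:m-1}, a^m), so agents can choose actions one at a time, each conditioning on the choices made before it. Below is a minimal, illustrative sketch of that autoregressive encoder-decoder loop in PyTorch; it is not the authors' implementation (that is linked from the project page), and the class name, layer sizes, and the one-hot action feedback are assumptions.

```python
import torch
import torch.nn as nn

class TinyMAT(nn.Module):
    def __init__(self, obs_dim=10, act_dim=5, d_model=64, n_agents=3):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, d_model)
        self.act_embed = nn.Linear(act_dim, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.act_head = nn.Linear(d_model, act_dim)
        self.n_agents, self.act_dim = n_agents, act_dim

    def forward(self, obs):                        # obs: (batch, n_agents, obs_dim)
        memory = self.encoder(self.obs_embed(obs)) # encode all observations jointly
        acts = torch.zeros(obs.size(0), 1, self.act_dim)   # start-of-sequence token
        logits = []
        for i in range(self.n_agents):             # agent i conditions on a^{1:i-1}
            tgt = self.act_embed(acts)
            L = tgt.size(1)                        # causal mask over prior actions
            mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
            h = self.decoder(tgt, memory, tgt_mask=mask)
            logit_i = self.act_head(h[:, -1])
            a_i = nn.functional.one_hot(logit_i.argmax(-1), self.act_dim).float()
            acts = torch.cat([acts, a_i.unsqueeze(1)], dim=1)
            logits.append(logit_i)
        return torch.stack(logits, dim=1)          # (batch, n_agents, act_dim)

policy = TinyMAT()
action_logits = policy(torch.randn(2, 3, 10))      # 2 envs, 3 agents
```

With n agents the decoder runs n short passes per joint action, which is where the linear (rather than exponential) scaling in the number of agents comes from.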
Related papers
- SRMT: Shared Memory for Multi-agent Lifelong Pathfinding [4.192235624580332]
Multi-agent reinforcement learning (MARL) demonstrates significant progress in solving cooperative and competitive multi-agent problems.
One of the principal challenges in MARL is the need for explicit prediction of the agents' behavior to achieve cooperation.
We propose the Shared Recurrent Memory Transformer (SRMT) which extends memory transformers to multi-agent settings by pooling and globally broadcasting individual working memories.
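A minimal sketch of what pooling and globally broadcasting per-agent working memories could look like, with an attention read over the shared pool; this illustrates the summary's idea, not SRMT's actual architecture, and all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class SharedMemoryBroadcast(nn.Module):
    """Each agent updates its working memory by attending over all memories."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.read = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, memories):                  # memories: (n_agents, d_model)
        n = memories.size(0)
        pool = memories.unsqueeze(0).expand(n, -1, -1)  # broadcast pool to every agent
        query = memories.unsqueeze(1)                   # each agent queries with its own memory
        updated, _ = self.read(query, pool, pool)
        return updated.squeeze(1)                       # (n_agents, d_model)

new_memories = SharedMemoryBroadcast()(torch.randn(4, 64))  # four agents
```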
arXiv Detail & Related papers (2025-01-22T20:08:53Z)
- MALT: Improving Reasoning with Multi-Agent LLM Training [64.13803241218886]
We present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems.
Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles.
We evaluate our approach across MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40% respectively.
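As a toy illustration of a sequential setup with heterogeneous LLMs in specialized roles, the sketch below wires three role-players into a pipeline; the generate/verify/refine role names and the lambda stand-ins are assumptions for illustration, not MALT's actual interfaces.

```python
from typing import Callable

def pipeline(question: str,
             generate: Callable[[str], str],
             verify: Callable[[str, str], str],
             refine: Callable[[str, str, str], str]) -> str:
    draft = generate(question)                  # role 1: propose a solution
    critique = verify(question, draft)          # role 2: check the reasoning
    return refine(question, draft, critique)    # role 3: produce the final answer

# Stub "LLMs" so the sketch runs end to end.
answer = pipeline(
    "What is 2 + 2?",
    generate=lambda q: "Draft: 4",
    verify=lambda q, d: "Critique: arithmetic checks out",
    refine=lambda q, d, c: "4",
)
print(answer)
```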
arXiv Detail & Related papers (2024-12-02T19:30:36Z)
- Multi-Agent Reinforcement Learning with Selective State-Space Models [3.8177843038388892]
State-Space Models (SSMs) have gained attention due to their computational efficiency.
In this work, we investigate the use of Mamba, a recent SSM, in Multi-Agent Reinforcement Learning (MARL).
We introduce a modified version of MAT that incorporates standard and bi-directional Mamba blocks, as well as a novel "cross-attention" Mamba block.
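The sketch below only illustrates the standard-versus-bi-directional composition over the agent axis; a GRU stands in for the Mamba SSM block so the example stays self-contained, and the merge-by-sum is an assumption rather than the paper's design.

```python
import torch
import torch.nn as nn

class BiDirBlock(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.fwd = nn.GRU(d_model, d_model, batch_first=True)   # stand-in "Mamba"
        self.bwd = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, x):                         # x: (batch, n_agents, d_model)
        f, _ = self.fwd(x)                        # left-to-right pass over agents
        b, _ = self.bwd(torch.flip(x, dims=[1]))  # right-to-left pass
        return f + torch.flip(b, dims=[1])        # merge both directions

h = BiDirBlock()(torch.randn(2, 5, 64))           # 2 envs, 5 agents
```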
arXiv Detail & Related papers (2024-10-25T08:32:21Z)
- Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges [5.934258790280767]
Multimodal Large Language Models (MLLMs) harness comprehensive knowledge spanning text, images, and audio to adeptly tackle complex problems.
This study explores the ability of MLLMs to visually solve the Traveling Salesman Problem (TSP) and Multiple Traveling Salesman Problem (mTSP).
We introduce a novel approach employing multiple specialized agents within the MLLM framework, each dedicated to optimizing solutions for these challenges.
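To make the division of labor concrete, the toy sketch below splits TSP solving between a proposer agent and a critic agent; classical heuristics (nearest-neighbor and 2-opt) stand in for the paper's MLLM agents, so this shows only the multi-agent decomposition, not the visual reasoning.

```python
import math, itertools

def tour_len(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def proposer(pts):                       # agent 1: greedy nearest-neighbor tour
    tour, rest = [0], set(range(1, len(pts)))
    while rest:
        nxt = min(rest, key=lambda j: math.dist(pts[tour[-1]], pts[j]))
        tour.append(nxt)
        rest.remove(nxt)
    return tour

def critic(tour, pts):                   # agent 2: 2-opt repair of crossings
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(1, len(tour)), 2):
            cand = tour[:i] + tour[i:j][::-1] + tour[j:]
            if tour_len(cand, pts) < tour_len(tour, pts):
                tour, improved = cand, True
    return tour

pts = [(0, 0), (1, 5), (5, 2), (6, 6), (2, 3)]
print(tour_len(critic(proposer(pts), pts), pts))
```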
arXiv Detail & Related papers (2024-06-26T07:12:06Z)
- Towards Robust Multi-Modal Reasoning via Model Selection [7.6621866737827045]
The LLM serves as the "brain" of the agent, orchestrating multiple tools for collaborative multi-step task solving.
We propose the M3 framework as a plug-in with negligible runtime overhead at test-time.
Our experiments reveal that our framework enables dynamic model selection, considering both user inputs and subtask dependencies.
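A hypothetical sketch of test-time model selection conditioned on the two signals the summary names, the user input and subtask dependencies; the scoring heuristic and model names here are invented purely for illustration.

```python
def select_model(subtask: str, user_input: str, upstream_outputs: list[str],
                 candidates: dict[str, callable]) -> str:
    def score(name: str) -> float:
        # Toy heuristic: prefer a vision model when images are involved,
        # either in the user input or in outputs of upstream subtasks.
        wants_vision = ("image" in user_input
                        or any("image" in o for o in upstream_outputs))
        return float(("vision" in name) == wants_vision)
    return max(candidates, key=score)

models = {"text-llm": lambda x: x, "vision-llm": lambda x: x}
print(select_model("caption", "describe this image", [], models))  # vision-llm
```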
arXiv Detail & Related papers (2023-10-12T16:06:18Z)
- MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
MADiff is a diffusion-based multi-agent learning framework.
It works as both a decentralized policy and a centralized controller.
Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks.
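A toy sketch of the underlying diffusion mechanism: sample a joint multi-agent trajectory by iteratively denoising Gaussian noise. The noise predictor is a stub and the linear beta schedule is a generic DDPM choice, not MADiff's actual design.

```python
import torch

def sample_trajectory(noise_pred, n_agents=3, horizon=8, obs_dim=4, steps=50):
    x = torch.randn(n_agents, horizon, obs_dim)   # start from pure noise
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    for t in reversed(range(steps)):              # reverse (denoising) process
        eps = noise_pred(x, t)                    # predicted noise at step t
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x

traj = sample_trajectory(lambda x, t: torch.zeros_like(x))  # stub predictor
```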
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
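A rough sketch of a shared-encoder, multi-decoder bottleneck: one encoder compresses the input to a single dense vector that every task head must work from. Linear heads stand in for the paper's decoders, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class BottleneckMultiDecoder(nn.Module):
    def __init__(self, vocab=1000, d_model=64, n_tasks=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.decoders = nn.ModuleList(
            [nn.Linear(d_model, vocab) for _ in range(n_tasks)])

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        h = self.encoder(self.embed(tokens))
        z = h[:, 0]                               # bottleneck: [CLS]-style vector
        return [dec(z) for dec in self.decoders]  # one prediction per task

outs = BottleneckMultiDecoder()(torch.randint(0, 1000, (2, 16)))
```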
arXiv Detail & Related papers (2022-12-15T13:57:07Z)
- Off-Policy Correction For Multi-Agent Reinforcement Learning [9.599347559588216]
Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents.
Despite apparent similarity to the single-agent case, multi-agent problems are often harder to train and analyze theoretically.
We propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting.
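For reference, the single-stream V-Trace correction that MA-Trace builds on (per the summary) computes off-policy-corrected value targets with truncated importance weights, following the standard IMPALA recursion v_s = V(x_s) + delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1})). The sketch below computes exactly that; the multi-agent extension itself is not shown.

```python
import numpy as np

def vtrace(rewards, values, next_value, rhos, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """rhos are importance ratios pi(a|s) / mu(a|s), target over behavior policy."""
    T = len(rewards)
    vals = np.append(values, next_value)          # V(x_0) ... V(x_T)
    clipped_rho = np.minimum(rho_bar, rhos)       # truncated rho_t
    cs = np.minimum(c_bar, rhos)                  # truncated trace cutters c_t
    deltas = clipped_rho * (rewards + gamma * vals[1:] - vals[:-1])
    vs = np.copy(vals)
    acc = 0.0
    for t in reversed(range(T)):                  # backward recursion
        acc = deltas[t] + gamma * cs[t] * acc
        vs[t] = vals[t] + acc
    return vs[:-1]                                # V-trace targets v_s

targets = vtrace(np.ones(5), np.zeros(5), 0.0, rhos=np.full(5, 1.2))
```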
arXiv Detail & Related papers (2021-11-22T14:23:13Z)
- MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning [61.28547338576706]
Population-based multi-agent reinforcement learning (PB-MARL) refers to the family of methods that nest reinforcement learning (RL) algorithms within population-based training.
We present MALib, a scalable and efficient computing framework for PB-MARL.
arXiv Detail & Related papers (2021-06-05T03:27:08Z)
- Transfer Learning for Sequence Generation: from Single-source to Multi-source [50.34044254589968]
We propose a two-stage finetuning method to alleviate the pretrain-finetune discrepancy and introduce a novel multi-source sequence generation (MSG) model with a fine encoder to learn better representations in MSG tasks.
Our approach achieves new state-of-the-art results on the WMT17 APE task and multi-source translation task using the WMT14 test set.
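At its simplest, the two-stage schedule reduces to finetuning first on single-source data and then continuing on the multi-source task; the sketch below is only that skeleton, with a hypothetical train_fn standing in for the actual training loop.

```python
def two_stage_finetune(model, single_source_data, multi_source_data, train_fn):
    # Stage 1: adapt the pretrained model on single-source data to narrow
    # the pretrain-finetune discrepancy.
    model = train_fn(model, single_source_data)
    # Stage 2: continue finetuning on the multi-source generation task.
    return train_fn(model, multi_source_data)
```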
arXiv Detail & Related papers (2021-05-31T09:12:38Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named as Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
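One way to read "policy decoupling" is sketched below: the observation is split into per-entity tokens, a shared transformer processes them, and action logits are read off each entity's output, so the same weights apply to tasks with different entity counts. This is an interpretation of the summary, not UPDeT's exact architecture; all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class EntityPolicy(nn.Module):
    def __init__(self, entity_dim=8, d_model=64, acts_per_entity=2):
        super().__init__()
        self.embed = nn.Linear(entity_dim, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(d_model, acts_per_entity)

    def forward(self, entities):                  # (batch, n_entities, entity_dim)
        h = self.encoder(self.embed(entities))
        return self.head(h).flatten(1)            # (batch, n_entities * acts_per_entity)

policy = EntityPolicy()
logits5 = policy(torch.randn(2, 5, 8))            # 5 entities
logits9 = policy(torch.randn(2, 9, 8))            # same weights handle 9 entities
```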
arXiv Detail & Related papers (2021-01-20T07:24:24Z)