UPDeT: Universal Multi-agent Reinforcement Learning via Policy
Decoupling with Transformers
- URL: http://arxiv.org/abs/2101.08001v3
- Date: Sun, 7 Feb 2021 10:28:41 GMT
- Title: UPDeT: Universal Multi-agent Reinforcement Learning via Policy
Decoupling with Transformers
- Authors: Siyi Hu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang
- Abstract summary: We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing one single architecture to fit tasks with different observation and action configurations.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named as Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
- Score: 108.92194081987967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in multi-agent reinforcement learning have been largely
limited to training one model from scratch for every new task. The limitation
is due to the restricted model architecture related to fixed input and output
dimensions. This hinders the experience accumulation and transfer of the
learned agent over tasks with diverse levels of difficulty (e.g. 3 vs 3 or 5 vs
6 multi-agent games). In this paper, we make the first attempt to explore a
universal multi-agent reinforcement learning pipeline, designing one single
architecture to fit tasks with the requirement of different observation and
action configurations. Unlike previous RNN-based models, we utilize a
transformer-based model to generate a flexible policy by decoupling the policy
distribution from the intertwined input observation with an importance weight
measured by the merits of the self-attention mechanism. Compared to a standard
transformer block, the proposed model, named as Universal Policy Decoupling
Transformer (UPDeT), further relaxes the action restriction and makes the
multi-agent task's decision process more explainable. UPDeT is general enough
to be plugged into any multi-agent reinforcement learning pipeline and equip
it with strong generalization abilities that enable the handling of multiple
tasks at a time. Extensive experiments on large-scale SMAC multi-agent
competitive games demonstrate that the proposed UPDeT-based multi-agent
reinforcement learning achieves significant results relative to
state-of-the-art approaches, demonstrating advantageous transfer capability in
terms of both performance and training speed (10 times faster).
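The idea of decoupling the policy from a fixed-size observation can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `TinyDecoupledPolicy`, the single attention head, the random untrained weights, and the split into "basic" actions (read from the agent's own token) and one "target" action per enemy (read from that enemy's token) are all assumptions made for illustration. The point it shows is that the same parameters can serve scenarios with different numbers of entities, and that the agent's attention row gives a per-entity importance weight.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyDecoupledPolicy:
    """Hypothetical illustration: one self-attention layer over entity tokens."""

    def __init__(self, feat_dim, hidden_dim, n_basic_actions, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # All entities share one embedding, so the entity count is free to vary.
        self.W_embed = rand_matrix(hidden_dim, feat_dim, rng)
        self.W_q = rand_matrix(hidden_dim, hidden_dim, rng)
        self.W_k = rand_matrix(hidden_dim, hidden_dim, rng)
        self.W_v = rand_matrix(hidden_dim, hidden_dim, rng)
        # Assumed output heads: basic actions from the own token,
        # one scalar "target this enemy" value from each enemy token.
        self.W_basic = rand_matrix(n_basic_actions, hidden_dim, rng)
        self.W_target = rand_matrix(1, hidden_dim, rng)

    def attend(self, tokens):
        """Scaled dot-product self-attention; returns outputs and weights."""
        q = [matvec(self.W_q, t) for t in tokens]
        k = [matvec(self.W_k, t) for t in tokens]
        v = [matvec(self.W_v, t) for t in tokens]
        scale = math.sqrt(self.hidden_dim)
        outputs, weights = [], []
        for qi in q:
            scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
            w = softmax(scores)
            weights.append(w)
            outputs.append([sum(w[j] * v[j][d] for j in range(len(v)))
                            for d in range(self.hidden_dim)])
        return outputs, weights

    def q_values(self, own, allies, enemies):
        """Return (action values, attention weights of the agent's own token)."""
        entities = [own] + allies + enemies
        tokens = [matvec(self.W_embed, e) for e in entities]
        outputs, weights = self.attend(tokens)
        basic = matvec(self.W_basic, outputs[0])
        first_enemy = 1 + len(allies)
        targeted = [matvec(self.W_target, o)[0] for o in outputs[first_enemy:]]
        return basic + targeted, weights[0]


if __name__ == "__main__":
    rng = random.Random(1)
    entity = lambda: [rng.random() for _ in range(4)]
    policy = TinyDecoupledPolicy(feat_dim=4, hidden_dim=8, n_basic_actions=6)
    # 3 vs 3: two allies and three enemies yield 6 + 3 action values.
    q_small, _ = policy.q_values(entity(), [entity(), entity()],
                                 [entity() for _ in range(3)])
    # 5 vs 6: the same weights yield 6 + 6 action values.
    q_large, _ = policy.q_values(entity(), [entity() for _ in range(4)],
                                 [entity() for _ in range(6)])
    print(len(q_small), len(q_large))
```

Because the action vector grows with the number of enemy tokens, no layer has to be resized when moving between scenarios, which is the property the abstract attributes to the transformer-based design.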
Related papers
- Learning Emergence of Interaction Patterns across Independent RL Agents in Multi-Agent Environments [3.0284592792243794]
Bottom Up Network (BUN) treats the collective of multi-agents as a unified entity.
Our empirical evaluations across a variety of cooperative multi-agent scenarios, including tasks such as cooperative navigation and traffic control, consistently demonstrate BUN's superiority over baseline methods with substantially reduced computational costs.
arXiv Detail & Related papers (2024-10-03T14:25:02Z)
- Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks [27.59758964060561]
Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities.
Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent.
We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
arXiv Detail & Related papers (2024-01-27T03:03:30Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization [6.441951360534903]
Multi-Agent Continuous Dynamic Policy Gradient (MACDPP) was proposed to tackle the issues of limited capability and sample efficiency in various scenarios controlled by multiple agents.
It alleviates the inconsistency of multiple agents' policy updates by introducing relative entropy regularization into the Centralized Training with Decentralized Execution (CTDE) framework with the Actor-Critic (AC) structure.
arXiv Detail & Related papers (2023-09-26T07:38:19Z)
- FedYolo: Augmenting Federated Learning with Pretrained Transformers [61.56476056444933]
In this work, we investigate pretrained transformers (PTF) to achieve on-device learning goals.
We show that larger scale shrinks the accuracy gaps between alternative approaches and improves robustness.
Finally, it enables clients to solve multiple unrelated tasks simultaneously using a single PTF.
arXiv Detail & Related papers (2023-07-10T21:08:52Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Efficient Multimodal Fusion via Interactive Prompting [62.08292938484994]
Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era.
We propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pre-trained transformers.
arXiv Detail & Related papers (2023-04-13T07:31:51Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- Low-level Pose Control of Tilting Multirotor for Wall Perching Tasks Using Reinforcement Learning [2.5903488573278284]
We propose a novel reinforcement learning-based method to control a tilting multirotor on real-world applications.
Our proposed method shows robust controllability by overcoming the complex dynamics of tilting multirotors.
arXiv Detail & Related papers (2021-08-11T21:39:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.