Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence
Model Conquers All StarCraftII Tasks
- URL: http://arxiv.org/abs/2112.02845v1
- Date: Mon, 6 Dec 2021 08:11:05 GMT
- Title: Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence
Model Conquers All StarCraftII Tasks
- Authors: Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan
Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Bo Xu
- Abstract summary: Offline pre-training with online fine-tuning has never been studied in MARL, nor are datasets or benchmarks for offline MARL research available.
We propose the novel architecture of multi-agent decision transformer (MADT) for effective offline learning.
When evaluated on the StarCraft II offline dataset, MADT demonstrates performance superior to state-of-the-art offline RL baselines.
- Score: 43.588686040547486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning leverages static datasets to learn optimal
policies without any need to access the environment. This technique is
desirable for multi-agent learning tasks because agents' online interactions are
expensive and training demands large numbers of samples. Yet, in multi-agent
reinforcement learning (MARL), the paradigm of offline pre-training with online
fine-tuning has never been studied, nor are datasets or benchmarks for offline
MARL research available. In this paper, we try to
answer the question of whether offline pre-training in MARL is able to learn
generalisable policy representations that can help improve the performance of
multiple downstream tasks. We start by introducing the first offline MARL
dataset with diverse quality levels based on the StarCraft II environment, and
then propose the novel architecture of multi-agent decision transformer (MADT)
for effective offline learning. MADT leverages the Transformer's ability to model
temporal representations and integrates it with both offline and online MARL
tasks. A crucial benefit of MADT is that it learns generalisable policies that
can transfer between different types of agents under different task scenarios.
When evaluated on the StarCraft II offline dataset, MADT demonstrates performance
superior to state-of-the-art offline RL baselines. When applied to online
tasks, the pre-trained MADT significantly improves sample efficiency, and
enjoys strong performance even in zero-shot cases. To the best of our knowledge, this
is the first work that studies and demonstrates the effectiveness of offline
pre-trained models in terms of sample efficiency and generalisability
enhancements in MARL.
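As a concrete illustration of the sequence-modelling idea described in the abstract, the sketch below shows a causal Transformer pre-trained on offline trajectories to predict each agent's next action from its observation history, with parameters shared across agents. This is a minimal, hedged sketch in PyTorch: the class and function names are illustrative assumptions, and the authors' actual MADT architecture and training pipeline differ in detail.

```python
# Minimal sketch of the sequence-modelling idea behind MADT (illustrative only,
# not the authors' exact architecture): a causal Transformer is pre-trained on
# offline trajectories to predict each agent's next action from its observation
# history; parameters are shared across agents.
import torch
import torch.nn as nn

class MADTSketch(nn.Module):
    def __init__(self, obs_dim, n_actions, d_model=128, n_heads=4, n_layers=3, max_len=64):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, d_model)      # embed each observation token
        self.pos_embed = nn.Embedding(max_len, d_model)   # learned timestep embedding
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.action_head = nn.Linear(d_model, n_actions)  # next-action logits

    def forward(self, obs_seq):
        # obs_seq: (batch, T, obs_dim) -- one agent's observation sequence
        B, T, _ = obs_seq.shape
        pos = torch.arange(T, device=obs_seq.device)
        x = self.obs_embed(obs_seq) + self.pos_embed(pos)
        # Causal mask so timestep t attends only to timesteps <= t.
        causal = torch.triu(torch.full((T, T), float("-inf"), device=obs_seq.device), diagonal=1)
        h = self.encoder(x, mask=causal)
        return self.action_head(h)                        # (batch, T, n_actions)

def offline_pretrain_step(model, optimizer, obs_seq, act_seq):
    """One supervised (behaviour-cloning-style) step on a batch of offline trajectories."""
    logits = model(obs_seq)                               # (B, T, n_actions)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), act_seq.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After offline pre-training, such a network could serve as the initialisation of each agent's policy for online fine-tuning, which is where the paper reports its sample-efficiency and zero-shot gains; the exact fine-tuning procedure is described in the paper itself.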
Related papers
- Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration [40.346958259814514]
We propose a novel offline-to-online (O2O) MARL framework called Offline Value Function Memory with Sequential Exploration (OVMSE).
First, we introduce the Offline Value Function Memory (OVM) mechanism to compute target Q-values, preserving knowledge gained during offline training.
Second, we propose a decentralized Sequential Exploration (SE) strategy tailored for O2O MARL, which effectively utilizes the pre-trained offline policy for exploration.
arXiv Detail & Related papers (2024-10-25T10:24:19Z)
- Hybrid Training for Enhanced Multi-task Generalization in Multi-agent Reinforcement Learning [7.6201940008534175]
HyGen is a novel hybrid MARL framework, which integrates online and offline learning to ensure both multi-task generalization and training efficiency.
We empirically demonstrate that our framework effectively extracts and refines general skills, yielding impressive generalization to unseen tasks.
arXiv Detail & Related papers (2024-08-24T12:37:03Z)
- Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation [59.899714450049494]
Offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
arXiv Detail & Related papers (2023-12-15T14:49:41Z)
- Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization [23.416448404647305]
OMIGA is a new offline multi-agent RL algorithm with implicit global-to-local value regularization.
We show that OMIGA achieves superior performance over the state-of-the-art offline MARL methods in almost all tasks.
arXiv Detail & Related papers (2023-07-21T14:37:54Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes [100.69714600180895]
Offline Q-learning algorithms exhibit strong performance that scales with model capacity.
We train a single policy on 40 games with near-human performance using up to 80 million parameter networks.
Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal.
arXiv Detail & Related papers (2022-11-28T08:56:42Z)
- Contextual Transformer for Offline Meta Reinforcement Learning [16.587320914107128]
We show how prompts can improve sequence-modeling-based offline reinforcement learning (offline RL) algorithms.
We propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional policy generation.
We extend our framework to Meta-RL settings and propose Contextual Meta Transformer (CMT); CMT leverages the context among different tasks as the prompt to improve generalization on unseen tasks.
arXiv Detail & Related papers (2022-11-15T10:00:14Z)
- A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL analogous to the relatively well-understood workflows for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
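For the AWAC entry above, the core mechanism is an advantage-weighted policy update that clones logged actions in proportion to their exponentiated advantage under the current critic. The snippet below is a hedged sketch of just that actor objective; the `policy`, `q_fn`, and `value_fn` interfaces and the temperature `lam` are illustrative assumptions, not AWAC's reference implementation.

```python
# Sketch of AWAC's advantage-weighted actor objective (illustrative interfaces;
# the full algorithm also trains the critic with TD updates on offline + online data).
import torch

def awac_actor_loss(policy, q_fn, value_fn, obs, actions, lam=1.0):
    # Advantage of the logged actions under the current critic;
    # V(s) is typically estimated as an expectation of Q over policy samples.
    with torch.no_grad():
        advantage = q_fn(obs, actions) - value_fn(obs)
        weights = torch.exp(advantage / lam)      # exponentiated-advantage weights
    log_prob = policy.log_prob(obs, actions)      # log pi(a|s) for the logged actions
    return -(weights * log_prob).mean()           # advantage-weighted maximum likelihood
```

Minimising this loss nudges the policy toward actions the critic currently rates above average, which is how prior demonstration data and online experience are combined within the same update.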
This list is automatically generated from the titles and abstracts of the papers on this site.