Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
- URL: http://arxiv.org/abs/2410.11448v2
- Date: Thu, 24 Oct 2024 08:17:51 GMT
- Title: Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
- Authors: Zhi Wang, Li Zhang, Wenhao Wu, Yuanheng Zhu, Dongbin Zhao, Chunlin Chen,
- Abstract summary: We propose Meta Decision Transformer (Meta-DT) to achieve efficient generalization in offline meta-RL.
We pretrain a context-aware world model to learn a compact task representation, and inject it as a contextual condition to guide task-oriented sequence generation.
We show that Meta-DT exhibits superior few and zero-shot generalization capacity compared to strong baselines.
- Score: 41.7426496795769
- License:
- Abstract: A longstanding goal of artificial general intelligence is highly capable generalists that can learn from diverse experiences and generalize to unseen tasks. The language and vision communities have seen remarkable progress toward this trend by scaling up transformer-based models trained on massive datasets, while reinforcement learning (RL) agents still suffer from poor generalization capacity under such paradigms. To tackle this challenge, we propose Meta Decision Transformer (Meta-DT), which leverages the sequential modeling ability of the transformer architecture and robust task representation learning via world model disentanglement to achieve efficient generalization in offline meta-RL. We pretrain a context-aware world model to learn a compact task representation, and inject it as a contextual condition to the causal transformer to guide task-oriented sequence generation. Then, we subtly utilize history trajectories generated by the meta-policy as a self-guided prompt to exploit the architectural inductive bias. We select the trajectory segment that yields the largest prediction error on the pretrained world model to construct the prompt, aiming to encode task-specific information complementary to the world model maximally. Notably, the proposed framework eliminates the requirement of any expert demonstration or domain knowledge at test time. Experimental results on MuJoCo and Meta-World benchmarks across various dataset types show that Meta-DT exhibits superior few and zero-shot generalization capacity compared to strong baselines while being more practical with fewer prerequisites. Our code is available at https://github.com/NJU-RL/Meta-DT.
Related papers
- Masked Generative Priors Improve World Models Sequence Modelling Capabilities [19.700020499490137]
Masked Generative Modelling has emerged as a more efficient and superior inductive bias for modelling.
GIT-STORM demonstrates substantial performance gains in RL tasks on the Atari 100k benchmark.
We apply Transformer-based World Models to continuous action environments for the first time, addressing a significant gap in prior research.
arXiv Detail & Related papers (2024-10-10T11:52:07Z) - Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models [106.94827590977337]
We propose a novel world model for Multi-Agent RL (MARL) that learns decentralized local dynamics for scalability.
We also introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation.
Results on Starcraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
arXiv Detail & Related papers (2024-06-22T12:40:03Z) - Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z) - ReCoRe: Regularized Contrastive Representation Learning of World Model [21.29132219042405]
We present a world model that learns invariant features using contrastive unsupervised learning and an intervention-invariant regularizer.
Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark.
arXiv Detail & Related papers (2023-12-14T15:53:07Z) - MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL [25.76141096396645]
We propose a task-oriented conditioned diffusion planner for offline meta-RL(MetaDiffuser)
The proposed framework enjoys the robustness to the quality of collected warm-start data from the testing task.
Experiment results on MuJoCo benchmarks show that MetaDiffuser outperforms other strong offline meta-RL baselines.
arXiv Detail & Related papers (2023-05-31T15:01:38Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
This is the first time that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Updater-Extractor Architecture for Inductive World State Representations [0.0]
We propose a transformer-based Updater-Extractor architecture and a training procedure that can work with sequences of arbitrary length.
We explicitly train the model to incorporate incoming information into its world state representation.
Empirically, we investigate the model performance on three different tasks, demonstrating its promise.
arXiv Detail & Related papers (2021-04-12T14:30:11Z) - MELD: Meta-Reinforcement Learning from Images via Latent State Models [109.1664295663325]
We develop an algorithm for meta-RL from images that performs inference in a latent state model to quickly acquire new skills.
MELD is the first meta-RL algorithm trained in a real-world robotic control setting from images.
arXiv Detail & Related papers (2020-10-26T23:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.