STORM: Efficient Stochastic Transformer based World Models for
Reinforcement Learning
- URL: http://arxiv.org/abs/2310.09615v1
- Date: Sat, 14 Oct 2023 16:42:02 GMT
- Title: STORM: Efficient Stochastic Transformer based World Models for
Reinforcement Learning
- Authors: Weipu Zhang, Gang Wang, Jian Sun, Yetian Yuan, Gao Huang
- Abstract summary: Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce the Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines the sequence modeling and generation capabilities of Transformers with the stochasticity of variational autoencoders.
STORM achieves a mean human performance of $126.7\%$ on the Atari $100$k benchmark, setting a new record among state-of-the-art methods that do not employ lookahead search.
- Score: 82.03481509373037
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently, model-based reinforcement learning algorithms have demonstrated
remarkable efficacy in visual input environments. These approaches begin by
constructing a parameterized simulation world model of the real environment
through self-supervised learning. By leveraging the imagination of the world
model, the agent's policy is enhanced without the constraints of sampling from
the real environment. The performance of these algorithms heavily relies on the
sequence modeling and generation capabilities of the world model. However,
constructing a perfectly accurate model of a complex unknown environment is
nearly impossible. Discrepancies between the model and reality may cause the
agent to pursue virtual goals, resulting in subpar performance in the real
environment. Introducing random noise into model-based reinforcement learning
has been proven beneficial. In this work, we introduce Stochastic
Transformer-based wORld Model (STORM), an efficient world model architecture
that combines the strong sequence modeling and generation capabilities of
Transformers with the stochastic nature of variational autoencoders. STORM
achieves a mean human performance of $126.7\%$ on the Atari $100$k benchmark,
setting a new record among state-of-the-art methods that do not employ
lookahead search techniques. Moreover, training an agent with $1.85$ hours of
real-time interaction experience on a single NVIDIA GeForce RTX 3090 graphics
card requires only $4.3$ hours, showcasing improved efficiency compared to
previous methodologies.
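For concreteness, here is a minimal sketch of the recipe the abstract describes: a CNN encoder producing stochastic categorical latents (the variational-autoencoder ingredient) and a causal Transformer modeling the latent/action sequence. All layer sizes, names, and heads below are illustrative assumptions, not the authors' exact architecture.

```python
# Hedged sketch of a stochastic Transformer world model in the spirit of STORM.
# Sizes and module names are assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticTransformerWorldModel(nn.Module):
    def __init__(self, num_actions, tokens=32, classes=32, d_model=256):
        super().__init__()
        self.tokens, self.classes = tokens, classes
        self.encoder = nn.Sequential(            # image -> categorical logits
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(tokens * classes),
        )
        self.embed = nn.Linear(tokens * classes, d_model)
        self.action_embed = nn.Embedding(num_actions, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.next_logits = nn.Linear(d_model, tokens * classes)  # next-latent head
        self.reward_head = nn.Linear(d_model, 1)

    def encode(self, obs):
        # Stochastic categorical latent, sampled with straight-through gradients.
        logits = self.encoder(obs).view(obs.shape[0], self.tokens, self.classes)
        return F.gumbel_softmax(logits, tau=1.0, hard=True).flatten(1)

    def forward(self, obs_seq, act_seq):
        # obs_seq: (B, T, 3, H, W); act_seq: (B, T) discrete action ids.
        B, T = act_seq.shape
        z = self.encode(obs_seq.flatten(0, 1))
        x = self.embed(z).view(B, T, -1) + self.action_embed(act_seq)
        mask = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)
        h = self.transformer(x, mask=mask)       # causal sequence model
        return self.next_logits(h), self.reward_head(h)
```

During imagination, the policy would be trained on latents sampled from the next-latent head rather than on real frames, which is what removes the real-environment sampling constraint mentioned above.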
Related papers
- Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient [9.519619751861333]
We propose a state space model (SSM) world model built on Mamba.
It achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies.
This model is accessible and can be trained on an off-the-shelf laptop.
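The $O(n)$ claim comes from replacing attention with a recurrence over a fixed-size state. Below is a deliberately tiny diagonal linear-SSM scan, an assumed simplification rather than the paper's Mamba block (which adds input-dependent parameters and a parallel scan), to show where the linearity comes from.

```python
# Tiny diagonal linear-SSM scan: O(T) time with a fixed-size state,
# unlike attention's O(T^2) pairwise cost. Illustrative only.
import torch

def ssm_scan(a, b, c, u):
    """x_t = a * x_{t-1} + b * u_t;  y_t = c * x_t over a length-T sequence.

    a, b, c: (state_dim,) parameters; u: (T, state_dim) inputs.
    """
    x = torch.zeros_like(u[0])
    ys = []
    for u_t in u:                      # sequential scan over time
        x = a * x + b * u_t            # state carries long-term information
        ys.append(c * x)
    return torch.stack(ys)

y = ssm_scan(torch.full((8,), 0.9), torch.ones(8), torch.ones(8),
             torch.randn(100, 8))      # y: (100, 8)
```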
arXiv Detail & Related papers (2024-10-11T15:10:40Z)
- Masked Generative Priors Improve World Models Sequence Modelling Capabilities [19.700020499490137]
Masked generative modelling has emerged as a more efficient and stronger inductive bias for sequence modelling.
GIT-STORM demonstrates substantial performance gains in RL tasks on the Atari 100k benchmark.
We apply Transformer-based World Models to continuous action environments for the first time, addressing a significant gap in prior research.
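A masked generative prior can be trained MaskGIT-style: randomly mask latent tokens and have a bidirectional Transformer reconstruct them. The step below is an assumed simplification, not GIT-STORM's exact recipe.

```python
# Illustrative MaskGIT-style training step for a masked prior over latent tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, seq_len, d_model = 512, 64, 256
MASK_ID = vocab  # extra token id reserved for [MASK]
embed = nn.Embedding(vocab + 1, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
prior = nn.TransformerEncoder(layer, num_layers=2)  # bidirectional: no causal mask
head = nn.Linear(d_model, vocab)

tokens = torch.randint(0, vocab, (4, seq_len))      # latent tokens from an encoder
mask = torch.rand(4, seq_len) < 0.5                 # random masking ratio
inputs = tokens.masked_fill(mask, MASK_ID)
logits = head(prior(embed(inputs)))
loss = F.cross_entropy(logits[mask], tokens[mask])  # predict only masked tokens
loss.backward()
```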
arXiv Detail & Related papers (2024-10-10T11:52:07Z)
- COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL [50.385005413810084]
Dyna-style model-based reinforcement learning alternates between two phases: model rollouts that generate samples for policy learning, and exploration of the real environment.
$\texttt{COPlanner}$ is a planning-driven framework that helps model-based methods cope with inaccurately learned dynamics models.
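One plausible reading of the title, sketched with an ensemble-disagreement uncertainty signal (the measure and coefficients are assumptions, not the paper's method): penalize uncertainty during imagined rollouts, reward it during real exploration.

```python
# Hedged sketch: ensemble disagreement as model uncertainty, used as a penalty
# when rolling out (conservative) and a bonus when exploring (optimistic).
import torch

def disagreement(models, s, a):
    """Std across an ensemble of dynamics models as an uncertainty proxy."""
    preds = torch.stack([m(s, a) for m in models])  # (n_models, batch, state_dim)
    return preds.std(dim=0).mean(dim=-1)            # (batch,)

def rollout_reward(r, u, beta=1.0):
    return r - beta * u   # conservative: distrust uncertain imagined transitions

def exploration_reward(r, u, beta=1.0):
    return r + beta * u   # optimistic: seek states where the model is unsure
```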
arXiv Detail & Related papers (2023-10-11T06:10:07Z)
- Transformers are Sample Efficient World Models [1.9444242128493845]
We introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer.
With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games.
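A sketch of what "a discrete autoencoder and an autoregressive Transformer" implies at imagination time: the world model generates the next frame token by token. The sampling loop and shapes below are illustrative assumptions.

```python
# Illustrative imagination step: a GPT-like Transformer samples the next
# frame's discrete tokens one at a time.
import torch

@torch.no_grad()
def imagine_next_frame(transformer, context_tokens, tokens_per_frame=16):
    """context_tokens: (1, T) past frame/action token ids."""
    seq = context_tokens
    for _ in range(tokens_per_frame):
        logits = transformer(seq)[:, -1]                # (1, vocab) next-token logits
        nxt = torch.multinomial(logits.softmax(-1), 1)  # sample one token
        seq = torch.cat([seq, nxt], dim=1)
    return seq[:, -tokens_per_frame:]  # decode with the autoencoder to get pixels
```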
arXiv Detail & Related papers (2022-09-01T17:03:07Z)
- Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms [59.03426963238452]
We present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline.
We show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.
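The framework's point is putting a large network inside the MPC loop at real-time rates. The random-shooting planner below is only a generic stand-in showing a learned dynamics model replacing analytic physics in receding-horizon planning; the paper itself integrates networks into a more efficient gradient-based solver.

```python
# Generic model-in-the-loop planner; `dynamics` and `cost` are assumed callables.
import torch

def random_shooting_mpc(dynamics, cost, s0, horizon=10, n_samples=256, act_dim=4):
    acts = torch.rand(n_samples, horizon, act_dim) * 2 - 1  # candidate sequences
    s = s0.expand(n_samples, -1).clone()
    total = torch.zeros(n_samples)
    for t in range(horizon):
        s = dynamics(s, acts[:, t])        # learned model replaces physics
        total += cost(s, acts[:, t])
    best = total.argmin()
    return acts[best, 0]                   # execute first action, then replan
```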
arXiv Detail & Related papers (2022-03-15T09:38:15Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that the model achieves comparable performance with far fewer trainable parameters and high speed in both training and inference.
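"Linear attention" here refers to kernelized attention whose cost grows linearly in sequence length. A common (assumed) formulation with an $\mathrm{elu}+1$ feature map, not necessarily STAR's exact variant:

```python
# Linear attention via the kernel feature trick: O(T) instead of O(T^2).
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k: (B, T, D); v: (B, T, Dv)
    q, k = F.elu(q) + 1, F.elu(k) + 1             # positive feature maps
    kv = torch.einsum('btd,bte->bde', k, v)       # (B, D, Dv), summed over time
    z = 1 / (torch.einsum('btd,bd->bt', q, k.sum(dim=1)) + eps)
    return torch.einsum('btd,bde,bt->bte', q, kv, z)
```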
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow [14.422129911404472]
Bellman introduces the first thoroughly designed and tested model-based RL toolbox, filling a gap in the available tooling.
Our modular approach enables combining a wide range of environment models with generic model-based agent classes that recover state-of-the-art algorithms.
arXiv Detail & Related papers (2021-03-26T11:32:27Z)
- Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
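As a pointer to the mechanism, the $\gamma$-model is trained against a TD-style bootstrapped target (notation paraphrased here, so treat it as a sketch):

$$ p_\gamma^{\text{target}}(s_e \mid s, a) = (1-\gamma)\, p(s_e \mid s, a) + \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s, a),\, a' \sim \pi}\big[\, p_\gamma(s_e \mid s', a') \,\big]. $$

Setting $\gamma = 0$ recovers an ordinary single-step model, while $\gamma \to 1$ approaches the discounted occupancy over all future states, which is where the training-time versus testing-time compounding-error tradeoff comes from.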
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
- Smaller World Models for Reinforcement Learning [0.5156484100374059]
We propose a new neural network architecture for world models based on a vector quantized variational autoencoder (VQ-VAE).
A model-free PPO agent is trained purely on simulated experience from the world model.
We show that we reach performance comparable to the SimPLe algorithm, while our model is significantly smaller.
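The central operation of a VQ-VAE is snapping encoder outputs to a small learned codebook. A minimal sketch with an assumed codebook size (commitment and codebook losses omitted for brevity):

```python
# VQ-VAE quantization with a straight-through gradient. Illustrative shapes.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):  # z: (B, N, dim) encoder outputs
        # Squared distance from every latent to every codebook vector.
        d = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        idx = d.argmin(dim=-1)           # nearest-code indices, (B, N)
        q = self.codebook(idx)           # quantized latents, (B, N, dim)
        # Straight-through estimator: gradients bypass the argmin.
        return z + (q - z).detach(), idx
```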
arXiv Detail & Related papers (2020-10-12T15:02:41Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
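The correspondence the paper builds on can be stated loosely: information theoretic MPC reweights trajectories by exponentiated cost, and entropy regularized RL reweights actions by an exponentiated soft value,

$$ q^*(\tau) \propto p(\tau)\, e^{-C(\tau)/\lambda}, \qquad \pi^*(a \mid s) \propto e^{\,Q_{\mathrm{soft}}(s, a)/\alpha}. $$

Reading the two side by side suggests how MPC rollouts through a (possibly biased) model can supply targets for a soft Q-function, which is the direction the Q-learning algorithm takes.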
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and accepts no responsibility for any consequences of its use.