Transformers are Sample Efficient World Models
- URL: http://arxiv.org/abs/2209.00588v1
- Date: Thu, 1 Sep 2022 17:03:07 GMT
- Title: Transformers are Sample Efficient World Models
- Authors: Vincent Micheli, Eloi Alonso, François Fleuret
- Abstract summary: We introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer.
With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games.
- Score: 1.9444242128493845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning agents are notoriously sample inefficient, which
considerably limits their application to real-world problems. Recently, many
model-based methods have been designed to address this issue, with learning in
the imagination of a world model being one of the most prominent approaches.
However, while virtually unlimited interaction with a simulated environment
sounds appealing, the world model has to be accurate over extended periods of
time. Motivated by the success of Transformers in sequence modeling tasks, we
introduce IRIS, a data-efficient agent that learns in a world model composed of
a discrete autoencoder and an autoregressive Transformer. With the equivalent
of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean
human normalized score of 1.046, and outperforms humans on 10 out of 26 games.
Our approach sets a new state of the art for methods without lookahead search,
and even surpasses MuZero. To foster future research on Transformers and world
models for sample-efficient reinforcement learning, we release our codebase at
https://github.com/eloialonso/iris.
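The abstract reports results as a mean human normalized score and a count of games where the agent beats humans. A minimal sketch of how such figures are conventionally computed is shown below, assuming the standard Atari definition (agent score minus random score, divided by human score minus random score). The function names and the toy numbers are hypothetical and for illustration only; they are not taken from the paper or its codebase.

```python
from typing import Dict, Tuple


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Return (agent - random) / (human - random).

    0.0 corresponds to random play and 1.0 to human-level play.
    """
    return (agent - random) / (human - random)


def summarize(
    raw: Dict[str, Tuple[float, float, float]]
) -> Tuple[Dict[str, float], float, int]:
    """Compute per-game scores, their mean, and the number of games above human level.

    `raw` maps a game name to (agent, random, human) raw scores.
    """
    per_game = {
        game: human_normalized_score(agent, random, human)
        for game, (agent, random, human) in raw.items()
    }
    mean_hns = sum(per_game.values()) / len(per_game)
    n_superhuman = sum(1 for score in per_game.values() if score > 1.0)
    return per_game, mean_hns, n_superhuman


# Made-up raw scores (agent, random, human); NOT results from the paper.
TOY_SCORES = {
    "game_a": (150.0, 50.0, 250.0),
    "game_b": (30.0, 10.0, 20.0),
}

per_game, mean_hns, n_superhuman = summarize(TOY_SCORES)
print(per_game, mean_hns, n_superhuman)
```

Because the mean is taken over per-game ratios, a few games with very high ratios can lift it above 1 even when most games are below human level, which is why the abstract also states the number of games on which humans are outperformed.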
Related papers
- Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient [9.519619751861333]
We propose a state space model (SSM) based world model based on Mamba.
It achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies.
This model is accessible and can be trained on an off-the-shelf laptop.
arXiv Detail & Related papers (2024-10-11T15:10:40Z)
- Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining [49.730897226510095]
We introduce JOWA: Jointly-Optimized World-Action model, an offline model-based RL agent pretrained on Atari games with 6 billion tokens of data.
Our largest agent, with 150 million parameters, achieves 78.9% human-level performance on pretrained games using only 10% subsampled offline data, outperforming existing state-of-the-art large-scale offline RL baselines by 31.6% on average.
arXiv Detail & Related papers (2024-10-01T10:25:03Z)
- Efficient World Models with Context-Aware Tokenization [22.84676306124071]
$\Delta$-IRIS is a new agent with a world model architecture composed of a discrete autoencoder that encodes deltas between time steps.
In the Crafter benchmark, $\Delta$-IRIS sets a new state of the art at multiple frame budgets, while being an order of magnitude faster to train than previous attention-based approaches.
arXiv Detail & Related papers (2024-06-27T16:54:12Z)
- Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models [106.94827590977337]
We propose a novel world model for Multi-Agent RL (MARL) that learns decentralized local dynamics for scalability.
We also introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation.
Results on Starcraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
arXiv Detail & Related papers (2024-06-22T12:40:03Z)
- Learning to Play Atari in a World of Tokens [4.880437151994464]
We introduce discrete abstract representations for transformer-based learning (DART).
We incorporate a transformer-decoder for auto-regressive world modeling and a transformer-encoder for learning behavior by attending to task-relevant cues in the discrete representation of the world model.
DART outperforms previous state-of-the-art methods that do not use look-ahead search on the Atari 100k sample efficiency benchmark with a median human-normalized score of 0.790 and beats humans in 9 out of 26 games.
arXiv Detail & Related papers (2024-06-03T14:25:29Z)
- STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
This is the first time that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Transformer-based World Models Are Happy With 100k Interactions [0.4588028371034407]
We apply a transformer to real-world episodes in an autoregressive manner to build a sample-efficient world model.
The transformer allows our world model to access previous states directly, instead of viewing them through a compressed recurrent state.
By utilizing the Transformer-XL architecture, it is able to learn long-term dependencies while staying computationally efficient.
arXiv Detail & Related papers (2023-03-13T13:43:59Z)
- Mastering Atari with Discrete World Models [61.7688353335468]
We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model.
DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model.
arXiv Detail & Related papers (2020-10-05T17:52:14Z)