Transformer-based World Models Are Happy With 100k Interactions
- URL: http://arxiv.org/abs/2303.07109v1
- Date: Mon, 13 Mar 2023 13:43:59 GMT
- Title: Transformer-based World Models Are Happy With 100k Interactions
- Authors: Jan Robine, Marc Höftmann, Tobias Uelwer, Stefan Harmeling
- Abstract summary: We apply a transformer to real-world episodes in an autoregressive manner to build a sample-efficient world model.
The transformer allows our world model to access previous states directly, instead of viewing them through a compressed recurrent state.
By utilizing the Transformer-XL architecture, it is able to learn long-term dependencies while staying computationally efficient.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have been successful in many reinforcement learning
settings. However, compared to human learners they are overly data hungry. To
build a sample-efficient world model, we apply a transformer to real-world
episodes in an autoregressive manner: not only the compact latent states and
the taken actions but also the experienced or predicted rewards are fed into
the transformer, so that it can attend flexibly to all three modalities at
different time steps. The transformer allows our world model to access previous
states directly, instead of viewing them through a compressed recurrent state.
By utilizing the Transformer-XL architecture, it is able to learn long-term
dependencies while staying computationally efficient. Our transformer-based
world model (TWM) generates meaningful, new experience, which is used to train
a policy that outperforms previous model-free and model-based reinforcement
learning algorithms on the Atari 100k benchmark.
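To make the mechanism concrete, here is a minimal sketch of the interleaving described in the abstract, assuming PyTorch and illustrative shapes. The paper's model uses Transformer-XL; this sketch substitutes a plain causal encoder stack, so it approximates the idea rather than reproducing the authors' implementation.

```python
# Minimal sketch of a TWM-style world model (assumed names and shapes,
# not the authors' code): compact latents z_t, actions a_t, and rewards
# r_t are embedded separately and interleaved into one causal sequence,
# so attention can mix all three modalities across time steps.
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    def __init__(self, latent_dim=32, n_actions=18, d_model=128,
                 n_layers=2, max_steps=64):
        super().__init__()
        self.embed_z = nn.Linear(latent_dim, d_model)    # compact latent state
        self.embed_a = nn.Embedding(n_actions, d_model)  # discrete action
        self.embed_r = nn.Linear(1, d_model)             # scalar reward
        self.pos = nn.Embedding(3 * max_steps, d_model)  # token position
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.next_z = nn.Linear(d_model, latent_dim)     # next-latent head
        self.next_r = nn.Linear(d_model, 1)              # reward head

    def forward(self, z, a, r):
        # z: (B, T, latent_dim), a: (B, T) int64, r: (B, T) float
        B, T, _ = z.shape
        tokens = torch.stack(
            [self.embed_z(z), self.embed_a(a), self.embed_r(r.unsqueeze(-1))],
            dim=2,
        ).reshape(B, 3 * T, -1)              # order: z_0, a_0, r_0, z_1, ...
        tokens = tokens + self.pos(torch.arange(3 * T, device=z.device))
        mask = torch.triu(
            torch.full((3 * T, 3 * T), float("-inf"), device=z.device),
            diagonal=1)
        h = self.backbone(tokens, mask=mask)  # causal self-attention
        h_a = h[:, 1::3]                      # hidden state at each a_t token
        return self.next_z(h_a), self.next_r(h_a)

wm = TinyWorldModel()
z, a, r = torch.randn(2, 5, 32), torch.randint(0, 18, (2, 5)), torch.randn(2, 5)
z_pred, r_pred = wm(z, a, r)                  # predictions for the next step
print(z_pred.shape, r_pred.shape)             # (2, 5, 32) and (2, 5, 1)
```

Heads like these generate the imagined experience on which the policy is trained, which is how TWM stays within the 100k real interactions of the benchmark.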
Related papers
- Transformer Explainer: Interactive Learning of Text-Generative Models
Transformer Explainer is an interactive visualization tool designed for non-experts to learn about Transformers through the GPT-2 model.
It runs a live GPT-2 instance locally in the user's browser, empowering users to experiment with their own input and observe in real-time how the internal components and parameters of the Transformer work together.
arXiv Detail & Related papers (2024-08-08T17:49:07Z)
- Learning to Play Atari in a World of Tokens
We introduce Discrete Abstract Representations for Transformer-based learning (DART).
We incorporate a transformer-decoder for auto-regressive world modeling and a transformer-encoder for learning behavior by attending to task-relevant cues in the discrete representation of the world model.
DART outperforms previous state-of-the-art methods that do not use look-ahead search on the Atari 100k sample efficiency benchmark with a median human-normalized score of 0.790 and beats humans in 9 out of 26 games.
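A hedged sketch of the behavior-learning half of this design: a transformer-encoder policy that attends over the discrete tokens of the current world-model state through a learned [CLS]-style query. Module names, vocabulary size, and token count below are assumptions for illustration, not details from the paper.

```python
# Sketch of a token-based policy: embed the discrete state tokens,
# prepend a learned [CLS] query, run full self-attention, and read
# action logits from the [CLS] position. Illustrative only.
import torch
import torch.nn as nn

class TokenPolicy(nn.Module):
    def __init__(self, vocab=512, n_tokens=36, d_model=128, n_actions=18):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)      # discrete state tokens
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, state_tokens):                 # (B, n_tokens) int64
        x = self.tok(state_tokens)
        x = torch.cat([self.cls.expand(x.size(0), -1, -1), x], dim=1)
        h = self.enc(x)                              # attend to all tokens
        return self.head(h[:, 0])                    # logits from [CLS]

logits = TokenPolicy()(torch.randint(0, 512, (4, 36)))
print(logits.shape)  # torch.Size([4, 18])
```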
arXiv Detail & Related papers (2024-06-03T14:25:29Z)
- STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce the Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods.
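The summary does not spell out the stochastic machinery. Assuming categorical latents with a straight-through gradient, as is common in this family of world models (an assumption, not a detail from the abstract), a minimal sketch:

```python
# Sketch of a stochastic categorical latent with a straight-through
# gradient: the forward pass uses a sampled one-hot vector, while the
# backward pass flows through the softmax probabilities.
import torch
import torch.nn.functional as F

def sample_categorical_st(logits):
    """Sample one-hot latents; gradients flow through the probabilities."""
    probs = F.softmax(logits, dim=-1)
    idx = torch.multinomial(probs.flatten(0, -2), 1).view(*probs.shape[:-1], 1)
    one_hot = torch.zeros_like(probs).scatter_(-1, idx, 1.0)
    return one_hot + probs - probs.detach()  # straight-through estimator

logits = torch.randn(2, 32, 32, requires_grad=True)  # (batch, groups, classes)
z = sample_categorical_st(logits)
z.sum().backward()                                   # gradient reaches logits
print(z.shape, logits.grad is not None)              # torch.Size([2, 32, 32]) True
```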
arXiv Detail & Related papers (2023-10-14T16:42:02Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Learning to Grow Pretrained Models for Efficient Transformer Training
We learn to grow pretrained transformers by linearly mapping the parameters of a smaller model to initialize a larger one.
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% of the computational cost of training from scratch.
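A hedged sketch of the growth step for a single square weight matrix. This shows width growth only, with illustrative names; the actual LiGO factorizes the learned linear map across both width and depth.

```python
# Sketch of the LiGO idea: initialize a larger layer via a *learned*
# linear map of a smaller pretrained layer's weights, here factorized
# as W_large = A @ W_small @ B. A and B are trained briefly on the
# large model's loss before normal training continues. Illustrative only.
import torch
import torch.nn as nn

class LinearGrowth(nn.Module):
    def __init__(self, d_small, d_large):
        super().__init__()
        # trainable expansion matrices, initialized to rectangular identities
        self.A = nn.Parameter(torch.eye(d_large, d_small))
        self.B = nn.Parameter(torch.eye(d_small, d_large))

    def forward(self, w_small):           # (d_small, d_small) pretrained weight
        return self.A @ w_small @ self.B  # (d_large, d_large) initialization

grow = LinearGrowth(d_small=128, d_large=256)
w_small = torch.randn(128, 128)           # stands in for a pretrained weight
print(grow(w_small).shape)                # torch.Size([256, 256])
```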
arXiv Detail & Related papers (2023-03-02T05:21:18Z)
- Transformers are Sample Efficient World Models
We introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer.
With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games.
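A small sketch of the discrete-autoencoder half, assuming a VQ-style codebook with a straight-through gradient; the codebook size and dimensions are illustrative, not taken from the paper.

```python
# Sketch of vector quantization: snap an encoder output to its nearest
# codebook vector, yielding the discrete tokens an autoregressive
# Transformer can then model. Straight-through keeps gradients flowing.
import torch

def quantize(z, codebook):
    """z: (B, D); codebook: (K, D) -> token ids and quantized vectors."""
    d = torch.cdist(z, codebook)           # (B, K) pairwise distances
    idx = d.argmin(dim=1)                  # one token id per input
    z_q = codebook[idx]
    # straight-through: the decoder's gradient bypasses the argmin
    return idx, z + (z_q - z).detach()

codebook = torch.randn(512, 64)            # K=512 tokens of dimension 64
idx, z_q = quantize(torch.randn(8, 64, requires_grad=True), codebook)
print(idx.shape, z_q.shape)                # torch.Size([8]) torch.Size([8, 64])
```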
arXiv Detail & Related papers (2022-09-01T17:03:07Z)
- Shifted Chunk Transformer for Spatio-Temporal Representational Learning
We construct a shifted chunk Transformer with pure self-attention blocks.
This Transformer can learn hierarchical spatio-temporal features from a tiny patch to a global video clip.
It outperforms state-of-the-art approaches on Kinetics-400, Kinetics-600, UCF101, and HMDB51.
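A loose sketch of attention computed inside shifted chunks of patch tokens, in the spirit of the description above; the module name, chunk size, and shift are assumptions, and the paper builds a full hierarchical model around blocks of this kind.

```python
# Sketch of chunked self-attention with a shift: attention runs inside
# small chunks of patch tokens, and chunk boundaries are shifted so
# information mixes across chunks over depth. Illustrative only.
import torch
import torch.nn as nn

class ShiftedChunkAttention(nn.Module):
    def __init__(self, d_model=96, chunk=16, shift=8, n_heads=4):
        super().__init__()
        self.chunk, self.shift = chunk, shift
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                       # x: (B, N, d), N % chunk == 0
        B, N, d = x.shape
        x = torch.roll(x, -self.shift, dims=1)          # shift chunk borders
        c = x.reshape(B * N // self.chunk, self.chunk, d)
        out, _ = self.attn(c, c, c)                     # attend within chunks
        out = out.reshape(B, N, d)
        return torch.roll(out, self.shift, dims=1)      # undo the shift

x = torch.randn(2, 64, 96)                      # 64 patch tokens per clip
print(ShiftedChunkAttention()(x).shape)         # torch.Size([2, 64, 96])
```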
arXiv Detail & Related papers (2021-08-26T04:34:33Z)
- Deep Imitation Learning for Bimanual Robotic Manipulation
We present a deep imitation learning framework for robotic bimanual manipulation.
A core challenge is to generalize the manipulation skills to objects in different locations.
We propose to (i) decompose the multi-modal dynamics into elemental movement primitives, (ii) parameterize each primitive using a recurrent graph neural network to capture interactions, and (iii) integrate a high-level planner that composes primitives sequentially and a low-level controller to combine primitive dynamics and inverse kinematics control.
arXiv Detail & Related papers (2020-10-11T01:40:03Z)
- Conformer: Convolution-augmented Transformer for Speech Recognition
Recently, Transformer and convolutional neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR).
We propose the convolution-augmented transformer for speech recognition, named Conformer.
On the widely used LibriSpeech benchmark, our model achieves a WER of 2.1%/4.3% without a language model and 1.9%/3.9% with an external language model on test/test-other.
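A compact sketch of the Conformer block structure described in the paper, with plain self-attention standing in for the paper's relative-positional attention and dropout omitted; dimensions are illustrative rather than a particular model size.

```python
# Conformer block sketch: half-step feed-forward, self-attention, a
# convolution module (pointwise conv, GLU, depthwise conv, BatchNorm,
# Swish, pointwise conv), a second half-step feed-forward, final norm.
import torch
import torch.nn as nn

class Transpose(nn.Module):
    def forward(self, x):                    # (B, T, d) <-> (B, d, T)
        return x.transpose(1, 2)

class ConformerBlock(nn.Module):
    def __init__(self, d=144, heads=4, kernel=31, ff_mult=4):
        super().__init__()
        def ff():
            return nn.Sequential(nn.LayerNorm(d), nn.Linear(d, ff_mult * d),
                                 nn.SiLU(), nn.Linear(ff_mult * d, d))
        self.ff1, self.ff2 = ff(), ff()
        self.norm_att = nn.LayerNorm(d)
        self.att = nn.MultiheadAttention(d, heads, batch_first=True)
        self.conv = nn.Sequential(
            nn.LayerNorm(d), Transpose(),
            nn.Conv1d(d, 2 * d, 1), nn.GLU(dim=1),                   # pointwise + GLU
            nn.Conv1d(d, d, kernel, padding=kernel // 2, groups=d),  # depthwise
            nn.BatchNorm1d(d), nn.SiLU(),                            # BN + Swish
            nn.Conv1d(d, d, 1), Transpose())                         # pointwise
        self.norm_out = nn.LayerNorm(d)

    def forward(self, x):                    # x: (B, T, d)
        x = x + 0.5 * self.ff1(x)            # Macaron-style half-step FFN
        a = self.norm_att(x)
        x = x + self.att(a, a, a)[0]
        x = x + self.conv(x)
        return self.norm_out(x + 0.5 * self.ff2(x))

print(ConformerBlock()(torch.randn(2, 50, 144)).shape)  # torch.Size([2, 50, 144])
```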
arXiv Detail & Related papers (2020-05-16T20:56:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.