Mastering Atari with Discrete World Models
- URL: http://arxiv.org/abs/2010.02193v4
- Date: Sat, 12 Feb 2022 20:01:53 GMT
- Title: Mastering Atari with Discrete World Models
- Authors: Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba
- Abstract summary: We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model.
DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model.
- Score: 61.7688353335468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent agents need to generalize from past experience to achieve goals
in complex environments. World models facilitate such generalization and allow
learning behaviors from imagined outcomes to increase sample-efficiency. While
learning world models from image inputs has recently become feasible for some
tasks, modeling Atari games accurately enough to derive successful behaviors
has remained an open challenge for many years. We introduce DreamerV2, a
reinforcement learning agent that learns behaviors purely from predictions in
the compact latent space of a powerful world model. The world model uses
discrete representations and is trained separately from the policy. DreamerV2
constitutes the first agent that achieves human-level performance on the Atari
benchmark of 55 tasks by learning behaviors inside a separately trained world
model. With the same computational budget and wall-clock time, Dreamer V2
reaches 200M frames and surpasses the final performance of the top single-GPU
agents IQN and Rainbow. DreamerV2 is also applicable to tasks with continuous
actions, where it learns an accurate world model of a complex humanoid robot
and solves stand-up and walking from only pixel inputs.
Related papers
- Learning to Play Atari in a World of Tokens [4.880437151994464]
We introduce discrete abstract representations for transformer-based learning (DART)
We incorporate a transformer-decoder for auto-regressive world modeling and a transformer-encoder for learning behavior by attending to task-relevant cues in the discrete representation of the world model.
DART outperforms previous state-of-the-art methods that do not use look-ahead search on the Atari 100k sample efficiency benchmark with a median human-normalized score of 0.790 and beats humans in 9 out of 26 games.
arXiv Detail & Related papers (2024-06-03T14:25:29Z) - Diffusion for World Modeling: Visual Details Matter in Atari [22.915802013352465]
We introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model.
We analyze the key design choices that are required to make diffusion suitable for world modeling, and demonstrate how improved visual details can lead to improved agent performance.
DIAMOND achieves a mean human normalized score of 1.46 on the competitive Atari 100k benchmark; a new best for agents trained entirely within a world model.
arXiv Detail & Related papers (2024-05-20T22:51:05Z) - WorldDreamer: Towards General World Models for Video Generation via
Predicting Masked Tokens [75.02160668328425]
We introduce WorldDreamer, a pioneering world model to foster a comprehensive comprehension of general world physics and motions.
WorldDreamer frames world modeling as an unsupervised visual sequence modeling challenge.
Our experiments show that WorldDreamer excels in generating videos across different scenarios, including natural scenes and driving environments.
arXiv Detail & Related papers (2024-01-18T14:01:20Z) - STORM: Efficient Stochastic Transformer based World Models for
Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
Storm achieves a mean human performance of $126.7%$ on the Atari $100$k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z) - Hieros: Hierarchical Imagination on Structured State Space Sequence
World Models [4.922995343278039]
Hieros is a hierarchical policy that learns time abstracted world representations and imagines trajectories at multiple time scales in latent space.
We show that our approach outperforms the state of the art in terms of mean and median normalized human score on the Atari 100k benchmark.
arXiv Detail & Related papers (2023-10-08T13:52:40Z) - Real-World Humanoid Locomotion with Reinforcement Learning [92.85934954371099]
We present a fully learning-based approach for real-world humanoid locomotion.
Our controller can walk over various outdoor terrains, is robust to external disturbances, and can adapt in context.
arXiv Detail & Related papers (2023-03-06T18:59:09Z) - Transformers are Sample Efficient World Models [1.9444242128493845]
We introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer.
With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games.
arXiv Detail & Related papers (2022-09-01T17:03:07Z) - DayDreamer: World Models for Physical Robot Learning [142.11031132529524]
Deep reinforcement learning is a common approach to robot learning but requires a large amount of trial and error to learn.
Many advances in robot learning rely on simulators.
In this paper, we apply Dreamer to 4 robots to learn online and directly in the real world, without simulators.
arXiv Detail & Related papers (2022-06-28T17:44:48Z) - Human-Level Reinforcement Learning through Theory-Based Modeling,
Exploration, and Planning [27.593497502386143]
Theory-Based Reinforcement Learning uses human-like intuitive theories to explore and model an environment.
We instantiate the approach in a video game playing agent called EMPA.
EMPA matches human learning efficiency on a suite of 90 Atari-style video games.
arXiv Detail & Related papers (2021-07-27T01:38:13Z) - Model-Based Reinforcement Learning for Atari [89.3039240303797]
We show how video prediction models can enable agents to solve Atari games with fewer interactions than model-free methods.
Our experiments evaluate SimPLe on a range of Atari games in low data regime of 100k interactions between the agent and the environment.
arXiv Detail & Related papers (2019-03-01T15:40:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.