Training Agents Inside of Scalable World Models
- URL: http://arxiv.org/abs/2509.24527v1
- Date: Mon, 29 Sep 2025 09:42:27 GMT
- Title: Training Agents Inside of Scalable World Models
- Authors: Danijar Hafner, Wilson Yan, Timothy Lillicrap
- Abstract summary: We introduce Dreamer 4, a scalable agent that learns to solve control tasks by reinforcement learning inside of a fast and accurate world model. In the complex video game Minecraft, the world model accurately predicts object interactions and game mechanics, outperforming previous world models by a large margin. By learning behaviors in imagination, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, without environment interaction.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: World models learn general knowledge from videos and simulate experience for training behaviors in imagination, offering a path towards intelligent agents. However, previous world models have been unable to accurately predict object interactions in complex environments. We introduce Dreamer 4, a scalable agent that learns to solve control tasks by reinforcement learning inside of a fast and accurate world model. In the complex video game Minecraft, the world model accurately predicts object interactions and game mechanics, outperforming previous world models by a large margin. The world model achieves real-time interactive inference on a single GPU through a shortcut forcing objective and an efficient transformer architecture. Moreover, the world model learns general action conditioning from only a small amount of data, allowing it to extract the majority of its knowledge from diverse unlabeled videos. We propose the challenge of obtaining diamonds in Minecraft from only offline data, aligning with practical applications such as robotics where learning from environment interaction can be unsafe and slow. This task requires choosing sequences of over 20,000 mouse and keyboard actions from raw pixels. By learning behaviors in imagination, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, without environment interaction. Our work provides a scalable recipe for imagination training, marking a step towards intelligent agents.
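The recipe the abstract describes, training a policy by reinforcement learning inside a learned world model rather than in the real environment, can be illustrated with a deliberately tiny toy. Everything below is a hypothetical sketch, not the paper's method: the "world model" is a hand-coded 1-D chain standing in for a learned simulator, and the policy improvement is simple hill-climbing on imagined returns rather than Dreamer's actor-critic.

```python
import random

# Toy stand-in for a learned world model: a deterministic 1-D chain with
# states 0..GOAL. A real system would use a neural dynamics model here.
GOAL, HORIZON = 10, 20

def world_model_step(state, action):
    """Predict the next state and a shaped reward (distance to the goal)."""
    next_state = max(0, min(GOAL, state + action))
    return next_state, float(next_state - GOAL)  # 0 at the goal, negative elsewhere

def imagined_return(policy, start=0):
    """Roll out the policy entirely inside the world model (no real env)."""
    state, total = start, 0.0
    for _ in range(HORIZON):
        state, reward = world_model_step(state, policy[state])
        total += reward
    return total

def train_in_imagination(iterations=2000, seed=0):
    """Hill-climb a tabular policy (state -> action) on imagined returns only."""
    rng = random.Random(seed)
    best = [rng.choice([-1, 1]) for _ in range(GOAL + 1)]
    best_ret = imagined_return(best)
    for _ in range(iterations):
        cand = list(best)
        cand[rng.randrange(len(cand))] = rng.choice([-1, 1])  # mutate one state
        ret = imagined_return(cand)
        if ret > best_ret:  # keep the mutation only if imagination says it helps
            best, best_ret = cand, ret
    return best, best_ret

policy, ret = train_in_imagination()
```

The key property this toy preserves is that no step of policy learning ever queries the real environment; all trial and error happens in rollouts of the model, which is what makes the offline-diamond setting described above possible in principle.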
Related papers
- DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos [110.98100817695307]
We introduce DreamDojo, a foundation world model that learns diverse interactions and dexterous controls from 44k hours of egocentric human videos. Our work enables several important applications based on generative world models, including live teleoperation, policy evaluation, and model-based planning.
arXiv Detail & Related papers (2026-02-06T18:49:43Z)
- Learning Latent Action World Models In The Wild [50.453458324163705]
We study the problem of learning latent action world models on in-the-wild videos. We find that continuous, but constrained, latent actions are able to capture the complexity of actions in in-the-wild videos. In the absence of a common embodiment across videos, we are mainly able to learn latent actions that become localized in space.
arXiv Detail & Related papers (2026-01-08T18:55:39Z)
- MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft [21.530000271719803]
We propose MineWorld, a real-time interactive world model on Minecraft. MineWorld is driven by a visual-action autoregressive Transformer, which takes paired game scenes and corresponding actions as input. We develop a novel parallel decoding algorithm that predicts the spatially redundant tokens in each frame at the same time.
arXiv Detail & Related papers (2025-04-11T09:41:04Z)
- Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination [25.62602420895531]
DreMa is a new approach for constructing digital twins using learned explicit representations of the real world and its dynamics. We show that DreMa can successfully learn novel physical tasks from just a single example per task variation.
arXiv Detail & Related papers (2024-12-19T15:38:15Z)
- Learning Interactive Real-World Simulators [96.5991333400566]
We explore the possibility of learning a universal simulator of real-world interaction through generative modeling.
We use the simulator to train both high-level vision-language policies and low-level reinforcement learning policies.
Video captioning models can benefit from training with simulated experience, opening up even wider applications.
arXiv Detail & Related papers (2023-10-09T19:42:22Z)
- Hieros: Hierarchical Imagination on Structured State Space Sequence World Models [4.922995343278039]
Hieros is a hierarchical policy that learns time abstracted world representations and imagines trajectories at multiple time scales in latent space.
We show that our approach outperforms the state of the art in terms of mean and median normalized human score on the Atari 100k benchmark.
arXiv Detail & Related papers (2023-10-08T13:52:40Z)
- DayDreamer: World Models for Physical Robot Learning [142.11031132529524]
Deep reinforcement learning is a common approach to robot learning but requires a large amount of trial and error.
Many advances in robot learning rely on simulators.
In this paper, we apply Dreamer to 4 robots to learn online and directly in the real world, without simulators.
arXiv Detail & Related papers (2022-06-28T17:44:48Z)
- A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation [63.1610540170754]
We focus on the problem of visual non-prehensile planar manipulation.
We propose a novel architecture that combines video decoding neural models with priors from contact mechanics.
We find that our modular and fully differentiable architecture performs better than learning-only methods on unseen objects and motions.
arXiv Detail & Related papers (2021-11-09T18:39:45Z)
- Mastering Atari with Discrete World Models [61.7688353335468]
We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model.
DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model.
arXiv Detail & Related papers (2020-10-05T17:52:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.