Hieros: Hierarchical Imagination on Structured State Space Sequence World Models
- URL: http://arxiv.org/abs/2310.05167v3
- Date: Sun, 18 Feb 2024 13:42:53 GMT
- Title: Hieros: Hierarchical Imagination on Structured State Space Sequence World Models
- Authors: Paul Mattes, Rainer Schlosser, Ralf Herbrich
- Abstract summary: Hieros is a hierarchical policy that learns time-abstracted world representations and imagines trajectories at multiple time scales in latent space.
We show that our approach outperforms the state of the art in terms of mean and median normalized human score on the Atari 100k benchmark.
- Score: 4.922995343278039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the biggest challenges to modern deep reinforcement learning (DRL)
algorithms is sample efficiency. Many approaches learn a world model in order
to train an agent entirely in imagination, eliminating the need for direct
environment interaction during training. However, these methods often suffer
from either a lack of imagination accuracy, exploration capabilities, or
runtime efficiency. We propose Hieros, a hierarchical policy that learns
time-abstracted world representations and imagines trajectories at multiple
time scales in latent space. Hieros uses an S5 layer-based world model. Because
S5 layers can be evaluated either as a parallel scan over a full sequence or as
a step-by-step recurrence, the world model predicts next world states in
parallel during training and iteratively during imagination and environment
interaction. This allows for more efficient training than RNN-based world
models and more efficient imagination than Transformer-based world models.
We show that our approach outperforms the state of the art in terms of mean
and median normalized human score on the Atari 100k benchmark, and that our
proposed world model is able to predict complex dynamics very accurately. We
also show that Hieros displays superior exploration capabilities compared to
existing approaches.
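To make the parallel/iterative duality concrete, below is a minimal sketch of a diagonal linear state space layer in Python. This is an illustrative toy under simplifying assumptions, not the paper's S5 implementation: real S5 layers use learned complex-valued diagonal dynamics, MIMO structure, and an associative parallel scan, whereas this sketch computes the same linear recurrence with a prefix sum.

```python
import numpy as np

class DiagonalSSM:
    """Toy diagonal linear SSM: x_t = A*x_{t-1} + B*u_t, y_t = C.x_t.
    Illustrative only; not the S5 layer used by Hieros."""

    def __init__(self, state_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.uniform(0.5, 0.99, state_dim)   # stable diagonal dynamics
        self.B = rng.standard_normal(state_dim) * 0.1
        self.C = rng.standard_normal(state_dim) * 0.1

    def step(self, x, u):
        """Recurrent mode: one step at a time, as during imagination
        and environment interaction."""
        x = self.A * x + self.B * u
        return x, self.C @ x

    def parallel(self, us):
        """Parallel mode over a whole sequence, as during training.
        Because A is diagonal, x_t = sum_{k<=t} A^(t-k) * B * u_k, which a
        prefix sum computes for all t at once (S5 uses an associative scan)."""
        T = len(us)
        powers = self.A[None, :] ** np.arange(T)[:, None]   # A^t, shape (T, N)
        scaled = (np.asarray(us)[:, None] * self.B[None, :]) / powers
        xs = powers * np.cumsum(scaled, axis=0)             # all hidden states
        return xs @ self.C                                  # all outputs y_t

ssm = DiagonalSSM(state_dim=8)
us = np.sin(np.linspace(0, 3, 20))        # toy input sequence
ys_parallel = ssm.parallel(us)
x, ys_step = np.zeros(8), []
for u in us:                              # iterative rollout
    x, y = ssm.step(x, u)
    ys_step.append(y)
assert np.allclose(ys_parallel, ys_step)  # both modes agree
```

The final assertion checks the property the abstract relies on: the parallel and step-by-step evaluations produce identical outputs, so training can batch over time while imagination rolls out one step at a time.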
Related papers
- Improving World Models using Deep Supervision with Linear Probes [0.0]
In this paper, we investigate a deep supervision technique for encouraging the development of a world model in a network trained end-to-end to predict the next observation.
Using an experimental environment based on the Flappy Bird game, we explore the effect of adding a linear probe component to the network's loss function.
Our experiments demonstrate that this supervision technique improves both training and test performance, enhances training stability, and results in more easily decodable world features.
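As a hedged sketch of the general technique (module names, sizes, and the probe weight below are assumptions, not the paper's code), a linear probe can be attached to a hidden layer of a next-observation predictor and its error added to the training loss:

```python
import torch
import torch.nn as nn

class ProbedPredictor(nn.Module):
    """Next-observation predictor with a linear probe on its hidden state."""
    def __init__(self, obs_dim=32, hidden_dim=64, feature_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, obs_dim)    # predicts next observation
        self.probe = nn.Linear(hidden_dim, feature_dim)  # probes for world features

    def forward(self, obs):
        h = self.encoder(obs)
        return self.decoder(h), self.probe(h)

model = ProbedPredictor()
obs, next_obs = torch.randn(8, 32), torch.randn(8, 32)
true_features = torch.randn(8, 4)  # known world features (e.g. object positions)

pred_obs, probed = model(obs)
loss = nn.functional.mse_loss(pred_obs, next_obs) \
     + 0.1 * nn.functional.mse_loss(probed, true_features)  # probe weight assumed
loss.backward()
```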
arXiv Detail & Related papers (2025-04-04T18:35:21Z)
- AdaWorld: Learning Adaptable World Models with Latent Actions [76.50869178593733]
We propose AdaWorld, an innovative world model learning approach that enables efficient adaptation.
The key idea is to incorporate action information during the pretraining of world models by extracting latent actions from videos.
We then develop an autoregressive world model that conditions on these latent actions.
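A rough sketch of latent-action conditioning (all module names and sizes here are illustrative assumptions, not AdaWorld's architecture): infer a latent action from consecutive observations with an inverse model, then condition an autoregressive predictor on it.

```python
import torch
import torch.nn as nn

obs_dim, latent_action_dim, hidden = 32, 8, 64

action_encoder = nn.Sequential(  # infers a latent action from (o_t, o_{t+1})
    nn.Linear(2 * obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, latent_action_dim)
)
dynamics = nn.GRUCell(obs_dim + latent_action_dim, hidden)  # autoregressive core
predict_next = nn.Linear(hidden, obs_dim)

o_t, o_next = torch.randn(4, obs_dim), torch.randn(4, obs_dim)
h = torch.zeros(4, hidden)

a_latent = action_encoder(torch.cat([o_t, o_next], dim=-1))  # latent action
h = dynamics(torch.cat([o_t, a_latent], dim=-1), h)
loss = nn.functional.mse_loss(predict_next(h), o_next)       # train by prediction
loss.backward()
```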
arXiv Detail & Related papers (2025-03-24T17:58:15Z)
- Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning [2.5749046466046903]
In Reinforcement Learning (RL), world models aim to capture how the environment evolves in response to the agent's actions.
We show that performing the dreaming process inside the latent space allows for training with fewer environment steps.
We conclude that the combination of the Global Workspace (GW) with world models holds great potential for improving decision-making in RL agents.
arXiv Detail & Related papers (2025-02-28T15:24:17Z)
- Accelerating Model-Based Reinforcement Learning with State-Space World Models [18.71404724458449]
Reinforcement learning (RL) is a powerful approach for robot learning.
However, model-free RL (MFRL) requires a large number of environment interactions to learn successful control policies.
We propose a new method for accelerating model-based RL using state-space world models.
arXiv Detail & Related papers (2025-02-27T15:05:25Z)
- Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [50.191655141020505]
This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer.
By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.
arXiv Detail & Related papers (2025-01-17T10:39:09Z)
- Open-World Reinforcement Learning over Long Short-Term Imagination [91.28998327423295]
We present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps.
Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.
arXiv Detail & Related papers (2024-10-04T17:17:30Z)
- Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models [60.87795376541144]
A world model is a neural network capable of predicting an agent's next state given past states and actions.
During end-to-end training, our policy learns how to recover from errors by aligning with states observed in human demonstrations.
We present qualitative and quantitative results, demonstrating significant improvements upon prior state of the art in closed-loop testing.
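A minimal interface matching that definition might look as follows; the recurrent core and all dimensions are illustrative assumptions rather than the paper's model:

```python
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Predicts next states from past states and actions."""
    def __init__(self, state_dim=16, action_dim=4, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(state_dim + action_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, state_dim)

    def forward(self, states, actions):
        # states: (B, T, state_dim), actions: (B, T, action_dim)
        h, _ = self.rnn(torch.cat([states, actions], dim=-1))
        return self.head(h)  # predicted next states, (B, T, state_dim)

wm = WorldModel()
states, actions = torch.randn(2, 10, 16), torch.randn(2, 10, 4)
next_states = wm(states, actions)  # align with demonstration states in training
```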
arXiv Detail & Related papers (2024-09-25T06:48:25Z)
- One-shot World Models Using a Transformer Trained on a Synthetic Prior [37.027893127637036]
One-Shot World Model (OSWM) is a transformer world model that is learned in an in-context learning fashion from purely synthetic data.
OSWM is able to quickly adapt to the dynamics of a simple grid world, as well as the CartPole gym and a custom control environment.
arXiv Detail & Related papers (2024-09-21T09:39:32Z)
- Learning Interactive Real-World Simulators [96.5991333400566]
We explore the possibility of learning a universal simulator of real-world interaction through generative modeling.
We use the simulator to train both high-level vision-language policies and low-level reinforcement learning policies.
Video captioning models can benefit from training with simulated experience, opening up even wider applications.
arXiv Detail & Related papers (2023-10-09T19:42:22Z)
- Self-supervised novel 2D view synthesis of large-scale scenes with efficient multi-scale voxel carving [77.07589573960436]
We introduce an efficient multi-scale voxel carving method to generate novel views of real scenes.
Our final high-resolution output is efficiently self-trained on data automatically generated by the voxel carving module.
We demonstrate the effectiveness of our method on highly complex and large-scale scenes in real environments.
arXiv Detail & Related papers (2023-06-26T13:57:05Z)
- Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations [10.174009792409928]
We propose a method for adaptive action supervision in RL from real-world demonstrations in multi-agent scenarios.
In the experiments, using chase-and-escape and football tasks with different dynamics between the unknown source and target environments, we show that our approach achieved a balance between reproducing the demonstrations and generalization ability, compared with the baselines.
arXiv Detail & Related papers (2023-05-22T13:33:37Z)
- DITTO: Offline Imitation Learning with World Models [21.419536711242962]
DITTO is an offline imitation learning algorithm that measures the divergence between agent and expert trajectories in the latent space of a learned world model.
We optimize this multi-step latent divergence using standard reinforcement learning algorithms.
Our results show how creative use of world models can lead to a simple, robust, and highly-performant policy-learning framework.
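A hedged sketch of such a latent-divergence reward (the squared-distance choice and shapes are assumptions, not DITTO's exact formulation): penalize the distance between agent and expert latent trajectories at every step, and hand the result to any standard RL algorithm.

```python
import torch

def latent_divergence_reward(agent_latents, expert_latents):
    """agent_latents, expert_latents: (T, D) latent trajectories.
    Negative squared distance per step; an RL algorithm maximizes it."""
    return -((agent_latents - expert_latents) ** 2).sum(dim=-1)

agent = torch.randn(15, 32)   # agent rollout in world-model latent space
expert = torch.randn(15, 32)  # expert demonstration encoded to latents
rewards = latent_divergence_reward(agent, expert)  # (T,), fed to e.g. actor-critic
```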
arXiv Detail & Related papers (2023-02-06T19:41:18Z)
- Predictive World Models from Real-World Partial Observations [66.80340484148931]
We present a framework for learning a probabilistic predictive world model for real-world road environments.
While prior methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only.
arXiv Detail & Related papers (2023-01-12T02:07:26Z)
- Transformers are Sample Efficient World Models [1.9444242128493845]
We introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer.
With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games.
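A hedged sketch of that composition (sizes are illustrative, and the tokenizer is reduced to a single linear layer rather than a convolutional VQ autoencoder): discretize observations into tokens, then model them autoregressively with a causal Transformer.

```python
import torch
import torch.nn as nn

codebook_size, d_model, T = 512, 64, 16

tokenizer = nn.Linear(128, codebook_size)  # code logits per frame (toy stand-in)
embed = nn.Embedding(codebook_size, d_model)
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
next_token = nn.Linear(d_model, codebook_size)

obs = torch.randn(2, T, 128)                              # (batch, time, features)
tokens = tokenizer(obs).argmax(dim=-1)                    # (2, T) discrete tokens
mask = nn.Transformer.generate_square_subsequent_mask(T)  # causal attention
h = transformer(embed(tokens), mask=mask)
logits = next_token(h)                                    # next-token predictions
```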
arXiv Detail & Related papers (2022-09-01T17:03:07Z)
- Cycle-Consistent World Models for Domain Independent Latent Imagination [0.0]
High costs and risks make it hard to train autonomous cars in the real world.
We propose a novel model-based reinforcement learning approach called Cycle-Consistent World Models.
arXiv Detail & Related papers (2021-10-02T13:55:50Z)
- Mastering Atari with Discrete World Models [61.7688353335468]
We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model.
DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model.
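DreamerV2's compact latent space is built from categorical variables; below is a minimal sketch of such discrete latents with straight-through gradients (group and class counts chosen to match the commonly cited 32x32 configuration, the rest is illustrative).

```python
import torch
import torch.nn.functional as F

def categorical_latent(logits):
    """logits: (B, groups, classes) -> one-hot samples with gradients."""
    probs = F.softmax(logits, dim=-1)
    idx = torch.distributions.Categorical(probs=probs).sample()
    onehot = F.one_hot(idx, logits.shape[-1]).float()
    return onehot + probs - probs.detach()  # straight-through estimator

logits = torch.randn(8, 32, 32, requires_grad=True)  # 32 groups of 32 classes
z = categorical_latent(logits)  # discrete latent fed to the dynamics model
z.sum().backward()              # gradients still reach the encoder logits
```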
arXiv Detail & Related papers (2020-10-05T17:52:14Z)