One-shot World Models Using a Transformer Trained on a Synthetic Prior
- URL: http://arxiv.org/abs/2409.14084v2
- Date: Thu, 24 Oct 2024 18:57:44 GMT
- Title: One-shot World Models Using a Transformer Trained on a Synthetic Prior
- Authors: Fabio Ferreira, Moreno Schlageter, Raghu Rajan, Andre Biedenkapp, Frank Hutter
- Abstract summary: One-Shot World Model (OSWM) is a transformer world model that is learned in an in-context learning fashion from purely synthetic data.
OSWM is able to quickly adapt to the dynamics of a simple grid world, as well as the CartPole gym and a custom control environment.
- Score: 37.027893127637036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A World Model is a compressed spatial and temporal representation of a real world environment that allows one to train an agent or execute planning methods. However, world models are typically trained on observations from the real world environment, and they usually do not enable learning policies for other real environments. We propose One-Shot World Model (OSWM), a transformer world model that is learned in an in-context learning fashion from purely synthetic data sampled from a prior distribution. Our prior is composed of multiple randomly initialized neural networks, where each network models the dynamics of one state or reward dimension of a desired target environment. We adopt the supervised learning procedure of Prior-Fitted Networks by masking next-state and reward at random context positions and querying OSWM to make probabilistic predictions based on the remaining transition context. At inference time, OSWM is able to quickly adapt to the dynamics of a simple grid world, as well as the CartPole gym and a custom control environment by providing 1k transition steps as context, and it is then able to successfully train environment-solving agent policies. However, transferring to more complex environments currently remains a challenge. Despite these limitations, we see this work as an important stepping-stone in the pursuit of learning world models purely from synthetic data.
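The training setup described in the abstract can be summarised in a short sketch. The code below is a minimal, hedged illustration, not the authors' implementation: it samples synthetic rollouts from randomly initialized per-dimension MLPs (the synthetic prior), masks next-state and reward targets at random context positions, and trains a small transformer encoder to predict the masked values from the remaining context. All names (e.g. `sample_prior_rollout`, `OSWMSketch`), network sizes, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of an OSWM-style training loop (PyTorch; names and sizes are
# illustrative assumptions, not the authors' implementation).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, SEQ_LEN, D_MODEL = 4, 1, 64, 128

def random_mlp(in_dim: int, hidden: int = 32) -> nn.Module:
    """A randomly initialized MLP modelling a single output dimension."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

def sample_prior_rollout(seq_len: int = SEQ_LEN):
    """Draw one synthetic rollout from the prior: each state dimension and the
    reward are generated by their own randomly initialized network."""
    nets = [random_mlp(STATE_DIM + ACTION_DIM) for _ in range(STATE_DIM + 1)]
    states, actions, next_states, rewards = [], [], [], []
    s = torch.randn(STATE_DIM)
    for _ in range(seq_len):
        a = torch.rand(ACTION_DIM) * 2 - 1                  # random action in [-1, 1]
        with torch.no_grad():
            outs = [net(torch.cat([s, a])) for net in nets]
        s_next, r = torch.cat(outs[:STATE_DIM]), outs[-1]
        states.append(s); actions.append(a); next_states.append(s_next); rewards.append(r)
        s = s_next
    return (torch.stack(states), torch.stack(actions),
            torch.stack(next_states), torch.stack(rewards))

class OSWMSketch(nn.Module):
    """Transformer that predicts masked next-state/reward values from the context."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(2 * STATE_DIM + ACTION_DIM + 2, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, STATE_DIM + 1)       # next state + reward

    def forward(self, s, a, target, mask):
        # Masked positions see zeros in place of their next-state/reward targets.
        tgt_in = target * (~mask).unsqueeze(-1).float()
        tokens = torch.cat([s, a, tgt_in, mask.unsqueeze(-1).float()], dim=-1)
        return self.head(self.encoder(self.embed(tokens)))

model = OSWMSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(10):                                      # tiny demo loop
    s, a, s_next, r = sample_prior_rollout()
    target = torch.cat([s_next, r], dim=-1).unsqueeze(0)    # (1, T, STATE_DIM + 1)
    s, a = s.unsqueeze(0), a.unsqueeze(0)
    mask = torch.rand(1, SEQ_LEN) < 0.5                     # positions to predict
    pred = model(s, a, target, mask)
    loss = ((pred - target) ** 2)[mask].mean()              # MSE stand-in for the PFN loss
    opt.zero_grad(); loss.backward(); opt.step()
```

At inference time, the analogous step would be to fill the context with roughly 1k real transitions and mask only the query position; the actual OSWM follows the Prior-Fitted Networks objective to produce probabilistic predictions rather than the plain MSE regression used in this sketch.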
Related papers
- Generalizable Implicit Neural Representation As a Universal Spatiotemporal Traffic Data Learner [46.866240648471894]
Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system.
We present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation.
We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales.
arXiv Detail & Related papers (2024-06-13T02:03:22Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Gradient-based Planning with World Models [21.9392160209565]
We present an exploration of a gradient-based alternative that fully leverages the differentiability of the world model.
In a sample-efficient setting, our method achieves on par or superior performance compared to the alternative approaches in most tasks.
arXiv Detail & Related papers (2023-12-28T18:54:21Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Predictive World Models from Real-World Partial Observations [66.80340484148931]
We present a framework for learning a probabilistic predictive world model for real-world road environments.
While prior methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only.
arXiv Detail & Related papers (2023-01-12T02:07:26Z)
- Quantifying Multimodality in World Models [5.593667856320704]
We propose new metrics for the detection and quantification of multimodal uncertainty in RL based World Models.
Correctly modelling and detecting uncertain future states lays the foundation for handling critical situations safely.
arXiv Detail & Related papers (2021-12-14T09:52:18Z)
- Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound on the log-likelihood.
arXiv Detail & Related papers (2021-10-27T04:27:28Z)
- Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)