Coupled Local and Global World Models for Efficient First Order RL
- URL: http://arxiv.org/abs/2602.06219v1
- Date: Thu, 05 Feb 2026 21:57:41 GMT
- Title: Coupled Local and Global World Models for Efficient First Order RL
- Authors: Joseph Amigo, Rooholla Khorrambakht, Nicolas Mansard, Ludovic Righetti,
- Abstract summary: This paper introduces a method that bypasses simulators entirely, training RL policies inside world models learned from robots' interactions with real environments. At its core, our approach enables policy training with large-scale diffusion models via a novel decoupled first-order gradient (FoG) method. We demonstrate the efficacy of our method on the Push-T manipulation task, where it significantly outperforms PPO in sample efficiency.
- Score: 10.305209288475817
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: World models offer a promising avenue for more faithfully capturing complex dynamics, including contacts and non-rigidity, as well as complex sensory information, such as visual perception, in situations where standard simulators struggle. However, these models are computationally complex to evaluate, posing a challenge for popular RL approaches that have been successfully used with simulators to solve complex locomotion tasks but yet struggle with manipulation. This paper introduces a method that bypasses simulators entirely, training RL policies inside world models learned from robots' interactions with real environments. At its core, our approach enables policy training with large-scale diffusion models via a novel decoupled first-order gradient (FoG) method: a full-scale world model generates accurate forward trajectories, while a lightweight latent-space surrogate approximates its local dynamics for efficient gradient computation. This coupling of a local and global world model ensures high-fidelity unrolling alongside computationally tractable differentiation. We demonstrate the efficacy of our method on the Push-T manipulation task, where it significantly outperforms PPO in sample efficiency. We further evaluate our approach through an ego-centric object manipulation task with a quadruped. Together, these results demonstrate that learning inside data-driven world models is a promising pathway for solving hard-to-model RL tasks in image space without reliance on hand-crafted physics simulators.
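To make the coupling concrete, below is a minimal PyTorch sketch of one way such a decoupled first-order gradient could be wired: the full-scale model produces each next latent without tracking gradients, while a lightweight surrogate supplies the differentiable path through a straight-through-style substitution. All module names, architectures, and the policy/reward interfaces are illustrative assumptions inferred from the abstract, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two coupled models (names and architectures
# are illustrative assumptions, not the paper's actual modules).
class GlobalWorldModel(nn.Module):
    """Large, accurate dynamics model used only for forward unrolling."""
    def __init__(self, latent_dim=32, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))


class LocalSurrogate(nn.Module):
    """Lightweight approximation of the local dynamics, used for gradients."""
    def __init__(self, latent_dim=32, action_dim=4):
        super().__init__()
        self.net = nn.Linear(latent_dim + action_dim, latent_dim)

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))


def decoupled_rollout_return(policy, global_wm, local_wm, reward_fn, z0, horizon=16):
    """Unroll with the accurate global model, but route the backward pass
    through the cheap local surrogate (straight-through-style substitution)."""
    z, total_return = z0, 0.0
    for _ in range(horizon):
        a = policy(z)
        with torch.no_grad():
            z_global = global_wm(z, a)   # high-fidelity forward prediction, no gradient
        z_local = local_wm(z, a)         # differentiable local approximation
        # Forward value equals the global prediction; gradients flow via the surrogate.
        z = z_global + z_local - z_local.detach()
        total_return = total_return + reward_fn(z, a)
    return total_return
```

Calling `backward()` on the returned value would then yield first-order policy gradients whose cost scales with the small surrogate rather than the full generative model, which mirrors the efficiency argument made in the abstract; the details of how the paper actually couples the two models may differ.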
Related papers
- Disentangling perception and reasoning for improving data efficiency in learning cloth manipulation without demonstrations [2.2800981616160843]
Cloth manipulation is a ubiquitous task in everyday life, but it remains an open challenge for robotics. The difficulties in developing cloth manipulation policies are attributed to the high-dimensional state space, complex dynamics, and high propensity to self-occlusion exhibited by fabrics. We show that, through careful design choices, model size and training time can be significantly reduced when learning in simulation. Furthermore, we demonstrate how the resulting simulation-trained model can be transferred to the real world.
arXiv Detail & Related papers (2026-01-29T13:41:35Z) - World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation [23.270985761700203]
We propose World4RL, a framework that employs diffusion-based world models as high-fidelity simulators to refine pre-trained policies for robotic manipulation. World4RL provides high-fidelity environment modeling and enables consistent policy refinement, yielding significantly higher success rates compared to imitation learning.
arXiv Detail & Related papers (2025-09-23T14:38:15Z) - Accelerating Model-Based Reinforcement Learning with State-Space World Models [18.71404724458449]
Reinforcement learning (RL) is a powerful approach for robot learning. However, model-free RL (MFRL) requires a large number of environment interactions to learn successful control policies. We propose a new method for accelerating model-based RL using state-space world models.
arXiv Detail & Related papers (2025-02-27T15:05:25Z) - Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [50.191655141020505]
This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer. By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.
arXiv Detail & Related papers (2025-01-17T10:39:09Z) - STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce the Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z) - HarmonyDream: Task Harmonization Inside World Models [93.07314830304193]
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning.
We propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization.
arXiv Detail & Related papers (2023-09-30T11:38:13Z) - Sim-to-Real Deep Reinforcement Learning with Manipulators for Pick-and-place [1.7478203318226313]
When transferring a Deep Reinforcement Learning model from simulation to the real world, performance can be unsatisfactory.
This paper proposes a self-supervised vision-based DRL method that allows robots to pick and place objects effectively.
arXiv Detail & Related papers (2023-09-17T11:51:18Z) - Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z) - Continual Visual Reinforcement Learning with A Life-Long World Model [55.05017177980985]
We present a new continual learning approach for visual dynamics modeling. We first introduce the life-long world model, which learns task-specific latent dynamics. Then, we address the value estimation challenge for previous tasks with the exploratory-conservative behavior learning approach.
arXiv Detail & Related papers (2023-03-12T05:08:03Z) - Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z) - Cycle-Consistent World Models for Domain Independent Latent Imagination [0.0]
High costs and risks make it hard to train autonomous cars in the real world.
We propose a novel model-based reinforcement learning approach called Cycle-consistent World Models.
arXiv Detail & Related papers (2021-10-02T13:55:50Z) - Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning [72.18725551199842]
We propose a novel model-based reinforcement learning algorithm, called BrIdging Reality and Dream (BIRD).
It maximizes the mutual information between imaginary and real trajectories so that the policy improvement learned from imaginary trajectories can be easily generalized to real trajectories.
We demonstrate that our approach improves sample efficiency of model-based planning, and achieves state-of-the-art performance on challenging visual control benchmarks.
arXiv Detail & Related papers (2020-10-23T03:22:01Z)