World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation
- URL: http://arxiv.org/abs/2509.19080v1
- Date: Tue, 23 Sep 2025 14:38:15 GMT
- Authors: Zhennan Jiang, Kai Liu, Yuxin Qin, Shuai Tian, Yupeng Zheng, Mingcai Zhou, Chao Yu, Haoran Li, Dongbin Zhao
- Abstract summary: We propose World4RL, a framework that employs diffusion-based world models as high-fidelity simulators to refine pre-trained policies for robotic manipulation. World4RL provides high-fidelity environment modeling and enables consistent policy refinement, yielding significantly higher success rates compared to imitation learning.
- Score: 23.270985761700203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robotic manipulation policies are commonly initialized through imitation learning, but their performance is limited by the scarcity and narrow coverage of expert data. Reinforcement learning can refine policies to alleviate this limitation, yet real-robot training is costly and unsafe, while training in simulators suffers from the sim-to-real gap. Recent advances in generative models have demonstrated remarkable capabilities in real-world simulation, with diffusion models in particular excelling at generation. This raises the question of how diffusion model-based world models can be combined to enhance pre-trained policies in robotic manipulation. In this work, we propose World4RL, a framework that employs diffusion-based world models as high-fidelity simulators to refine pre-trained policies entirely in imagined environments for robotic manipulation. Unlike prior works that primarily employ world models for planning, our framework enables direct end-to-end policy optimization. World4RL is designed around two principles: pre-training a diffusion world model that captures diverse dynamics on multi-task datasets and refining policies entirely within a frozen world model to avoid online real-world interactions. We further design a two-hot action encoding scheme tailored for robotic manipulation and adopt diffusion backbones to improve modeling fidelity. Extensive simulation and real-world experiments demonstrate that World4RL provides high-fidelity environment modeling and enables consistent policy refinement, yielding significantly higher success rates compared to imitation learning and other baselines. More visualization results are available at https://world4rl.github.io/.
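The abstract mentions a two-hot action encoding but does not spell out its details. As a rough illustration only, the sketch below shows the generic two-hot idea: a continuous value is spread over its two nearest bins so that the weighted bin centers reproduce it. The function names, the action range of [-1, 1], and the bin count of 11 are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence


def two_hot_encode(value: float, low: float = -1.0, high: float = 1.0,
                   num_bins: int = 11) -> List[float]:
    """Spread one scalar over its two neighbouring bins (hypothetical helper)."""
    value = min(max(value, low), high)          # clip into the assumed range
    pos = (value - low) / (high - low) * (num_bins - 1)  # fractional bin index
    lower = min(int(pos), num_bins - 1)         # pos >= 0, so int() floors
    upper = min(lower + 1, num_bins - 1)
    w_upper = pos - lower                       # weight on the upper bin
    encoding = [0.0] * num_bins
    encoding[lower] += 1.0 - w_upper
    encoding[upper] += w_upper
    return encoding


def two_hot_decode(encoding: Sequence[float], low: float = -1.0,
                   high: float = 1.0) -> float:
    """Recover the scalar as the weighted sum of evenly spaced bin centers."""
    step = (high - low) / (len(encoding) - 1)
    return sum(w * (low + i * step) for i, w in enumerate(encoding))


def encode_action(action: Sequence[float], **kwargs) -> List[List[float]]:
    """Encode each dimension of an action vector independently."""
    return [two_hot_encode(a, **kwargs) for a in action]


if __name__ == "__main__":
    # Example action vector; the values are arbitrary.
    for dim, enc in zip((0.25, -1.0, 0.9), encode_action((0.25, -1.0, 0.9))):
        print(dim, [round(w, 3) for w in enc], round(two_hot_decode(enc), 6))
```

Compared with one-hot binning, this kind of encoding keeps the representation discrete while avoiding rounding error, since decoding returns the original value for anything inside the assumed range.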
Related papers
- Coupled Local and Global World Models for Efficient First Order RL [10.305209288475817]
This paper introduces a method that bypasses simulators entirely, training RL policies inside world models learned from robots' interactions with real environments. At its core, our approach enables policy training with large-scale diffusion models via a novel decoupled first-order gradient (FoG) method. We demonstrate the efficacy of our method on the Push-T manipulation task, where it significantly outperforms PPO in sample efficiency.
arXiv Detail & Related papers (2026-02-05T21:57:41Z)
- World-Gymnast: Training Robots with Reinforcement Learning in a World Model [4.491505634160759]
We propose World-Gymnast, which performs RL finetuning of a vision-language-action policy by rolling out the policy in an action-conditioned video world model. On the Bridge robot setup, World-Gymnast outperforms SFT by as much as 18x and outperforms software simulator by as much as 2x. Our results suggest learning a world model and training robot policies in the cloud could be the key to bridging the gap between robots that work in demonstrations and robots that can work in anyone's household.
arXiv Detail & Related papers (2026-02-02T18:44:45Z)
- GWM: Towards Scalable Gaussian World Models for Robotic Manipulation [53.51622803589185]
We propose a novel branch of world model named Gaussian World Model (GWM) for robotic manipulation. At its core is a latent Diffusion Transformer (DiT) combined with a 3D variational autoencoder, enabling fine-grained scene-level future state reconstruction. Both simulated and real-world experiments show that GWM can precisely predict future scenes conditioned on diverse robot actions.
arXiv Detail & Related papers (2025-08-25T02:01:09Z)
- Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator [50.191655141020505]
Reinforcement Learning (RL) has demonstrated impressive capabilities in robotic control but remains challenging due to high sample complexity, safety concerns, and the sim-to-real gap. We introduce Offline Robotic World Model (RWM-O), a model-based approach that explicitly estimates uncertainty to improve policy learning without reliance on a physics simulator.
arXiv Detail & Related papers (2025-04-23T12:58:15Z)
- AdaWorld: Learning Adaptable World Models with Latent Actions [76.50869178593733]
We propose AdaWorld, an innovative world model learning approach that enables efficient adaptation. The key idea is to incorporate action information during the pretraining of world models. We then develop an autoregressive world model that conditions on these latent actions.
arXiv Detail & Related papers (2025-03-24T17:58:15Z)
- Accelerating Model-Based Reinforcement Learning with State-Space World Models [18.71404724458449]
Reinforcement learning (RL) is a powerful approach for robot learning. However, model-free RL (MFRL) requires a large number of environment interactions to learn successful control policies. We propose a new method for accelerating model-based RL using state-space world models.
arXiv Detail & Related papers (2025-02-27T15:05:25Z)
- Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators. To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module. Experiments demonstrate that DWS can be versatilely applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z)
- Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [50.191655141020505]
This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer. By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.
arXiv Detail & Related papers (2025-01-17T10:39:09Z)
- Learning from Random Demonstrations: Offline Reinforcement Learning with Importance-Sampled Diffusion Models [19.05224410249602]
We propose a novel approach for offline reinforcement learning with closed-loop policy evaluation and world-model adaptation. We analyzed the performance of the proposed method and provided an upper bound on the return gap between our method and the real environment under an optimal policy.
arXiv Detail & Related papers (2024-05-30T09:34:31Z)
- World Models via Policy-Guided Trajectory Diffusion [21.89154719069519]
Existing world models are autoregressive in that they interleave predicting the next state with sampling the next action from the policy. We propose a novel world modelling approach that is not autoregressive and generates entire on-policy trajectories in a single pass through a diffusion model.
arXiv Detail & Related papers (2023-12-13T21:46:09Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models. Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.