ReWorld: Multi-Dimensional Reward Modeling for Embodied World Models
- URL: http://arxiv.org/abs/2601.12428v1
- Date: Sun, 18 Jan 2026 14:27:10 GMT
- Title: ReWorld: Multi-Dimensional Reward Modeling for Embodied World Models
- Authors: Baorui Peng, Wenyao Zhang, Liang Xu, Zekun Qi, Jiazhao Zhang, Hongsi Liu, Wenjun Zeng, Xin Jin
- Abstract summary: ReWorld is a framework that employs reinforcement learning to align video-based embodied world models with physical realism, task-completion capability, embodiment plausibility, and visual quality. We show that ReWorld significantly improves the physical fidelity, logical coherence, embodiment plausibility, and visual quality of generated rollouts, outperforming previous methods.
- Score: 27.729654985554372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, video-based world models that learn to simulate environment dynamics have gained increasing attention in robot learning. However, current approaches primarily emphasize visual generative quality while overlooking physical fidelity, dynamic consistency, and task logic, especially for contact-rich manipulation tasks, which limits their applicability to downstream tasks. To this end, we introduce ReWorld, a framework that employs reinforcement learning to align video-based embodied world models with physical realism, task-completion capability, embodiment plausibility, and visual quality. Specifically, we first construct a large-scale (~235K) video preference dataset and use it to train a hierarchical reward model designed to capture multi-dimensional rewards consistent with human preferences. We further propose a practical alignment algorithm that post-trains flow-based world models against this reward via a computationally efficient PPO-style algorithm. Comprehensive experiments and theoretical analysis demonstrate that ReWorld significantly improves the physical fidelity, logical coherence, embodiment plausibility, and visual quality of generated rollouts, outperforming previous methods.
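The two components named in the abstract, a multi-dimensional reward model and PPO-style post-training, can be made concrete with a small sketch. The module layout, the uniform averaging of the four dimension scores, and the generic clipped surrogate below are all illustrative assumptions; ReWorld's actual architecture, aggregation weights, and flow-matching objective are not specified here.
```python
# Minimal sketch: multi-dimensional reward aggregation plus a generic
# PPO-style clipped objective. All names and weights are illustrative,
# not ReWorld's published code.
import torch
import torch.nn as nn

class MultiDimRewardModel(nn.Module):
    """Scores a video rollout along the four dimensions named in the paper:
    physical realism, task completion, embodiment plausibility, visual quality."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        # One scalar head per reward dimension (hypothetical design).
        self.heads = nn.ModuleDict({
            name: nn.Linear(256, 1)
            for name in ("physics", "task", "embodiment", "visual")
        })

    def forward(self, video_feats: torch.Tensor) -> torch.Tensor:
        h = self.encoder(video_feats.mean(dim=1))  # pool over frames
        scores = torch.cat([head(h) for head in self.heads.values()], dim=-1)
        return scores.mean(dim=-1)  # uniform aggregation; weights are a guess

def ppo_style_loss(logp_new, logp_old, advantage, clip_eps: float = 0.2):
    """Standard clipped surrogate; the paper's exact flow-based variant
    is not given here, so this is the generic PPO objective."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()

# Toy usage: batch of 4 rollouts, 16 frames, 512-dim features per frame.
rm = MultiDimRewardModel()
rewards = rm(torch.randn(4, 16, 512))
adv = rewards - rewards.mean()  # simple baseline-subtracted advantage
loss = ppo_style_loss(torch.randn(4), torch.randn(4), adv.detach())
print(loss.item())
```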
Related papers
- Mirage2Matter: A Physically Grounded Gaussian World Model from Video [87.9732484393686]
We present Simulate Anything, a graphics-driven world modeling and simulation framework. Our approach reconstructs real-world environments into a photorealistic scene representation using 3D Gaussian Splatting (3DGS). We then leverage generative models to recover a physically realistic representation and integrate it into a simulation environment via a precision calibration target.
arXiv Detail & Related papers (2026-01-24T07:43:57Z)
- Do-Undo: Generating and Reversing Physical Actions in Vision-Language Models [57.71440995598757]
We introduce the Do-Undo task and benchmark to address a critical gap in vision-language models. Do-Undo requires models to simulate the outcome of a physical action and then accurately reverse it, reflecting true cause-and-effect in the visual world.
arXiv Detail & Related papers (2025-12-15T18:03:42Z)
- Can World Models Benefit VLMs for World Dynamics? [59.73433292793044]
We investigate what capabilities emerge when world-model priors are transferred into Vision-Language Models. We name our best-performing variant Dynamic Vision Aligner (DyVA). We find that DyVA surpasses both open-source and proprietary baselines, achieving state-of-the-art or comparable performance.
arXiv Detail & Related papers (2025-10-01T13:07:05Z)
- Reinforcement Learning with Inverse Rewards for World Model Post-training [29.19830208692156]
We propose Reinforcement Learning with Inverse Rewards (RLIR) to improve action-following in video world models. RLIR derives verifiable reward signals by recovering input actions from generated videos using an Inverse Dynamics Model.
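A minimal sketch of the inverse-reward idea summarized above, assuming a simple MLP inverse dynamics model and a negative-MSE reward; RLIR's actual IDM architecture, feature dimensions, and reward shaping are not specified here.
```python
# Sketch: an inverse dynamics model (IDM) predicts the action between
# consecutive generated frames, and agreement with the conditioning action
# becomes the reward. All sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    def __init__(self, frame_dim: int = 512, action_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * frame_dim, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, frame_t, frame_t1):
        return self.net(torch.cat([frame_t, frame_t1], dim=-1))

def inverse_reward(idm, frames, actions):
    """Reward each transition by how well the input action is recovered
    from the generated frames (negative MSE, a common verifiable signal)."""
    pred = idm(frames[:, :-1], frames[:, 1:])
    return -((pred - actions) ** 2).mean(dim=-1)  # per-transition reward

idm = InverseDynamicsModel()
frames = torch.randn(2, 9, 512)   # 2 rollouts, 9 frame features
actions = torch.randn(2, 8, 7)    # 8 transitions, 7-DoF actions
print(inverse_reward(idm, frames, actions).shape)  # torch.Size([2, 8])
```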
arXiv Detail & Related papers (2025-09-28T16:27:47Z)
- Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning [11.762260966376125]
A motion dynamics model is essential for efficient skill acquisition and effective planning. We introduce the neural motion simulator (MoSim), a world model that predicts the future physical state of an embodied system. MoSim achieves state-of-the-art performance in physical state prediction.
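As a rough illustration of a world model that predicts the future physical state, here is a forward-dynamics sketch; the state and action sizes and the residual parameterization are assumptions, not MoSim's actual architecture.
```python
# Sketch: predict the next physical state from the current state and action.
import torch
import torch.nn as nn

class ForwardDynamics(nn.Module):
    def __init__(self, state_dim: int = 32, action_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, state_dim),
        )

    def forward(self, state, action):
        # Predict a state delta and add it back (residual prediction is a
        # common choice for physical-state models).
        return state + self.net(torch.cat([state, action], dim=-1))

model = ForwardDynamics()
next_state = model(torch.randn(4, 32), torch.randn(4, 7))
print(next_state.shape)  # torch.Size([4, 32])
```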
arXiv Detail & Related papers (2025-04-09T17:59:32Z)
- Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators. To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module. Experiments demonstrate that DWS can be applied to both diffusion and autoregressive transformer models.
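A lightweight action-conditioned module of the kind summarized above could look like the FiLM-style sketch below; DWS's actual conditioning mechanism is not specified here, so treat the design and dimensions as assumptions.
```python
# Sketch: inject an action vector into intermediate video features by
# predicting per-channel scale and shift (FiLM-style conditioning).
import torch
import torch.nn as nn

class ActionConditionedModule(nn.Module):
    def __init__(self, action_dim: int = 7, feat_channels: int = 256):
        super().__init__()
        self.to_scale_shift = nn.Linear(action_dim, 2 * feat_channels)

    def forward(self, feats: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) backbone features; action: (B, action_dim)
        scale, shift = self.to_scale_shift(action).chunk(2, dim=-1)
        scale = scale[:, :, None, None]
        shift = shift[:, :, None, None]
        return feats * (1 + scale) + shift

mod = ActionConditionedModule()
out = mod(torch.randn(2, 256, 8, 8), torch.randn(2, 7))
print(out.shape)  # torch.Size([2, 256, 8, 8])
```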
arXiv Detail & Related papers (2025-02-10T14:49:09Z)
- Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [50.191655141020505]
This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer. By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.
arXiv Detail & Related papers (2025-01-17T10:39:09Z)
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
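A minimal sketch of separating context from dynamics, assuming a single-frame context encoder and a GRU over per-step latents; ContextWM's actual architecture is not specified here.
```python
# Sketch: a context encoder captures static appearance from one frame,
# while a recurrent dynamics model evolves a compact per-step latent.
import torch
import torch.nn as nn

class ContextDynamicsModel(nn.Module):
    def __init__(self, obs_dim: int = 512, ctx_dim: int = 64, z_dim: int = 32):
        super().__init__()
        self.context_enc = nn.Linear(obs_dim, ctx_dim)   # static context
        self.obs_enc = nn.Linear(obs_dim, z_dim)         # per-step latent
        self.dynamics = nn.GRU(z_dim, z_dim, batch_first=True)
        self.decoder = nn.Linear(ctx_dim + z_dim, obs_dim)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (B, T, obs_dim); context comes from the first frame only.
        ctx = self.context_enc(obs_seq[:, 0])            # (B, ctx_dim)
        z, _ = self.dynamics(self.obs_enc(obs_seq))      # (B, T, z_dim)
        ctx = ctx[:, None].expand(-1, obs_seq.size(1), -1)
        return self.decoder(torch.cat([ctx, z], dim=-1)) # reconstruct frames

m = ContextDynamicsModel()
print(m(torch.randn(2, 10, 512)).shape)  # torch.Size([2, 10, 512])
```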
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
- Continual Visual Reinforcement Learning with A Life-Long World Model [55.05017177980985]
We present a new continual learning approach for visual dynamics modeling. We first introduce the life-long world model, which learns task-specific latent dynamics. Then, we address the value estimation challenge for previous tasks with the exploratory-conservative behavior learning approach.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)