RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL
- URL: http://arxiv.org/abs/2512.03556v1
- Date: Wed, 03 Dec 2025 08:24:16 GMT
- Title: RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL
- Authors: Yinzhou Tang, Yu Shang, Yinuo Chen, Bingwen Wei, Xin Zhang, Shu'ang Yu, Liangzhi Shi, Chao Yu, Chen Gao, Wei Wu, Yong Li,
- Abstract summary: Reinforcement Learning (RL) policies struggle to cultivate generalizability across diverse scenarios. RL suffers from the inherent lack of a unified and general reward signal necessary for effective multi-scene generalization. We propose RoboScape-R, a framework leveraging the world model to serve as a versatile, general-purpose proxy for the embodied environment.
- Score: 18.00185999450407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Achieving generalizable embodied policies remains a key challenge. Traditional policy learning paradigms, including both Imitation Learning (IL) and Reinforcement Learning (RL), struggle to cultivate generalizability across diverse scenarios. While IL policies often overfit to specific expert trajectories, RL suffers from the inherent lack of a unified and general reward signal necessary for effective multi-scene generalization. We posit that the world model is uniquely capable of serving as a universal environment proxy to address this limitation. However, current world models primarily focus on their ability to predict observations and still rely on task-specific, handcrafted reward functions, thereby failing to provide a truly general training environment. Toward this problem, we propose RoboScape-R, a framework leveraging the world model to serve as a versatile, general-purpose proxy for the embodied environment within the RL paradigm. We introduce a novel world model-based general reward mechanism that generates "endogenous" rewards derived from the model's intrinsic understanding of real-world state transition dynamics. Extensive experiments demonstrate that RoboScape-R effectively addresses the limitations of traditional RL methods by providing an efficient and general training environment that substantially enhances the generalization capability of embodied policies. Our approach offers critical insights into utilizing the world model as an online training strategy and achieves an average 37.5% performance improvement over baselines under out-of-domain scenarios.
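The abstract describes using a world model as a general-purpose environment proxy that supplies rewards as well as observations. The Python sketch below shows only the shape of that training loop, and it rests on loudly stated assumptions. `ToyWorldModel`, `predict` and `imagined_rollout` are hypothetical names. The additive dynamics and the goal-distance reward head are hand-written placeholders for learned components. The abstract does not say how the endogenous reward is computed, so this placeholder does not reproduce RoboScape-R's mechanism.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for a learned world model with two heads:
    one predicts the next observation, the other predicts a reward.
    Both heads are hand-written placeholders; a real system would learn them."""

    goal: State  # hypothetical target state, used only by the placeholder reward head

    def predict(self, obs: State, action: Action) -> Tuple[State, float]:
        # Placeholder observation head: the action shifts the state additively.
        next_obs = [o + a for o, a in zip(obs, action)]
        # Placeholder reward head: negative squared distance to the goal.
        reward = -sum((n - g) ** 2 for n, g in zip(next_obs, self.goal))
        return next_obs, reward


def imagined_rollout(
    model: ToyWorldModel,
    policy: Callable[[State], Action],
    obs: State,
    horizon: int,
) -> Tuple[List[State], float]:
    """Roll a policy out inside the world model instead of a real environment.
    No external reward function is queried: every reward comes from the model."""
    trajectory = [obs]
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(obs)
        obs, reward = model.predict(obs, action)
        trajectory.append(obs)
        total_reward += reward
    return trajectory, total_reward


if __name__ == "__main__":
    model = ToyWorldModel(goal=[1.0, 1.0])

    def halfway(s: State) -> Action:
        # Hypothetical policy: move halfway toward the goal each step.
        return [0.5 * (g - x) for x, g in zip(s, model.goal)]

    def idle(s: State) -> Action:
        # Hypothetical baseline policy that never moves.
        return [0.0 for _ in s]

    print(imagined_rollout(model, halfway, [0.0, 0.0], horizon=3))
    print(imagined_rollout(model, idle, [0.0, 0.0], horizon=3))
```

Over three imagined steps from `[0.0, 0.0]`, the goal-seeking policy ends at `[0.875, 0.875]` with a return of -0.65625. The idle policy stays put and collects -6.0. Both returns come entirely from model-supplied rewards.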
Related papers
- Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency.
Online reasoning is performed to guide the training process through two mechanisms.
We show improved performance over a state-of-the-art reward machine baseline.
(2026-01-06T09:28:53Z)
- Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [50.191655141020505]
This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer.
By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.
(2025-01-17T10:39:09Z)
- Mind the Gap: Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning [15.619925926862235]
GAP is a generalizable autonomous pentesting framework.
It aims to realize efficient policy training in realistic environments.
It also trains agents capable of drawing inferences about other cases from one instance.
(2024-12-05T11:24:27Z)
- Improving Generalization in Reinforcement Learning Training Regimes for Social Robot Navigation [5.475804640008192]
We propose a method to improve the generalization performance of RL social navigation methods using curriculum learning.
Our results show that curriculum learning during training achieves better generalization performance than previous training methods.
(2023-08-29T00:00:18Z)
- Representation Learning for Continuous Action Spaces is Beneficial for Efficient Policy Learning [64.14557731665577]
Deep reinforcement learning (DRL) breaks through the bottlenecks of traditional reinforcement learning (RL).
In this paper, we propose an efficient policy learning method in latent state and action spaces.
The effectiveness of the proposed method is demonstrated by MountainCar, CarRacing and Cheetah experiments.
(2022-11-23T19:09:37Z)
- Generalized Real-World Super-Resolution through Adversarial Robustness [107.02188934602802]
We present Robust Super-Resolution, a method that leverages the generalization capability of adversarial attacks to tackle real-world SR.
Our novel framework represents a paradigm shift in the development of real-world SR methods.
By using a single robust model, we outperform state-of-the-art specialized methods on real-world benchmarks.
(2021-08-25T22:43:20Z)
- DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies [116.12670064963625]
We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn contextual policies.
We evaluate DisCo RL on a variety of robot manipulation tasks and find that it significantly outperforms prior methods on tasks that require generalization to new goal distributions.
(2021-04-23T16:51:58Z)
- Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation [35.79993074465577]
We develop a model-based reinforcement learning framework with a disentangled universal value function, called GoalRec.
We demonstrate the superiority of GoalRec over previous approaches with respect to three practical challenges in a series of simulations and a real application.
(2021-04-07T08:13:32Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information-theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
(2020-08-03T02:24:20Z)
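The summary above mentions an information-theoretic regularizer applied with annealing. One common way to write such a term is the variational information bottleneck. It penalizes the KL divergence between a diagonal-Gaussian encoder output and a standard normal prior, scaled by a weight. The Python sketch below shows that generic penalty with a linearly increasing weight. The function names, the linear schedule and `beta_max` are illustrative assumptions, and the paper's actual objective and annealing procedure may differ.

```python
import math
from typing import Sequence


def gaussian_kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ): the usual variational
    information-bottleneck penalty on a stochastic encoder's output."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def annealed_beta(step: int, total_steps: int, beta_max: float) -> float:
    """Hypothetical linear schedule: the bottleneck weight grows from 0 to
    beta_max, so pressure to discard information increases over training."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)
    return beta_max * frac


def ib_regularized_loss(
    rl_loss: float,
    mu: Sequence[float],
    log_var: Sequence[float],
    step: int,
    total_steps: int,
    beta_max: float = 1e-3,
) -> float:
    """Task (RL) loss plus the annealed information-bottleneck penalty."""
    beta = annealed_beta(step, total_steps, beta_max)
    return rl_loss + beta * gaussian_kl_to_standard_normal(mu, log_var)
```

Early in training the weight is near zero, so the task loss dominates. As the weight rises, representation dimensions that carry no task-relevant information are pushed toward the uninformative prior.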