ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis
- URL: http://arxiv.org/abs/2503.14526v1
- Date: Sat, 15 Mar 2025 16:47:25 GMT
- Title: ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis
- Authors: Yu Fang, Yue Yang, Xinghao Zhu, Kaiyuan Zheng, Gedas Bertasius, Daniel Szafir, Mingyu Ding
- Abstract summary: ReBot is a novel real-to-sim-to-real approach for scaling real robot datasets. We show that ReBot significantly enhances the performance and robustness of vision-language-action (VLA) models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-language-action (VLA) models present a promising paradigm by training policies directly on real robot datasets like Open X-Embodiment. However, the high cost of real-world data collection hinders further data scaling, thereby restricting the generalizability of VLAs. In this paper, we introduce ReBot, a novel real-to-sim-to-real approach for scaling real robot datasets and adapting VLA models to target domains, which is the last-mile deployment challenge in robot manipulation. Specifically, ReBot replays real-world robot trajectories in simulation to diversify manipulated objects (real-to-sim), and integrates the simulated movements with inpainted real-world background to synthesize physically realistic and temporally consistent robot videos (sim-to-real). Our approach has several advantages: 1) it enjoys the benefit of real data to minimize the sim-to-real gap; 2) it leverages the scalability of simulation; and 3) it can generalize a pretrained VLA to a target domain with fully automated data pipelines. Extensive experiments in both simulation and real-world environments show that ReBot significantly enhances the performance and robustness of VLAs. For example, in SimplerEnv with the WidowX robot, ReBot improved the in-domain performance of Octo by 7.2% and OpenVLA by 21.8%, and out-of-domain generalization by 19.9% and 9.4%, respectively. For real-world evaluation with a Franka robot, ReBot increased the success rates of Octo by 17% and OpenVLA by 20%. More information can be found at: https://yuffish.github.io/rebot/
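As a mental model of the pipeline the abstract describes, the sketch below replays a recorded trajectory against a swapped object in simulation (real-to-sim), inpaints the robot and object out of the real frames, and composites the simulated render back onto the clean background (sim-to-real). All class and function names here are hypothetical; the paper does not expose this interface.

```python
# Hypothetical sketch of ReBot's real-to-sim-to-real idea; every name below
# (Trajectory, simulator, inpainter, renderer) is illustrative, not the
# authors' API.
from dataclasses import dataclass

@dataclass
class Trajectory:
    joint_angles: list    # per-frame joint configurations from a real episode
    gripper_states: list  # per-frame gripper open/close commands

def synthesize_episode(real_frames, traj, new_object, simulator, inpainter, renderer):
    """Turn one real episode into a synthetic episode with a new object."""
    # Real-to-sim: replay the recorded motion in simulation against a
    # different object, so the trajectory stays physically grounded.
    simulator.load_object(new_object)
    sim_frames = [simulator.step(q, g)
                  for q, g in zip(traj.joint_angles, traj.gripper_states)]

    # Sim-to-real: erase the robot and object from the real frames, then
    # composite the simulated render over the inpainted background to get a
    # physically realistic, temporally consistent video.
    backgrounds = [inpainter.remove_robot_and_object(f) for f in real_frames]
    return [renderer.composite(bg, sf)
            for bg, sf in zip(backgrounds, sim_frames)]
```

Because the action labels of the source episode carry over unchanged, every synthesized video yields a new training pair for VLA finetuning.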
Related papers
- VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation [53.63540587160549]
VidBot is a framework enabling zero-shot robotic manipulation using 3D affordances learned from in-the-wild monocular RGB-only human videos. VidBot paves the way for leveraging everyday human videos to make robot learning more scalable.
arXiv Detail & Related papers (2025-03-10T10:04:58Z)
- HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation [54.03004125910057]
We show that hierarchical vision-language-action models can be more effective than standard monolithic VLA models at utilizing off-domain data. With this hierarchical design, the high-level VLM can transfer across significant domain gaps between the off-domain finetuning data and real-robot testing scenarios.
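The claim is architectural, so a schematic helps: a high-level VLM produces an embodiment-agnostic intermediate plan, and a small low-level policy grounds it into actions. This is an assumed decomposition based on the summary; the interfaces below are hypothetical.

```python
# Schematic hierarchical VLA step; both components are placeholders.
def hierarchical_step(image, instruction, vlm, low_level_policy):
    # High-level: the VLM maps image + language to an abstract plan (e.g. a
    # coarse path or subgoal). Because the plan is embodiment-agnostic, the
    # VLM can be finetuned on off-domain data such as videos or other robots.
    plan = vlm.predict_plan(image, instruction)

    # Low-level: a compact policy grounds the plan into motor actions for
    # the target robot, needing comparatively little in-domain data.
    return low_level_policy.act(image, plan)
```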
arXiv Detail & Related papers (2025-02-08T07:50:22Z)
- Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression [23.99292102237088]
We propose Heterogeneous Masked Autoregression (HMA) for modeling action-video dynamics.
After post-training, this model can be used as a video simulator for evaluating policies and generating synthetic data.
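The "video simulator" use is the easiest part to picture: once the model predicts the next frame conditioned on an action, a policy can be rolled out entirely inside it. A toy loop, with hypothetical method names rather than HMA's actual API:

```python
# Toy policy evaluation inside a learned action-video dynamics model; the
# dynamics_model/policy interfaces are assumptions.
def rollout(dynamics_model, policy, first_frame, horizon=50):
    frames, actions = [first_frame], []
    for _ in range(horizon):
        # The policy under evaluation picks an action from the latest frame.
        action = policy.act(frames[-1])
        actions.append(action)
        # The learned model predicts the resulting frame, standing in for a
        # physics simulator or a real robot.
        frames.append(dynamics_model.predict_next_frame(frames[-1], action))
    return frames, actions  # a fully synthetic episode
```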
arXiv Detail & Related papers (2025-02-06T18:38:26Z)
- RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator [27.04267700576422]
RoboGSim is a real2sim2real robotic simulator powered by 3D Gaussian Splatting and a physics engine.
It can synthesize simulated data with novel views, objects, trajectories, and scenes.
The real2sim and sim2real transfer experiments show high consistency in texture and physics.
arXiv Detail & Related papers (2024-11-18T18:58:03Z)
- GRUtopia: Dream General Robots in a City at Scale [65.08318324604116]
This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots.
GRScenes includes 100k interactive, finely annotated scenes, which can be freely combined into city-scale environments.
GRResidents is a Large Language Model (LLM)-driven Non-Player Character (NPC) system responsible for social interaction.
arXiv Detail & Related papers (2024-07-15T17:40:46Z)
- IRASim: Learning Interactive Real-Robot Action Simulators [24.591694756757278]
We introduce a novel method, IRASim, to generate realistic videos of a robot arm that executes a given action trajectory.
To validate the effectiveness of our method, we create a new benchmark, IRASim Benchmark, based on three real-robot datasets.
Results show that IRASim outperforms all baseline methods and is preferred in human evaluations.
arXiv Detail & Related papers (2024-06-20T17:50:16Z)
- Real-time Holistic Robot Pose Estimation with Unknown States [30.41806081818826]
Estimating robot pose from RGB images is a crucial problem in computer vision and robotics.
Previous methods presume full knowledge of robot internal states, e.g. ground-truth robot joint angles.
This work introduces an efficient framework for real-time robot pose estimation from RGB images without requiring known robot states.
arXiv Detail & Related papers (2024-02-08T13:12:50Z)
- OmniLRS: A Photorealistic Simulator for Lunar Robotics [2.6718643310547607]
We explain how we built a Lunar simulator based on Isaac Sim, Nvidia's robotic simulator.
The simulation provides fast procedural environment generation and multi-robot capabilities, along with a synthetic data pipeline for machine-learning applications.
arXiv Detail & Related papers (2023-09-16T13:48:47Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
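The summary names the ingredients (a Transformer over sensorimotor tokens, self-supervised pre-training); a common instantiation of that recipe is masked token prediction, sketched below with assumed shapes. Treat it as one plausible reading, not RPT's exact objective.

```python
# Masked sensorimotor prediction in the spirit of RPT; tokenization and
# shapes are assumptions.
import torch

def pretrain_step(transformer, vision_tok, proprio_tok, action_tok, mask_ratio=0.75):
    # Interleave the per-timestep modality tokens into one sequence (B, L, D).
    seq = torch.cat([vision_tok, proprio_tok, action_tok], dim=1)

    # Hide a random subset of tokens and reconstruct them from the rest;
    # no rewards or human labels are required.
    mask = torch.rand(seq.shape[:2], device=seq.device) < mask_ratio
    pred = transformer(seq.masked_fill(mask.unsqueeze(-1), 0.0))
    return ((pred - seq) ** 2)[mask].mean()  # MSE on masked positions only
```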
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning both to do and to undo it, while simultaneously inferring the reward function from demonstrations.
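A schematic of that do/undo loop, with placeholder components; the reward model here is assumed to be a classifier separating demonstration states from the robot's own visited states, which is one standard way to infer a reward from demonstrations.

```python
# Schematic autonomous-practice loop in the spirit of MEDAL++; env, the
# policies, and reward_model are placeholders, not the authors' code.
def autonomous_practice(env, forward_policy, backward_policy, reward_model,
                        rounds=1000, steps=200):
    obs = env.reset()  # one human-provided reset up front
    for _ in range(rounds):
        # Forward phase: attempt the task, rewarded by a model trained to
        # distinguish demonstration states from the robot's visited states.
        for _ in range(steps):
            obs = env.step(forward_policy.act(obs))
            forward_policy.update(obs, reward_model.score(obs))
        # Backward phase: undo the task, steering the scene back toward
        # initial states so practice continues without human resets.
        for _ in range(steps):
            obs = env.step(backward_policy.act(obs))
    return forward_policy
```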
arXiv Detail & Related papers (2023-03-02T18:51:38Z)