RoboCurate: Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning
- URL: http://arxiv.org/abs/2602.18742v1
- Date: Sat, 21 Feb 2026 07:33:24 GMT
- Title: RoboCurate: Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning
- Authors: Seungku Kim, Suhyeok Jang, Byungjun Yoon, Dongyoung Kim, John Won, Jinwoo Shin,
- Abstract summary: We introduce RoboCurate, a novel synthetic robot data generation framework that evaluates and filters the quality of annotated actions. Specifically, RoboCurate replays the predicted actions in a simulator and assesses action quality by measuring the consistency of motion between the simulator rollout and the generated video. We observe that RoboCurate's generated data yield substantial relative improvements in success rates compared to using real data only.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic data generated by video generative models has shown promise for robot learning as a scalable pipeline, but it often suffers from inconsistent action quality due to imperfectly generated videos. Recently, vision-language models (VLMs) have been leveraged to validate video quality, but they have limitations in distinguishing physically accurate videos and, even then, cannot directly evaluate the generated actions themselves. To tackle this issue, we introduce RoboCurate, a novel synthetic robot data generation framework that evaluates and filters the quality of annotated actions by comparing them with simulation replay. Specifically, RoboCurate replays the predicted actions in a simulator and assesses action quality by measuring the consistency of motion between the simulator rollout and the generated video. In addition, we unlock observation diversity beyond the available dataset via image-to-image editing and apply action-preserving video-to-video transfer to further augment appearance. We observe RoboCurate's generated data yield substantial relative improvements in success rates compared to using real data only, achieving +70.1% on GR-1 Tabletop (300 demos), +16.1% on DexMimicGen in the pre-training setup, and +179.9% in the challenging real-world ALLEX humanoid dexterous manipulation setting.
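The abstract describes action verification as "replay the annotated actions in a simulator, compare the rollout's motion with the generated video, keep only consistent samples," but it does not say how motion is tracked or scored. The sketch below is a minimal illustration under loudly stated assumptions, not RoboCurate's implementation. It assumes that each generated video yields a tracked position trajectory (for example, end-effector keypoints). It also assumes that consistency is the mean cosine similarity of per-step displacements. `NeuralTrajectory`, `motion_consistency`, `curate`, `toy_replay`, and the 0.9 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class NeuralTrajectory:
    """Hypothetical container for one generated sample."""
    actions: np.ndarray      # (T, D) actions annotated on the generated video
    video_track: np.ndarray  # (T + 1, D) positions tracked in the generated video


def motion_consistency(video_track: np.ndarray, sim_track: np.ndarray,
                       eps: float = 1e-8) -> float:
    """Assumed metric: mean cosine similarity of per-step displacements."""
    dv = np.diff(video_track, axis=0)  # per-step motion seen in the video
    ds = np.diff(sim_track, axis=0)    # per-step motion of the simulator replay
    nv = np.linalg.norm(dv, axis=1)
    ns = np.linalg.norm(ds, axis=1)
    sims = np.empty(len(dv))
    both_static = (nv < eps) & (ns < eps)  # neither moves: count as consistent
    one_static = (nv < eps) ^ (ns < eps)   # only one moves: count as inconsistent
    moving = ~(both_static | one_static)
    sims[both_static] = 1.0
    sims[one_static] = 0.0
    sims[moving] = np.sum(dv[moving] * ds[moving], axis=1) / (nv[moving] * ns[moving])
    return float(sims.mean())


def curate(samples: List[NeuralTrajectory],
           replay: Callable[[np.ndarray, np.ndarray], np.ndarray],
           threshold: float = 0.9) -> List[NeuralTrajectory]:
    """Replay each sample's actions and keep it only if motion agrees with the video."""
    kept = []
    for sample in samples:
        sim_track = replay(sample.actions, sample.video_track[0])
        if motion_consistency(sample.video_track, sim_track) >= threshold:
            kept.append(sample)
    return kept


def toy_replay(actions: np.ndarray, start: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in simulator: each action is treated as a position delta."""
    return np.vstack([start, start + np.cumsum(actions, axis=0)])


# Usage: one video that matches its actions and one whose motion is reversed.
rng = np.random.default_rng(0)
actions = rng.normal(size=(20, 3)) * 0.01
start = np.zeros(3)
good = NeuralTrajectory(actions, toy_replay(actions, start))
bad = NeuralTrajectory(actions, toy_replay(-actions, start))
kept = curate([good, bad], toy_replay)
```

Here only `good` survives, because its video motion agrees step by step with the replayed actions, while `bad` moves in the opposite direction. A real pipeline would replace `toy_replay` with an actual physics simulator and the synthetic tracks with motion extracted from generated frames. It would also likely need a metric that is robust to tracking noise and camera viewpoint.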
Related papers
- AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis
We introduce AnchorDream, an embodiment-aware world model that repurposes pretrained video diffusion models for robot data synthesis. Our method scales them into large, diverse, high-quality datasets without requiring explicit environment modeling. Experiments show that the generated data leads to consistent improvements in downstream policy learning, with relative gains of 36.4% in simulator benchmarks and nearly double performance in real-world studies.
(2025-12-12T18:59:45Z)
- GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
We propose GenFlowRL, which derives shaped rewards from generated flow trained on diverse cross-embodiment datasets. Experiments on 10 manipulation tasks, in both simulation and real-world cross-embodiment evaluations, demonstrate that GenFlowRL effectively leverages manipulation features extracted from generated object-centric flow.
(2025-08-14T20:19:20Z)
- Physical Autoregressive Model for Robotic Manipulation without Action Pretraining
We build upon autoregressive video generation models to propose a Physical Autoregressive Model (PAR). PAR leverages the world knowledge embedded in video pretraining to understand physical dynamics without requiring action pretraining. Experiments on the ManiSkill benchmark show that PAR achieves a 100% success rate on the PushCube task.
(2025-08-13T13:54:51Z)
- Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
RIGVid enables robots to perform complex manipulation tasks by imitating AI-generated videos. A video diffusion model generates potential demonstration videos, and a vision-language model automatically filters out results that do not follow the command. A 6D pose tracker then extracts object trajectories from the video, and the trajectories are retargeted to the robot in an embodiment-agnostic fashion.
(2025-07-01T17:39:59Z)
- RoboPearls: Editable Video Simulation for Robot Manipulation
RoboPearls is an editable video simulation framework for robotic manipulation. Built on 3D Gaussian Splatting (3DGS), RoboPearls enables the construction of photo-realistic, view-consistent simulations. We conduct extensive experiments on multiple datasets and scenes, including RLBench, COLOSSEUM, Ego4D, Open X-Embodiment, and a real-world robot.
(2025-06-28T05:03:31Z)
- DreamGen: Unlocking Generalization in Robot Learning through Video World Models
DreamGen is a pipeline for training robot policies that generalize across behaviors and environments through neural trajectories. Our work establishes a promising new axis for scaling robot learning well beyond manual data collection.
(2025-05-19T04:55:39Z)
- Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression
We propose Heterogeneous Masked Autoregression (HMA) for modeling action-video dynamics. After post-training, this model can be used as a video simulator for evaluating policies and generating synthetic data.
(2025-02-06T18:38:26Z)
- Robot Learning with Sensorimotor Pre-training
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
(2023-06-16T17:58:10Z)
- From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data
Conditional Behavior Transformers (C-BeT) is a method that combines the multi-modal generation ability of Behavior Transformer with future-conditioned goal specification.
C-BeT improves upon prior state-of-the-art work in learning from play data by an average of 45.7%.
We demonstrate for the first time that useful task-centric behaviors can be learned on a real-world robot purely from play data.
(2022-10-18T17:59:55Z)
- CaRTS: Causality-driven Robot Tool Segmentation from Vision and Kinematics Data
Vision-based segmentation of the robotic tool during robot-assisted surgery enables downstream applications, such as augmented reality feedback.
With the introduction of deep learning, many methods have been proposed to solve instrument segmentation directly and solely from images.
We present CaRTS, a causality-driven robot tool segmentation algorithm designed based on a complementary causal model of the robot tool segmentation task.
(2022-03-15T22:26:19Z)