Sim-and-Human Co-training for Data-Efficient and Generalizable Robotic Manipulation
- URL: http://arxiv.org/abs/2601.19406v1
- Date: Tue, 27 Jan 2026 09:41:28 GMT
- Title: Sim-and-Human Co-training for Data-Efficient and Generalizable Robotic Manipulation
- Authors: Kaipeng Fang, Weiqing Liang, Yuyang Li, Ji Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen
- Abstract summary: SimHum is a framework to simultaneously extract a kinematic prior from simulated robot actions and a visual prior from real-world human observations. Based on the two complementary priors, we achieve data-efficient and generalizable robotic manipulation in real-world tasks.
- Score: 113.13282853889818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic simulation data and real-world human data provide scalable alternatives that circumvent the prohibitive costs of robot data collection. However, these sources suffer from the sim-to-real visual gap and the human-to-robot embodiment gap, respectively, which limits a policy's generalization to real-world scenarios. In this work, we identify a natural yet underexplored complementarity between these sources: simulation offers the robot actions that human data lacks, while human data provides the real-world observations that simulation struggles to render. Motivated by this insight, we present SimHum, a co-training framework that simultaneously extracts a kinematic prior from simulated robot actions and a visual prior from real-world human observations. Based on these two complementary priors, we achieve data-efficient and generalizable robotic manipulation in real-world tasks. Empirically, SimHum outperforms the baseline by up to 40% under the same data collection budget, and achieves a 62.5% out-of-distribution (OOD) success rate with only 80 real demonstrations, outperforming the real-only baseline by 7.1×. Videos and additional information can be found at the project website: https://kaipengfang.github.io/sim-and-human
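For intuition, the following is a minimal co-training sketch in PyTorch, not the authors' implementation: each gradient step mixes a behavior-cloning loss on simulated robot trajectories (which carry actions but synthetic visuals) with an action-free temporal-consistency loss on real human video frames (which carry real visuals but no robot actions), both flowing through a shared visual encoder. The architecture, tensor shapes, loss weighting, and choice of human-data objective are all illustrative assumptions, not details from the paper.

```python
# Hypothetical co-training sketch (illustrative assumptions; not SimHum's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoder(nn.Module):
    """Visual backbone shared by simulated and real human observations."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, obs):
        return self.net(obs)

class ActionHead(nn.Module):
    """Policy head trained only where actions exist, i.e. on simulation data."""
    def __init__(self, feat_dim=128, act_dim=7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim))

    def forward(self, feat):
        return self.mlp(feat)

encoder, head = SharedEncoder(), ActionHead()
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()),
                       lr=3e-4)

def cotrain_step(sim_obs, sim_act, human_obs_t, human_obs_t1, lam=0.5):
    # Kinematic prior: behavior cloning on simulated (observation, action) pairs.
    bc_loss = F.mse_loss(head(encoder(sim_obs)), sim_act)
    # Visual prior: an action-free objective on real human frames. A cosine
    # temporal-consistency loss between consecutive frames is a stand-in here;
    # the paper's actual human-data objective may differ.
    z_t = F.normalize(encoder(human_obs_t), dim=-1)
    z_t1 = F.normalize(encoder(human_obs_t1), dim=-1)
    vis_loss = 1.0 - (z_t * z_t1).sum(dim=-1).mean()
    loss = bc_loss + lam * vis_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Dummy batch showing the expected shapes (B RGB frames, 7-DoF actions).
B = 8
cotrain_step(torch.randn(B, 3, 64, 64), torch.randn(B, 7),
             torch.randn(B, 3, 64, 64), torch.randn(B, 3, 64, 64))
```

The point the sketch captures is the division of labor: the action head receives gradients only from simulated data (the kinematic prior), while the shared encoder is additionally shaped by real human observations (the visual prior), so the policy is exposed to real-world visual statistics without requiring real robot actions.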
Related papers
- MiVLA: Towards Generalizable Vision-Language-Action Model with Human-Robot Mutual Imitation Pre-training [102.850162490626]
We propose MiVLA, a vision-language-action model empowered by human-robot mutual imitation pre-training. We show that MiVLA achieves strongly improved generalization, outperforming state-of-the-art VLAs.
arXiv Detail & Related papers (2025-12-17T12:59:41Z)
- Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation [40.96453435496208]
We present a recipe for utilizing simulation data to solve vision-based robotic manipulation tasks. Using two domains, a robot arm and a humanoid, we demonstrate that simulation data can enhance real-world task performance by an average of 38%.
arXiv Detail & Related papers (2025-03-31T17:39:38Z)
- Robot Learning with Super-Linear Scaling [13.053949644385932]
CASHER is a pipeline for scaling up data collection and learning in simulation where performance scales super-linearly with human effort. We show that CASHER enables fine-tuning of pre-trained policies to a target scenario using a video scan, without any additional human effort.
arXiv Detail & Related papers (2024-12-02T18:12:02Z)
- Learning Human Action Recognition Representations Without Real Humans [66.61527869763819]
We present a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.
We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks.
Our approach outperforms previous baselines by up to 5%.
arXiv Detail & Related papers (2023-11-10T18:38:14Z)
- SynH2R: Synthesizing Hand-Object Motions for Learning Human-to-Robot Handovers [35.386426373890615]
Vision-based human-to-robot handover is an important and challenging task in human-robot interaction. We introduce a framework that can generate plausible human grasping motions suitable for training the robot. This allows us to generate synthetic training and testing data with 100x more objects than previous work.
arXiv Detail & Related papers (2023-11-09T18:57:02Z)
- PeopleSansPeople: A Synthetic Data Generator for Human-Centric Computer Vision [3.5694949627557846]
We release PeopleSansPeople, a human-centric synthetic data generator.
It contains simulation-ready 3D human assets and a parameterized lighting and camera system, and it generates 2D and 3D bounding-box, instance- and semantic-segmentation, and COCO pose labels.
arXiv Detail & Related papers (2021-12-17T02:33:31Z)
- Robot Learning from Randomized Simulations: A Review [59.992761565399185]
Deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data.
State-of-the-art approaches learn in simulation, where data generation is fast and inexpensive.
We focus on 'domain randomization', a method for learning from randomized simulations.
arXiv Detail & Related papers (2021-11-01T13:55:41Z)
- Reactive Long Horizon Task Execution via Visual Skill and Precondition Models [59.76233967614774]
We describe an approach for sim-to-real training that can accomplish unseen robotic tasks using models learned in simulation to ground components of a simple task planner.
We show an increase in success rate from 91.6% to 98% in simulation, and from 10% to 80% in the real world, compared with naive baselines.
arXiv Detail & Related papers (2020-11-17T15:24:01Z)
- Point Cloud Based Reinforcement Learning for Sim-to-Real and Partial Observability in Visual Navigation [62.22058066456076]
Reinforcement Learning (RL) provides powerful tools for solving complex robotic tasks.
However, RL policies trained in simulation often do not work directly in the real world, a challenge known as the sim-to-real transfer problem.
We propose a method that learns on an observation space constructed from point clouds, combined with environment randomization.
arXiv Detail & Related papers (2020-07-27T17:46:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.