Humanoid Manipulation Interface: Humanoid Whole-Body Manipulation from Robot-Free Demonstrations
- URL: http://arxiv.org/abs/2602.06643v2
- Date: Thu, 12 Feb 2026 15:32:00 GMT
- Title: Humanoid Manipulation Interface: Humanoid Whole-Body Manipulation from Robot-Free Demonstrations
- Authors: Ruiqian Nai, Boyuan Zheng, Junming Zhao, Haodong Zhu, Sicong Dai, Zunhao Chen, Yihang Hu, Yingdong Hu, Tong Zhang, Chuan Wen, Yang Gao
- Abstract summary: We present the Humanoid Manipulation Interface (HuMI), a portable and efficient framework for learning diverse whole-body manipulation tasks. HuMI enables robot-free data collection by capturing rich whole-body motion using portable hardware. HuMI achieves a 3x increase in data collection efficiency compared to teleoperation and attains a 70% success rate in unseen environments.
- Score: 25.15848825594207
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current approaches for humanoid whole-body manipulation, primarily relying on teleoperation or visual sim-to-real reinforcement learning, are hindered by hardware logistics and complex reward engineering. Consequently, demonstrated autonomous skills remain limited and are typically restricted to controlled environments. In this paper, we present the Humanoid Manipulation Interface (HuMI), a portable and efficient framework for learning diverse whole-body manipulation tasks across various environments. HuMI enables robot-free data collection by capturing rich whole-body motion using portable hardware. This data drives a hierarchical learning pipeline that translates human motions into dexterous and feasible humanoid skills. Extensive experiments across five whole-body tasks (kneeling, squatting, tossing, walking, and bimanual manipulation) demonstrate that HuMI achieves a 3x increase in data collection efficiency compared to teleoperation and attains a 70% success rate in unseen environments.
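To make the pipeline description concrete, the sketch below illustrates the two stages the abstract names: retargeting captured human whole-body motion into the robot's feasible joint space, then imitating the retargeted demonstrations. This is a minimal sketch under stated assumptions, not HuMI's actual implementation; the function names, joint limits, and the linear behavior-cloning policy are all illustrative.

```python
# Hypothetical sketch of a robot-free demo pipeline:
# capture -> retarget to feasible humanoid poses -> imitate.
import numpy as np

JOINT_LIMITS = (-2.0, 2.0)  # rad; placeholder humanoid joint limits

def retarget(human_pose: np.ndarray, scale: float = 0.9) -> np.ndarray:
    """Map a captured human pose to a feasible humanoid pose.

    Real retargeting solves an optimization over the robot's kinematics;
    rescaling and clipping to joint limits stands in for that here.
    """
    return np.clip(scale * human_pose, *JOINT_LIMITS)

class SkillPolicy:
    """Toy behavior-cloning policy: a linear map from observation to action."""

    def __init__(self, obs_dim: int, act_dim: int):
        self.W = np.zeros((act_dim, obs_dim))

    def fit(self, obs: np.ndarray, acts: np.ndarray) -> None:
        # Least-squares imitation of the retargeted demonstrations.
        self.W = np.linalg.lstsq(obs, acts, rcond=None)[0].T

    def act(self, ob: np.ndarray) -> np.ndarray:
        return self.W @ ob

# Usage: retarget robot-free demos, then fit the policy on them.
rng = np.random.default_rng(0)
human_demos = rng.normal(size=(256, 30))            # captured human poses
robot_targets = np.array([retarget(p) for p in human_demos])
policy = SkillPolicy(obs_dim=30, act_dim=30)
policy.fit(human_demos, robot_targets)
```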
Related papers
- HumanX: Toward Agile and Generalizable Humanoid Interaction Skills from Human Videos [74.43500240121476]
We present HumanX, a full-stack framework that compiles human video into generalizable, real-world interaction skills for humanoids. HumanX achieves over 8 times higher generalization success than prior methods.
arXiv Detail & Related papers (2026-02-02T18:53:01Z) - METIS: Multi-Source Egocentric Training for Integrated Dexterous Vision-Language-Action Model [36.82365894983052]
A major bottleneck lies in the scarcity of large-scale, action-annotated data for dexterous skills. We propose METIS, a vision-language-action model for dexterous manipulation pretrained on egocentric datasets. Our method demonstrates exceptional dexterous manipulation capabilities, achieving the highest average success rate in six real-world tasks.
arXiv Detail & Related papers (2025-11-21T16:32:36Z) - Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations [52.29884993824894]
Learning multi-fingered robot policies from humans performing daily tasks in natural environments has long been a grand goal in the robotics community. AINA enables learning multi-fingered policies from data collected by anyone, anywhere, and in any environment using Aria Gen 2 glasses.
arXiv Detail & Related papers (2025-11-20T18:59:02Z) - DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation [29.519071338337685]
We present DemoHLM, a framework for humanoid loco-manipulation on a real humanoid robot from a single demonstration in simulation. A whole-body controller maps whole-body motion commands to joint torques and provides omnidirectional mobility for the humanoid robot. Experiments show a positive correlation between the amount of synthetic data and policy performance.
arXiv Detail & Related papers (2025-10-13T10:49:40Z) - Learning from Massive Human Videos for Universal Humanoid Pose Control [46.417054298537195]
This paper introduces Humanoid-X, a large-scale dataset of over 20 million humanoid robot poses with corresponding text-based motion descriptions. We train a large humanoid model, UH-1, which takes text instructions as input and outputs corresponding actions to control a humanoid robot. Our scalable training approach leads to superior generalization in text-based humanoid control, marking a significant step toward adaptable, real-world-ready humanoid robots.
arXiv Detail & Related papers (2024-12-18T18:59:56Z) - Whole-Body Teleoperation for Mobile Manipulation at Zero Added Cost [8.71539730969424]
MoMa-Teleop is a novel teleoperation method that infers end-effector motions from existing interfaces. We demonstrate that our approach results in a significant reduction in task completion time across a variety of robots and tasks.
arXiv Detail & Related papers (2024-09-23T15:09:45Z) - Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition [48.65867987106428]
We introduce a novel system for joint learning between human operators and robots.
It enables human operators to share control of a robot end-effector with a learned assistive agent.
It reduces the need for human adaptation while ensuring the collected data is of sufficient quality for downstream tasks.
arXiv Detail & Related papers (2024-06-29T03:37:29Z) - OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning [45.51662378032706]
We present OmniH2O, a learning-based system for whole-body humanoid teleoperation and autonomy.
Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands.
We release the first humanoid whole-body control dataset, OmniH2O-6, containing six everyday tasks, and demonstrate humanoid whole-body skill learning from teleoperated datasets.
arXiv Detail & Related papers (2024-06-13T06:44:46Z) - RealDex: Towards Human-like Grasping for Robotic Dexterous Hand [64.33746404551343]
We introduce RealDex, a pioneering dataset capturing authentic dexterous hand grasping motions infused with human behavioral patterns. RealDex holds immense promise in advancing humanoid robots toward automated perception, cognition, and manipulation in real-world scenarios.
arXiv Detail & Related papers (2024-02-21T14:59:46Z) - Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations [66.47064743686953]
Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation.
Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation.
In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies.
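As a rough illustration of that augmentation idea (not the paper's actual code), the sketch below mixes action-labeled robot data with unlabeled human-video frames in each training batch. The sampling ratio, array shapes, and zero pseudo-labels are placeholder assumptions; in practice the human frames would receive inferred action labels, e.g. from hand tracking.

```python
# Hypothetical sketch: co-training batches drawn from a narrow labeled
# robot dataset and a broad unlabeled human-video dataset.
import numpy as np

rng = np.random.default_rng(0)
robot_obs  = rng.normal(size=(100, 64))   # labeled robot observations
robot_acts = rng.normal(size=(100, 7))    # robot actions
human_obs  = rng.normal(size=(1000, 64))  # unlabeled human-video frames

def sample_batch(batch_size: int = 32, human_ratio: float = 0.5):
    """Draw a mixed batch; human frames get placeholder pseudo-labels."""
    n_h = int(batch_size * human_ratio)
    n_r = batch_size - n_h
    r_idx = rng.integers(0, len(robot_obs), n_r)
    h_idx = rng.integers(0, len(human_obs), n_h)
    obs = np.concatenate([robot_obs[r_idx], human_obs[h_idx]])
    # Zeros stand in for the inferred-action step used on human data.
    acts = np.concatenate([robot_acts[r_idx], np.zeros((n_h, 7))])
    return obs, acts

obs, acts = sample_batch()  # feed to any visuomotor policy trainer
```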
arXiv Detail & Related papers (2023-07-12T07:04:53Z) - HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration [57.045140028275036]
We show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning.
We propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy.
arXiv Detail & Related papers (2022-12-08T15:56:13Z)