VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation
- URL: http://arxiv.org/abs/2509.20322v1
- Date: Wed, 24 Sep 2025 17:10:02 GMT
- Title: VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation
- Authors: Shaofeng Yin, Yanjie Ze, Hong-Xing Yu, C. Karen Liu, Jiajun Wu
- Abstract summary: VisualMimic is a visual framework that unifies egocentric vision with hierarchical whole-body control for humanoid robots. VisualMimic enables zero-shot transfer of visuomotor policies trained in simulation to real humanoid robots.
- Score: 39.01738745009172
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humanoid loco-manipulation in unstructured environments demands tight integration of egocentric perception and whole-body control. However, existing approaches either depend on external motion capture systems or fail to generalize across diverse tasks. We introduce VisualMimic, a visual sim-to-real framework that unifies egocentric vision with hierarchical whole-body control for humanoid robots. VisualMimic combines a task-agnostic low-level keypoint tracker -- trained from human motion data via a teacher-student scheme -- with a task-specific high-level policy that generates keypoint commands from visual and proprioceptive input. To ensure stable training, we inject noise into the low-level policy and clip high-level actions using human motion statistics. VisualMimic enables zero-shot transfer of visuomotor policies trained in simulation to real humanoid robots, accomplishing a wide range of loco-manipulation tasks such as box lifting, pushing, football dribbling, and kicking. Beyond controlled laboratory settings, our policies also generalize robustly to outdoor environments. Videos are available at: https://visualmimic.github.io.
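The abstract describes a two-level architecture: a task-specific high-level policy emits keypoint commands from egocentric vision and proprioception, a task-agnostic low-level tracker turns those commands into whole-body joint actions, high-level actions are clipped using human motion statistics, and noise is injected at the low level during training. A minimal Python sketch of that control loop follows; all names, shapes, and the exact clipping/noise forms (`control_step`, `k`, `noise_std`, the mean +/- k*std rule) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Sketch of the hierarchical control loop from the VisualMimic abstract.
# Constants and stub policies are assumptions for illustration only.
NUM_KEYPOINTS = 12   # assumed number of tracked body keypoints
NUM_JOINTS = 29      # assumed humanoid degrees of freedom

rng = np.random.default_rng(0)

def high_level_policy(rgb, proprio):
    """Task-specific policy: egocentric vision + proprioception -> keypoint
    commands. Stubbed here; in the paper it is trained per task in simulation."""
    return rng.standard_normal(3 * NUM_KEYPOINTS)

def low_level_tracker(keypoint_cmd, proprio):
    """Task-agnostic keypoint tracker distilled from human motion data via a
    teacher-student scheme; outputs whole-body joint actions. Stubbed here."""
    return np.zeros(NUM_JOINTS)

# Per-dimension statistics of keypoint trajectories in the human motion data
# (assumed clipping rule: keep commands within mean +/- k * std).
kp_mean = np.zeros(3 * NUM_KEYPOINTS)
kp_std = np.ones(3 * NUM_KEYPOINTS)

def control_step(rgb, proprio, k=3.0, noise_std=0.01):
    cmd = high_level_policy(rgb, proprio)
    # Clip high-level actions using human motion statistics, so the tracker
    # only receives commands resembling its training distribution.
    cmd = np.clip(cmd, kp_mean - k * kp_std, kp_mean + k * kp_std)
    action = low_level_tracker(cmd, proprio)
    # Noise injected into the low-level policy during training encourages
    # robustness to imperfect tracking (assumed additive Gaussian).
    return action + rng.normal(0.0, noise_std, size=action.shape)

action = control_step(np.zeros((240, 320, 3)), np.zeros(2 * NUM_JOINTS))
```

Plausibly, the clipping step is what lets the two levels be trained separately: the tracker never sees out-of-distribution commands, so a new task policy that respects the statistics can reuse it; the paper's exact mechanism may differ.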
Related papers
- Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation [14.013652439013692]
This paper presents a new paradigm, HERO, for object loco-manipulation with humanoid robots.
We achieve this by designing an accurate residual-aware EE tracking policy.
We use this accurate end-effector tracker to build a modular system for loco-manipulation.
arXiv Detail & Related papers (2026-02-18T18:55:02Z) - Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos [56.510263910611684]
We tackle prehensile manipulation, in which tasks involve grasping an object before performing various post-grasp motions.
Human videos offer strong signals for learning the post-grasp motions, but they are less useful for learning the prerequisite grasping behaviors.
We present Perceive-Simulate-Imitate (PSI), a framework for training a modular manipulation policy using human video motion data.
arXiv Detail & Related papers (2026-02-13T18:59:10Z) - Humanoid Manipulation Interface: Humanoid Whole-Body Manipulation from Robot-Free Demonstrations [25.15848825594207]
We present the Humanoid Manipulation Interface (HuMI), a portable and efficient framework for learning diverse whole-body manipulation tasks.
HuMI enables robot-free data collection by capturing rich whole-body motion using portable hardware.
HuMI achieves a 3x increase in data collection efficiency compared to teleoperation and attains a 70% success rate in unseen environments.
arXiv Detail & Related papers (2026-02-06T12:10:47Z) - DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation [29.519071338337685]
We present DemoHLM, a framework for humanoid loco-manipulation on a real humanoid robot from a single demonstration in simulation.
A whole-body controller maps whole-body motion commands to joint torques and provides omnidirectional mobility for the humanoid robot.
Experiments show a positive correlation between the amount of synthetic data and policy performance.
arXiv Detail & Related papers (2025-10-13T10:49:40Z) - ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning [59.64325421657381]
Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks.
We introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data.
Results show substantial gains in task success, training efficiency, and robustness over strong baselines.
arXiv Detail & Related papers (2025-10-06T17:47:02Z) - HumanoidVerse: A Versatile Humanoid for Vision-Language Guided Multi-Object Rearrangement [51.16740261131198]
We introduce HumanoidVerse, a novel framework for vision-language guided humanoid control.
HumanoidVerse supports consecutive manipulation of multiple objects, guided only by natural language instructions and egocentric camera RGB observations.
Our work represents a key step toward robust, general-purpose humanoid agents capable of executing complex, sequential tasks under real-world sensory constraints.
arXiv Detail & Related papers (2025-08-23T08:23:14Z) - Feel the Force: Contact-Driven Learning from Humans [52.36160086934298]
Controlling fine-grained forces during manipulation remains a core challenge in robotics.
We present FeelTheForce, a robot learning system that models human tactile behavior to learn force-sensitive manipulation.
Our approach grounds robust low-level force control in scalable human supervision, achieving a 77% success rate across 5 force-sensitive manipulation tasks.
arXiv Detail & Related papers (2025-06-02T17:57:52Z) - MaskedManipulator: Versatile Whole-Body Manipulation [38.02818493367002]
We introduce MaskedManipulator, a generative control policy distilled from a tracking controller trained on large-scale human motion capture data.
This two-stage learning process allows the system to perform complex interaction behaviors, while providing intuitive user control over both character and object motions.
arXiv Detail & Related papers (2025-05-25T10:46:14Z) - ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos [15.809468471562537]
ZeroMimic generates image goal-conditioned skill policies for several common manipulation tasks.
We evaluate ZeroMimic's out-of-the-box performance in varied real-world and simulated kitchen settings.
To enable plug-and-play reuse of ZeroMimic policies on other task setups and robots, we release software and policy checkpoints.
arXiv Detail & Related papers (2025-03-31T09:27:00Z) - HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit [52.12750762494588]
This paper introduces HOMIE, a semi-autonomous teleoperation system.
It combines a reinforcement learning policy for body control mapped to a pedal, an isomorphic exoskeleton arm for arm control, and motion-sensing gloves for hand control.
The system is fully open-source; demos and code can be found at https://homietele.org/.
arXiv Detail & Related papers (2025-02-18T16:33:38Z) - Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models [63.89598561397856]
We present a system for quadrupedal mobile manipulation in indoor environments.
It uses a front-mounted gripper for object manipulation and a low-level controller, trained in simulation using egocentric depth, for agile skills.
We evaluate our system in two unseen environments without any real-world data collection or training.
arXiv Detail & Related papers (2024-09-30T20:58:38Z) - Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots [13.229028132036321]
Masked Humanoid Controller (MHC) supports standing, walking, and mimicry of whole- and partial-body motions.
MHC imitates partially masked motions from a library of behaviors spanning standing, walking, optimized reference trajectories, re-targeted video clips, and human motion capture data.
We demonstrate sim-to-real transfer on the real-world Digit V3 humanoid robot.
arXiv Detail & Related papers (2024-07-30T09:10:24Z) - Visual Whole-Body Control for Legged Loco-Manipulation [22.50054654508986]
We study the problem of mobile manipulation using legged robots equipped with an arm.
We propose a framework that performs whole-body control autonomously from visual observations.
arXiv Detail & Related papers (2024-03-25T17:26:08Z) - Visual Navigation Among Humans with Optimal Control as a Supervisor [72.5188978268463]
We propose an approach that combines learning-based perception with model-based optimal control to navigate among humans.
Our approach is enabled by our novel data-generation tool, HumANav.
We demonstrate that the learned navigation policies can anticipate and react to humans without explicitly predicting future human motion.
arXiv Detail & Related papers (2020-03-20T16:13:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.