VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation
- URL: http://arxiv.org/abs/2509.20322v1
- Date: Wed, 24 Sep 2025 17:10:02 GMT
- Title: VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation
- Authors: Shaofeng Yin, Yanjie Ze, Hong-Xing Yu, C. Karen Liu, Jiajun Wu
- Abstract summary: VisualMimic is a visual framework that unifies egocentric vision with hierarchical whole-body control for humanoid robots. VisualMimic enables zero-shot transfer of visuomotor policies trained in simulation to real humanoid robots.
- Score: 39.01738745009172
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humanoid loco-manipulation in unstructured environments demands tight integration of egocentric perception and whole-body control. However, existing approaches either depend on external motion capture systems or fail to generalize across diverse tasks. We introduce VisualMimic, a visual sim-to-real framework that unifies egocentric vision with hierarchical whole-body control for humanoid robots. VisualMimic combines a task-agnostic low-level keypoint tracker -- trained from human motion data via a teacher-student scheme -- with a task-specific high-level policy that generates keypoint commands from visual and proprioceptive input. To ensure stable training, we inject noise into the low-level policy and clip high-level actions using human motion statistics. VisualMimic enables zero-shot transfer of visuomotor policies trained in simulation to real humanoid robots, accomplishing a wide range of loco-manipulation tasks such as box lifting, pushing, football dribbling, and kicking. Beyond controlled laboratory settings, our policies also generalize robustly to outdoor environments. Videos are available at: https://visualmimic.github.io.
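The abstract describes a two-level architecture: a task-specific high-level policy emits keypoint commands from egocentric vision and proprioception, a task-agnostic low-level tracker turns those commands into whole-body joint actions, high-level actions are clipped using human motion statistics, and noise is injected at the low level during training. A minimal Python sketch of that control loop follows; all names, shapes, and the exact clipping/noise forms (`control_step`, `k`, `noise_std`, the mean +/- k*std rule) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Sketch of the hierarchical control loop from the VisualMimic abstract.
# Constants and stub policies are assumptions for illustration only.
NUM_KEYPOINTS = 12   # assumed number of tracked body keypoints
NUM_JOINTS = 29      # assumed humanoid degrees of freedom

rng = np.random.default_rng(0)

def high_level_policy(rgb, proprio):
    """Task-specific policy: egocentric vision + proprioception -> keypoint
    commands. Stubbed here; in the paper it is trained per task in simulation."""
    return rng.standard_normal(3 * NUM_KEYPOINTS)

def low_level_tracker(keypoint_cmd, proprio):
    """Task-agnostic keypoint tracker distilled from human motion data via a
    teacher-student scheme; outputs whole-body joint actions. Stubbed here."""
    return np.zeros(NUM_JOINTS)

# Per-dimension statistics of keypoint trajectories in the human motion data
# (assumed clipping rule: keep commands within mean +/- k * std).
kp_mean = np.zeros(3 * NUM_KEYPOINTS)
kp_std = np.ones(3 * NUM_KEYPOINTS)

def control_step(rgb, proprio, k=3.0, noise_std=0.01):
    cmd = high_level_policy(rgb, proprio)
    # Clip high-level actions using human motion statistics, so the tracker
    # only receives commands resembling its training distribution.
    cmd = np.clip(cmd, kp_mean - k * kp_std, kp_mean + k * kp_std)
    action = low_level_tracker(cmd, proprio)
    # Noise injected into the low-level policy during training encourages
    # robustness to imperfect tracking (assumed additive Gaussian).
    return action + rng.normal(0.0, noise_std, size=action.shape)

action = control_step(np.zeros((240, 320, 3)), np.zeros(2 * NUM_JOINTS))
```

Plausibly, the clipping step is what lets the two levels be trained separately: the tracker never sees out-of-distribution commands, so a new task policy that respects the statistics can reuse it; the paper's exact mechanism may differ.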
Related papers
- Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation [14.013652439013692]
This paper presents a new paradigm, HERO, for object loco-manipulation with humanoid robots.
We achieve this by designing an accurate residual-aware EE tracking policy.
We use this accurate end-effector tracker to build a modular system for loco-manipulation.
arXiv Detail & Related papers (2026-02-18T18:55:02Z) - Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos [56.510263910611684]
We tackle prehensile manipulation, in which tasks involve grasping an object before performing various post-grasp motions.
Human videos offer strong signals for learning the post-grasp motions, but they are less useful for learning the prerequisite grasping behaviors.
We present Perceive-Simulate-Imitate (PSI), a framework for training a modular manipulation policy using human video motion data.
arXiv Detail & Related papers (2026-02-13T18:59:10Z) - Humanoid Manipulation Interface: Humanoid Whole-Body Manipulation from Robot-Free Demonstrations [25.15848825594207]
We present the Humanoid Manipulation Interface (HuMI), a portable and efficient framework for learning diverse whole-body manipulation tasks.
HuMI enables robot-free data collection by capturing rich whole-body motion using portable hardware.
HuMI achieves a 3x increase in data collection efficiency compared to teleoperation and attains a 70% success rate in unseen environments.
arXiv Detail & Related papers (2026-02-06T12:10:47Z) - DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation [29.519071338337685]
We present DemoHLM, a framework for humanoid loco-manipulation on a real humanoid robot from a single demonstration in simulation.
A whole-body controller maps whole-body motion commands to joint torques and provides omnidirectional mobility for the humanoid robot.
Experiments show a positive correlation between the amount of synthetic data and policy performance.
arXiv Detail & Related papers (2025-10-13T10:49:40Z) - ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning [59.64325421657381]
Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks.
We introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data.
Results show substantial gains in task success, training efficiency, and robustness over strong baselines.
arXiv Detail & Related papers (2025-10-06T17:47:02Z) - HumanoidVerse: A Versatile Humanoid for Vision-Language Guided Multi-Object Rearrangement [51.16740261131198]
We introduce HumanoidVerse, a novel framework for vision-language guided humanoid control.
HumanoidVerse supports consecutive manipulation of multiple objects, guided only by natural language instructions and egocentric camera RGB observations.
Our work represents a key step toward robust, general-purpose humanoid agents capable of executing complex, sequential tasks under real-world sensory constraints.
arXiv Detail & Related papers (2025-08-23T08:23:14Z) - Feel the Force: Contact-Driven Learning from Humans [52.36160086934298]
Controlling fine-grained forces during manipulation remains a core challenge in robotics.
We present FeelTheForce, a robot learning system that models human tactile behavior to learn force-sensitive manipulation.
Our approach grounds robust low-level force control in scalable human supervision, achieving a 77% success rate across 5 force-sensitive manipulation tasks.
arXiv Detail & Related papers (2025-06-02T17:57:52Z) - MaskedManipulator: Versatile Whole-Body Manipulation [38.02818493367002]
We introduce MaskedManipulator, a generative control policy distilled from a tracking controller trained on large-scale human motion capture data.
This two-stage learning process allows the system to perform complex interaction behaviors, while providing intuitive user control over both character and object motions.
arXiv Detail & Related papers (2025-05-25T10:46:14Z) - ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos [15.809468471562537]
ZeroMimic generates image goal-conditioned skill policies for several common manipulation tasks.
We evaluate ZeroMimic's out-of-the-box performance in varied real-world and simulated kitchen settings.
To enable plug-and-play reuse of ZeroMimic policies on other task setups and robots, we release software and policy checkpoints.
arXiv Detail & Related papers (2025-03-31T09:27:00Z) - HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit [52.12750762494588]
This paper introduces HOMIE, a semi-autonomous teleoperation system.
It combines a reinforcement learning policy for body control mapped to a pedal, an isomorphic exoskeleton arm for arm control, and motion-sensing gloves for hand control.
The system is fully open-source; demos and code can be found at https://homietele.org/.
arXiv Detail & Related papers (2025-02-18T16:33:38Z) - Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models [63.89598561397856]
We present a system for quadrupedal mobile manipulation in indoor environments.
It uses a front-mounted gripper for object manipulation and a low-level controller, trained in simulation using egocentric depth, for agile skills.
We evaluate our system in two unseen environments without any real-world data collection or training.
arXiv Detail & Related papers (2024-09-30T20:58:38Z) - Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots [13.229028132036321]
Masked Humanoid Controller (MHC) supports standing, walking, and mimicry of whole- and partial-body motions.
MHC imitates partially masked motions from a library of behaviors spanning standing, walking, optimized reference trajectories, re-targeted video clips, and human motion capture data.
We demonstrate sim-to-real transfer on the real-world Digit V3 humanoid robot.
arXiv Detail & Related papers (2024-07-30T09:10:24Z) - Visual Whole-Body Control for Legged Loco-Manipulation [22.50054654508986]
We study the problem of mobile manipulation using legged robots equipped with an arm.
We propose a framework that performs whole-body control autonomously from visual observations.
arXiv Detail & Related papers (2024-03-25T17:26:08Z) - Visual Navigation Among Humans with Optimal Control as a Supervisor [72.5188978268463]
We propose an approach that combines learning-based perception with model-based optimal control to navigate among humans.
Our approach is enabled by our novel data-generation tool, HumANav.
We demonstrate that the learned navigation policies can anticipate and react to humans without explicitly predicting future human motion.
arXiv Detail & Related papers (2020-03-20T16:13:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.