ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning
- URL: http://arxiv.org/abs/2510.05070v2
- Date: Wed, 08 Oct 2025 06:51:48 GMT
- Title: ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning
- Authors: Siheng Zhao, Yanjie Ze, Yue Wang, C. Karen Liu, Pieter Abbeel, Guanya Shi, Rocky Duan
- Abstract summary: Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks. We introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data. Results show substantial gains in task success, training efficiency, and robustness over strong baselines.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks. While recent advances in general motion tracking (GMT) have enabled humanoids to reproduce diverse human motions, these policies lack the precision and object awareness required for loco-manipulation. To this end, we introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data. First, a GMT policy, trained on large-scale human-only motion, serves as a task-agnostic base for generating human-like whole-body movements. An efficient but precise residual policy is then learned to refine the GMT outputs to improve locomotion and incorporate object interaction. To further facilitate efficient training, we design (i) a point-cloud-based object tracking reward for smoother optimization, (ii) a contact reward that encourages accurate humanoid body-object interactions, and (iii) a curriculum-based virtual object controller to stabilize early training. We evaluate ResMimic in both simulation and on a real Unitree G1 humanoid. Results show substantial gains in task success, training efficiency, and robustness over strong baselines. Videos are available at https://resmimic.github.io/ .
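The two-stage design described in the abstract, where a learned residual refines the output of a frozen GMT base policy, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the 0.1 residual scale, and the clipping bound are assumptions introduced here for clarity.

```python
import numpy as np

def compose_action(base_action: np.ndarray,
                   residual: np.ndarray,
                   scale: float = 0.1) -> np.ndarray:
    """Two-stage residual control, sketched.

    The final joint command is the task-agnostic GMT output plus a small
    learned correction. Clipping and the `scale` factor (both hypothetical
    values) keep the refined policy close to the human-like base motion
    while allowing object-aware adjustments.
    """
    correction = scale * np.clip(residual, -1.0, 1.0)
    return base_action + correction

# Hypothetical 5-DoF example: a large raw residual is bounded before
# being added, so the correction stays small relative to the base action.
base = np.zeros(5)              # stand-in for the GMT policy output
raw_residual = 2.0 * np.ones(5)  # stand-in for the residual policy output
action = compose_action(base, raw_residual)  # each entry becomes 0.1
```

In practice the base policy would be a pretrained network held fixed while only the residual policy is optimized, which is what makes the second training stage efficient.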
Related papers
- SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control [85.91101551600978]
We show that scaling up model capacity, data, and compute yields a generalist humanoid controller capable of creating natural and robust whole-body movements. We build a foundation model for motion tracking by scaling along three axes: network size, dataset volume, and compute. We show the practical utility of our model through two mechanisms: (1) a real-time universal kinematic planner that bridges motion tracking to downstream task execution, enabling natural and interactive control, and (2) a unified token space that supports various motion input interfaces.
arXiv Detail & Related papers (2025-11-11T04:37:40Z) - OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction [76.44108003274955]
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning policies. We introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh. By minimizing the Laplacian deformation between the human and robot meshes, OmniRetarget generates kinematically feasible trajectories.
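The interaction-mesh idea mentioned above can be written as a Laplacian deformation objective. The notation below is a hedged sketch, not the paper's exact formulation: $V_h$ and $V_r$ denote human and robot mesh vertices, and $\mathcal{L}$ a graph Laplacian over the combined body-object mesh, all symbols introduced here for illustration.

```latex
\min_{V_r} \sum_{i} \left\lVert \mathcal{L}(V_r)_i - \mathcal{L}(V_h)_i \right\rVert^2
\quad \text{s.t. joint limits and non-penetration constraints.}
```

Matching the Laplacian coordinates rather than the raw vertex positions preserves the relative spatial relationships between the body and the object during retargeting.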
arXiv Detail & Related papers (2025-09-30T17:59:02Z) - KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Control [30.738592041595933]
We present VMS, a unified whole-body controller that enables humanoid robots to learn diverse and dynamic behaviors within a single policy. Our framework integrates a hybrid tracking objective that balances local motion fidelity with global trajectory consistency. We validate VMS extensively in both simulation and real-world experiments, demonstrating accurate imitation of dynamic skills, stable performance over minute-long sequences, and strong generalization to unseen motions.
arXiv Detail & Related papers (2025-09-20T11:31:14Z) - KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills [50.34487144149439]
This paper presents a physics-based humanoid control framework, aiming to master highly-dynamic human behaviors such as Kungfu and dancing. For motion processing, we design a pipeline to extract, filter, correct, and retarget motions, while ensuring compliance with physical constraints. For motion imitation, we formulate a bi-level optimization problem to dynamically adjust the tracking accuracy tolerance. In experiments, we train whole-body control policies to imitate a set of highly-dynamic motions.
arXiv Detail & Related papers (2025-06-15T13:58:53Z) - ExBody2: Advanced Expressive Humanoid Whole-Body Control [16.69009772546575]
We propose a method for producing whole-body tracking controllers that are trained on both human motion capture and simulated data. We use a teacher policy to produce intermediate data that better conforms to the robot's kinematics. We observe significant improvements in tracking performance after fine-tuning on a small amount of data.
arXiv Detail & Related papers (2024-12-17T18:59:51Z) - I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning [8.97654258232601]
We develop a framework to control humanoid robots through bounded residual reinforcement learning (I-CTRL). I-CTRL excels in motion imitation with simple and unique rewards that generalize across five robots. Our framework introduces an automatic priority scheduler to manage large-scale motion datasets.
arXiv Detail & Related papers (2024-05-14T16:12:27Z) - Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
We first learn a motion imitator that can imitate all of human motion from a large, unstructured motion dataset.
We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z) - Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z) - Task-Generic Hierarchical Human Motion Prior using VAEs [44.356707509079044]
A deep generative model that describes human motions can benefit a wide range of fundamental computer vision and graphics tasks.
We present a method for learning complex human motions independent of specific tasks using a combined global and local latent space.
We demonstrate the effectiveness of our hierarchical motion variational autoencoder in a variety of tasks including video-based human pose estimation.
arXiv Detail & Related papers (2021-06-07T23:11:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.