Related papers: KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Control

URL: http://arxiv.org/abs/2509.16638v1
Date: Sat, 20 Sep 2025 11:31:14 GMT
Title: KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Control
Authors: Jinrui Han, Weiji Xie, Jiakun Zheng, Jiyuan Shi, Weinan Zhang, Ting Xiao, Chenjia Bai
Abstract summary: We present VMS, a unified whole-body controller that enables humanoid robots to learn diverse and dynamic behaviors within a single policy. Our framework integrates a hybrid tracking objective that balances local motion fidelity with global trajectory consistency. We validate VMS extensively in both simulation and real-world experiments, demonstrating accurate imitation of dynamic skills, stable performance over minute-long sequences, and strong generalization to unseen motions.
Score: 30.738592041595933
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Learning versatile whole-body skills by tracking various human motions is a fundamental step toward general-purpose humanoid robots. This task is particularly challenging because a single policy must master a broad repertoire of motion skills while ensuring stability over long-horizon sequences. To this end, we present VMS, a unified whole-body controller that enables humanoid robots to learn diverse and dynamic behaviors within a single policy. Our framework integrates a hybrid tracking objective that balances local motion fidelity with global trajectory consistency, and an Orthogonal Mixture-of-Experts (OMoE) architecture that encourages skill specialization while enhancing generalization across motions. A segment-level tracking reward is further introduced to relax rigid step-wise matching, enhancing robustness when handling global displacements and transient inaccuracies. We validate VMS extensively in both simulation and real-world experiments, demonstrating accurate imitation of dynamic skills, stable performance over minute-long sequences, and strong generalization to unseen motions. These results highlight the potential of VMS as a scalable foundation for versatile humanoid whole-body control. The project page is available at https://kungfubot2-humanoid.github.io.
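The abstract does not spell out how the segment-level tracking reward is computed, so the following is only a minimal sketch of the general idea of relaxing step-wise matching. It assumes the reward is an exponential of the squared position error and that the "segment" is a short window of reference frames around the current step, with the best-matching frame in the window setting the reward. The function names, the window size `half_window`, and the scale `sigma` are hypothetical and are not taken from the paper.

```python
import numpy as np

def stepwise_reward(robot_pose, ref_motion, t, sigma=0.25):
    """Reward from the error against the single reference frame at step t.

    robot_pose: (J, 3) body positions; ref_motion: (T, J, 3) reference frames.
    """
    err = np.mean(np.sum((robot_pose - ref_motion[t]) ** 2, axis=-1))
    return float(np.exp(-err / sigma))

def segment_reward(robot_pose, ref_motion, t, half_window=2, sigma=0.25):
    """Illustrative relaxed reward (an assumption, not the paper's formula).

    Compares the robot pose with every reference frame in a window around t
    and scores the closest one, so a small timing offset is not penalized.
    """
    lo = max(0, t - half_window)
    hi = min(len(ref_motion), t + half_window + 1)
    segment = ref_motion[lo:hi]  # (W, J, 3) frames inside the window
    # Mean squared position error against each frame in the window: shape (W,)
    errs = np.mean(np.sum((segment - robot_pose) ** 2, axis=-1), axis=-1)
    return float(np.exp(-errs.min() / sigma))

# Toy reference: 5 frames, 2 bodies, frame k places every coordinate at k.
ref = np.stack([np.full((2, 3), float(k)) for k in range(5)])
pose = ref[3]  # robot is one frame ahead of the reference at t = 2
print(stepwise_reward(pose, ref, t=2), segment_reward(pose, ref, t=2))
```

In this toy case the robot matches a neighbouring reference frame exactly, so the windowed reward stays high while the strict step-wise reward collapses. This is the kind of robustness to transient inaccuracies the abstract attributes to segment-level tracking, though the real formulation may differ.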

Related papers

ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation [55.467742403416175]
We introduce a physics-driven neural algorithm that translates large-scale motion capture to humanoid embodiments. We learn a unified multimodal controller that supports both dense references and sparse task specifications. Results show that ULTRA generalizes to autonomous, goal-conditioned whole-body loco-manipulation from egocentric perception.
arXiv Detail & Related papers (2026-03-03T18:59:29Z)
Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching [77.28042137892943]
We present Perceptive Humanoid Parkour (PHP), a modular framework that enables humanoid robots to autonomously perform long-horizon, vision-based parkour. We train motion-tracking reinforcement learning expert policies for these composed motions, and distill them into a single depth-based, multi-skill student policy. We validate our framework with extensive real-world experiments on a Unitree G1 humanoid robot.
arXiv Detail & Related papers (2026-02-17T18:59:11Z)
FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions [147.04372611893032]
We present FRoM-W1, an open-source framework designed to achieve general humanoid whole-body motion control using natural language. We extensively evaluate FRoM-W1 on Unitree H1 and G1 robots. Results demonstrate superior performance on the HumanML3D-X benchmark for human whole-body motion generation.
arXiv Detail & Related papers (2026-01-19T07:59:32Z)
UniAct: Unified Motion Generation and Action Streaming for Humanoid Robots [27.794309591475326]
A long-standing objective in humanoid robotics is the realization of versatile agents capable of following diverse multimodal instructions with human-level flexibility. Here we show that UniAct, a two-stage framework integrating a fine-tuned MLLM with a causal streaming pipeline, enables humanoid robots to execute multimodal instructions with sub-500 ms latency. This approach yields a 19% improvement in the success rate of zero-shot tracking of imperfect reference motions.
arXiv Detail & Related papers (2025-12-30T16:20:13Z)
ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning [59.64325421657381]
Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks. We introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data. Results show substantial gains in task success, training efficiency, and robustness over strong baselines.
arXiv Detail & Related papers (2025-10-06T17:47:02Z)
KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills [50.34487144149439]
This paper presents a physics-based humanoid control framework, aiming to master highly-dynamic human behaviors such as Kungfu and dancing. For motion processing, we design a pipeline to extract, filter out, correct, and retarget motions, while ensuring compliance with physical constraints. For motion imitation, we formulate a bi-level optimization problem to dynamically adjust the tracking accuracy tolerance. In experiments, we train whole-body control policies to imitate a set of highly-dynamic motions.
arXiv Detail & Related papers (2025-06-15T13:58:53Z)
Learning Humanoid Standing-up Control across Diverse Postures [27.79222176982376]
Standing-up control is crucial for humanoid robots, with the potential for integration into current locomotion and loco-manipulation systems. We present HoST (Humanoid Standing-up Control), a reinforcement learning framework that learns standing-up control from scratch. Our experimental results demonstrate that the controllers achieve smooth, stable, and robust standing-up motions across a wide range of laboratory and outdoor environments.
arXiv Detail & Related papers (2025-02-12T13:10:09Z)
ExBody2: Advanced Expressive Humanoid Whole-Body Control [16.69009772546575]
We propose a method for producing whole-body tracking controllers that are trained on both human motion capture and simulated data. We use a teacher policy to produce intermediate data that better conforms to the robot's kinematics. We observed significant improvement of tracking performance after fine-tuning on a small amount of data.
arXiv Detail & Related papers (2024-12-17T18:59:51Z)
Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots [13.229028132036321]
Masked Humanoid Controller (MHC) supports standing, walking, and mimicry of whole and partial-body motions. MHC imitates partially masked motions from a library of behaviors spanning standing, walking, optimized reference trajectories, re-targeted video clips, and human motion capture data. We demonstrate sim-to-real transfer on the real-world Digit V3 humanoid robot.
arXiv Detail & Related papers (2024-07-30T09:10:24Z)
Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control [106.32794844077534]
This paper presents a study on using deep reinforcement learning to create dynamic locomotion controllers for bipedal robots. We develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. This work pushes the limits of agility for bipedal robots through extensive real-world experiments.
arXiv Detail & Related papers (2024-01-30T10:48:43Z)
Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control. We first learn a motion imitator that can imitate all of human motion from a large, unstructured motion dataset. We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z)
UniCon: Universal Neural Controller For Physics-based Character Motion [70.45421551688332]
We propose a physics-based universal neural controller (UniCon) that learns to master thousands of motions with different styles by learning on large-scale motion datasets. UniCon can support keyboard-driven control, compose motion sequences drawn from a large pool of locomotion and acrobatics skills and teleport a person captured on video to a physics-based virtual avatar.
arXiv Detail & Related papers (2020-11-30T18:51:16Z)
Residual Force Control for Agile Human Behavior Imitation and Extended Motion Synthesis [32.22704734791378]
Reinforcement learning has shown great promise for realistic human behaviors by learning humanoid control policies from motion capture data. It is still very challenging to reproduce sophisticated human skills like ballet dance, or to stably imitate long-term human behaviors with complex transitions. We propose a novel approach, residual force control (RFC), that augments a humanoid control policy by adding external residual forces into the action space.
arXiv Detail & Related papers (2020-06-12T17:56:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.