Embodiment-Aware Generalist Specialist Distillation for Unified Humanoid Whole-Body Control
- URL: http://arxiv.org/abs/2602.02960v1
- Date: Tue, 03 Feb 2026 00:58:29 GMT
- Title: Embodiment-Aware Generalist Specialist Distillation for Unified Humanoid Whole-Body Control
- Authors: Quanquan Peng, Yunfeng Lin, Yufei Xue, Jiangmiao Pang, Weinan Zhang
- Abstract summary: We introduce an iterative generalist-specialist distillation framework that produces a single unified policy that controls multiple humanoids. We conducted experiments on five different robots in simulation and four in real-world settings.
- Score: 34.056581843277904
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Humanoid Whole-Body Controllers trained with reinforcement learning (RL) have recently achieved remarkable performance, yet many target a single robot embodiment. Variations in dynamics, degrees of freedom (DoFs), and kinematic topology still hinder a single policy from commanding diverse humanoids. Moreover, obtaining a generalist policy that not only transfers across embodiments but also supports richer behaviors, beyond simple walking to squatting and leaning, remains especially challenging. In this work, we tackle these obstacles by introducing EAGLE, an iterative generalist-specialist distillation framework that produces a single unified policy controlling multiple heterogeneous humanoids without per-robot reward tuning. During each cycle, embodiment-specific specialists are forked from the current generalist, refined on their respective robots, and their new skills are distilled back into the generalist by training on the pooled embodiment set. Repeating this loop until performance converges produces a robust Whole-Body Controller validated on robots such as the Unitree H1, G1, and Fourier N1. We conducted experiments on five different robots in simulation and four in real-world settings. In quantitative evaluations, EAGLE achieves higher tracking accuracy and robustness than competing methods, marking a step toward scalable, fleet-level humanoid control. See more details at https://eagle-wbc.github.io/
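The fork/refine/distill cycle from the abstract can be sketched as a toy, runnable loop. Policies here are modeled as simple per-robot skill scores, and `fork`, `refine`, and `distill` are hypothetical stand-ins for the paper's actual RL fine-tuning and distillation steps, not the authors' implementation.

```python
# Toy sketch of EAGLE's generalist-specialist distillation cycle.
# A "policy" is just a dict of per-robot skill scores; real policies
# would be neural networks trained with RL and supervised distillation.

def fork(generalist):
    # Each specialist starts from the current generalist's weights.
    return dict(generalist)

def refine(specialist, robot):
    # Stand-in for embodiment-specific RL fine-tuning on one robot.
    specialist[robot] += 1.0
    return specialist

def distill(specialists, robots):
    # Stand-in for distillation on the pooled embodiment set: the new
    # generalist absorbs each robot's best specialist behavior.
    return {r: max(s[r] for s in specialists.values()) for r in robots}

def eagle_loop(robots, cycles=3):
    generalist = {r: 0.0 for r in robots}
    for _ in range(cycles):
        # 1. Fork one specialist per embodiment, 2. refine it on its robot,
        specialists = {r: refine(fork(generalist), r) for r in robots}
        # 3. distill the specialists back into a single generalist.
        generalist = distill(specialists, robots)
    return generalist

print(eagle_loop(["H1", "G1", "N1"]))
# -> {'H1': 3.0, 'G1': 3.0, 'N1': 3.0}
```

In this toy model every robot's skill grows by one unit per cycle, illustrating how repeated fork-refine-distill rounds let the pooled generalist keep up with all of its specialists.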
Related papers
- ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation [55.467742403416175]
We introduce a physics-driven neural algorithm that translates large-scale motion capture to humanoid embodiments. We learn a unified multimodal controller that supports both dense references and sparse task specifications. Results show that ULTRA generalizes to autonomous, goal-conditioned whole-body loco-manipulation from egocentric perception.
arXiv Detail & Related papers (2026-03-03T18:59:29Z) - FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions [147.04372611893032]
We present FRoM-W1, an open-source framework designed to achieve general humanoid whole-body motion control using natural language. We extensively evaluate FRoM-W1 on Unitree H1 and G1 robots. Results demonstrate superior performance on the HumanML3D-X benchmark for human whole-body motion generation.
arXiv Detail & Related papers (2026-01-19T07:59:32Z) - CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation [70.5382178207975]
Compliance through Hindsight Perturbation (CHIP) is a plug-and-play module that enables controllable end-effector stiffness. CHIP is easy to implement and requires neither data augmentation nor additional reward tuning. We show that a generalist motion-tracking controller trained with CHIP can perform a diverse set of forceful manipulation tasks.
arXiv Detail & Related papers (2025-12-16T18:56:04Z) - GBC: Generalized Behavior-Cloning Framework for Whole-Body Humanoid Imitation [5.426712963311386]
Generalized Behavior Cloning (GBC) is a comprehensive and unified framework for end-to-end whole-body humanoid imitation. First, an adaptive data pipeline leverages a differentiable IK network to automatically retarget any human MoCap data to any humanoid. Second, our novel DAgger-MMPPO algorithm with its MMTransformer architecture learns robust, high-fidelity imitation policies.
arXiv Detail & Related papers (2025-08-13T17:28:39Z) - From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots [35.26305396688982]
BumbleBee is an expert-generalist learning framework that combines motion clustering and sim-to-real adaptation. Experiments in two simulators and on a real humanoid robot demonstrate that BumbleBee achieves state-of-the-art general whole-body control.
arXiv Detail & Related papers (2025-06-15T09:09:34Z) - GRoQ-LoCO: Generalist and Robot-agnostic Quadruped Locomotion Control using Offline Datasets [0.8678250057211367]
GRoQ-LoCO is a scalable, attention-based framework that learns a single generalist locomotion policy across multiple quadruped robots and terrains. Our framework operates solely on proprioceptive data from all robots, without incorporating any robot-specific encodings. Results demonstrate the potential of offline, data-driven learning to generalize locomotion across diverse quadruped morphologies and behaviors.
arXiv Detail & Related papers (2025-05-16T08:17:01Z) - The One RING: a Robotic Indoor Navigation Generalist [58.30694487843546]
RING (Robotic Indoor Navigation Generalist) is an embodiment-agnostic policy that turns any mobile robot into an effective indoor semantic navigator. Trained entirely in simulation, RING leverages large-scale randomization over robot embodiments to enable robust generalization to many real-world platforms.
arXiv Detail & Related papers (2024-12-18T23:15:41Z) - Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance [66.51390591688802]
Value-Guided Policy Steering (V-GPS) is compatible with a wide range of different generalist policies, without needing to fine-tune or even access the weights of the policy. We show that the same value function can improve the performance of five different state-of-the-art policies with different architectures.
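The core idea of value-guided steering, re-ranking a frozen generalist's action proposals with a learned value function, can be sketched in a few lines. The `policy_sample` and `q_value` callables below are toy placeholders (not the paper's models), assumed to draw candidate actions and score state-action pairs, respectively.

```python
# Minimal sketch of value-guided action re-ranking in the spirit of V-GPS:
# sample several candidate actions from a frozen generalist policy, score
# each with a learned Q-function, and execute the highest-scoring one. No
# policy weights are touched, which is the key property the abstract claims.

def steer(policy_sample, q_value, obs, n_candidates=8):
    """Pick the candidate action the value function scores highest."""
    candidates = [policy_sample(obs) for _ in range(n_candidates)]
    return max(candidates, key=lambda a: q_value(obs, a))
```

Because steering happens purely at inference time, the same `q_value` can sit in front of any generalist policy that exposes action sampling, matching the paper's plug-in framing.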
arXiv Detail & Related papers (2024-10-17T17:46:26Z) - Low-Rank Modular Reinforcement Learning via Muscle Synergy [25.120547719120765]
Modular Reinforcement Learning (RL) decentralizes the control of multi-joint robots by learning policies for each actuator.
We propose a Synergy-Oriented LeARning (SOLAR) framework that exploits the redundancy of DoFs in robot control.
arXiv Detail & Related papers (2022-10-26T16:01:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.