From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots
- URL: http://arxiv.org/abs/2506.12779v3
- Date: Tue, 02 Sep 2025 12:06:20 GMT
- Title: From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots
- Authors: Yuxuan Wang, Ming Yang, Ziluo Ding, Yu Zhang, Weishuai Zeng, Xinrun Xu, Haobin Jiang, Zongqing Lu
- Abstract summary: BumbleBee is an expert-generalist learning framework that combines motion clustering and sim-to-real adaptation. Experiments in two simulation environments and on a real humanoid robot demonstrate that BB achieves state-of-the-art general whole-body control.
- Score: 35.26305396688982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving general agile whole-body control on humanoid robots remains a major challenge due to diverse motion demands and data conflicts. While existing frameworks excel in training single motion-specific policies, they struggle to generalize across highly varied behaviors due to conflicting control requirements and mismatched data distributions. In this work, we propose BumbleBee (BB), an expert-generalist learning framework that combines motion clustering and sim-to-real adaptation to overcome these challenges. BB first leverages an autoencoder-based clustering method to group behaviorally similar motions using motion features and motion descriptions. Expert policies are then trained within each cluster and refined with real-world data through iterative delta action modeling to bridge the sim-to-real gap. Finally, these experts are distilled into a unified generalist controller that preserves agility and robustness across all motion types. Experiments in two simulation environments and on a real humanoid robot demonstrate that BB achieves state-of-the-art general whole-body control, setting a new benchmark for agile, robust, and generalizable humanoid performance in the real world. The project webpage is available at https://beingbeyond.github.io/BumbleBee/.
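The pipeline the abstract outlines (cluster motions, train per-cluster experts, refine with delta actions, distill a generalist) can be summarized in a short sketch. This is not the authors' released code: every name below is hypothetical, and plain k-means over precomputed motion embeddings stands in for the paper's autoencoder-based clustering.

```python
# Hypothetical sketch of the BumbleBee-style expert-generalist pipeline.
# All names are illustrative; the paper's exact interfaces differ.
import numpy as np
from sklearn.cluster import KMeans

def cluster_motions(latents: np.ndarray, k: int = 5) -> np.ndarray:
    """Group behaviorally similar motions. The paper clusters autoencoder
    features of motions and descriptions; k-means over precomputed
    embeddings stands in for that step here."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(latents)

def train_expert(cluster_id: int, motions: list):
    """Placeholder: train one tracking policy per cluster (e.g., with RL),
    then refine it on real-world data via iterative delta-action modeling
    (see the ASAP entry below for that idea)."""
    raise NotImplementedError

def distill_generalist(experts: dict, motions: list):
    """Placeholder: supervised distillation of the per-cluster experts
    into one generalist whole-body controller."""
    raise NotImplementedError

latents = np.random.randn(200, 32)  # stand-in motion embeddings
labels = cluster_motions(latents)   # one expert would be trained per label
```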
Related papers
- ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation [55.467742403416175]
We introduce a physics-driven neural algorithm that translates large-scale motion capture to humanoid embodiments. We learn a unified multimodal controller that supports both dense references and sparse task specifications. Results show that ULTRA generalizes to autonomous, goal-conditioned whole-body loco-manipulation from egocentric perception.
arXiv Detail & Related papers (2026-03-03T18:59:29Z)
- Embodiment-Aware Generalist Specialist Distillation for Unified Humanoid Whole-Body Control [34.056581843277904]
We introduce an iterative generalist-specialist distillation framework that produces a single unified policy that controls multiple humanoids. We conducted experiments on five different robots in simulation and four in real-world settings.
arXiv Detail & Related papers (2026-02-03T00:58:29Z)
- FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions [147.04372611893032]
We present FRoM-W1, an open-source framework designed to achieve general humanoid whole-body motion control using natural language. We extensively evaluate FRoM-W1 on Unitree H1 and G1 robots. Results demonstrate superior performance on the HumanML3D-X benchmark for human whole-body motion generation.
arXiv Detail & Related papers (2026-01-19T07:59:32Z)
- HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies [83.41714103649751]
Development of embodied intelligence models depends on access to high-quality robot demonstration data. We present HiMoE-VLA, a novel vision-language-action framework tailored to handle diverse, heterogeneous robotic data. HiMoE-VLA demonstrates a consistent performance boost over existing VLA baselines, achieving higher accuracy and robust generalization.
arXiv Detail & Related papers (2025-12-05T13:21:05Z)
- DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation [29.519071338337685]
We present DemoHLM, a framework for humanoid loco-manipulation on a real humanoid robot from a single demonstration in simulation. Its whole-body controller maps whole-body motion commands to joint torques and provides omnidirectional mobility for the humanoid robot. Experiments show a positive correlation between the amount of synthetic data and policy performance.
arXiv Detail & Related papers (2025-10-13T10:49:40Z)
- KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Control [30.738592041595933]
We present VMS, a unified whole-body controller that enables humanoid robots to learn diverse and dynamic behaviors within a single policy. Our framework integrates a hybrid tracking objective that balances local motion fidelity with global trajectory consistency. We validate VMS extensively in both simulation and real-world experiments, demonstrating accurate imitation of dynamic skills, stable performance over minute-long sequences, and strong generalization to unseen motions.
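One way to read the "hybrid tracking objective" is as a weighted blend of a local pose-fidelity term and a global root-trajectory term. The sketch below illustrates that idea only; the weights, scales, and distance functions are assumptions, not the paper's formulation.

```python
import numpy as np

# Hedged sketch of a hybrid tracking objective: blend local motion
# fidelity (per-joint pose error) with global trajectory consistency
# (root position error). All constants here are assumptions.
def hybrid_tracking_reward(q, q_ref, root, root_ref, w_local=0.7):
    local = np.exp(-2.0 * np.sum((q - q_ref) ** 2))        # local fidelity
    glob = np.exp(-0.5 * np.sum((root - root_ref) ** 2))   # global consistency
    return w_local * local + (1.0 - w_local) * glob
```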
arXiv Detail & Related papers (2025-09-20T11:31:14Z)
- GBC: Generalized Behavior-Cloning Framework for Whole-Body Humanoid Imitation [5.426712963311386]
Generalized Behavior Cloning (GBC) is a comprehensive and unified solution to the end-to-end challenge of whole-body humanoid imitation. First, an adaptive data pipeline leverages a differentiable IK network to automatically retarget any human MoCap data to any humanoid. Second, our novel DAgger-MMPPO algorithm with its MMTransformer architecture learns robust, high-fidelity imitation policies.
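DAgger-style imitation, which DAgger-MMPPO builds on, alternates student-driven rollouts with expert relabeling. A generic sketch follows, with the MMPPO/MMTransformer specifics omitted; the gymnasium-style `env` and the callable `expert`/`student` interfaces are assumptions.

```python
# Generic DAgger loop (Ross et al., 2011); not the paper's DAgger-MMPPO.
# env is assumed gymnasium-style; student.fit is an assumed sklearn-like
# supervised-regression interface.
def dagger(env, expert, student, n_iters=10, horizon=1000):
    states, actions = [], []
    for _ in range(n_iters):
        obs, _ = env.reset()
        for _ in range(horizon):
            states.append(obs)
            actions.append(expert(obs))          # expert relabels every state
            obs, _, term, trunc, _ = env.step(student(obs))  # student drives
            if term or trunc:
                obs, _ = env.reset()
        student.fit(states, actions)             # regress onto expert actions
    return student
```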
arXiv Detail & Related papers (2025-08-13T17:28:39Z)
- Modular Recurrence in Contextual MDPs for Universal Morphology Control [0.0]
Generalization to new, unseen robots remains a challenge. We implement a modular recurrent architecture and evaluate its generalization performance on a large set of MuJoCo robots.
arXiv Detail & Related papers (2025-06-10T09:44:30Z)
- GENMO: A GENeralist Model for Human MOtion [64.16188966024542]
We present GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. Our key insight is to reformulate motion estimation as constrained motion generation, where the output motion must precisely satisfy observed conditioning signals. Our novel architecture handles variable-length motions and mixed multimodal conditions (text, audio, video) at different time intervals, offering flexible control.
arXiv Detail & Related papers (2025-05-02T17:59:55Z)
- ModSkill: Physical Character Skill Modularization [21.33764810227885]
We introduce a novel skill learning framework, ModSkill, that decouples complex full-body skills into compositional, modular skills for independent body parts. Our results show that this modularized skill learning framework, enhanced by generative sampling, outperforms existing methods in precise full-body motion tracking.
arXiv Detail & Related papers (2025-02-19T22:55:49Z)
- ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills [46.16771391136412]
ASAP is a two-stage framework designed to tackle the dynamics mismatch and enable agile humanoid whole-body skills. In the first stage, we pre-train motion tracking policies in simulation using retargeted human motion data. In the second stage, we deploy the policies in the real world and collect real-world data to train a delta (residual) action model.
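The delta (residual) action model can be read as a learned correction delta(s, a) such that simulating under a' = a + delta(s, a) better matches real-world trajectories. A minimal PyTorch sketch, where the network size and training signal are assumptions rather than the paper's code:

```python
import torch
import torch.nn as nn

# Minimal sketch of a delta (residual) action model in the spirit of
# ASAP's second stage. Dimensions and architecture are assumptions.
class DeltaActionModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Corrected action replayed in simulation: a' = a + delta(s, a)
        return act + self.net(torch.cat([obs, act], dim=-1))
```

Training would minimize the gap between simulated rollouts under the corrected actions and logged real-world trajectories, after which policies can be fine-tuned in the corrected simulator.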
arXiv Detail & Related papers (2025-02-03T08:22:46Z)
- Universal Actions for Enhanced Embodied Foundation Models [25.755178700280933]
We introduce UniAct, a new embodied foundation modeling framework operating in a Universal Action Space. Our learned universal actions capture the generic atomic behaviors across diverse robots by exploiting their shared structural features. Our 0.5B instantiation of UniAct outperforms 14X larger SOTA embodied foundation models in extensive evaluations on various real-world and simulation robots.
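One common way to realize a shared action space across embodiments is a discrete codebook of atomic behaviors with per-robot decoders. The sketch below illustrates that general pattern under assumed dimensions; it is not necessarily UniAct's actual design.

```python
import torch
import torch.nn as nn

# Assumption-level illustration of a universal action space: a codebook
# shared across robots plus per-robot decoders to joint commands.
class UniversalActions(nn.Module):
    def __init__(self, n_codes=256, code_dim=64, robot_act_dims=None):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, code_dim)   # shared behaviors
        self.decoders = nn.ModuleDict({
            name: nn.Linear(code_dim, d)                  # embodiment-specific
            for name, d in (robot_act_dims or {}).items()
        })

    def forward(self, code_ids: torch.Tensor, robot: str) -> torch.Tensor:
        return self.decoders[robot](self.codebook(code_ids))
```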
arXiv Detail & Related papers (2025-01-17T10:45:22Z)
- The One RING: a Robotic Indoor Navigation Generalist [58.30694487843546]
RING (Robotic Indoor Navigation Generalist) is an embodiment-agnostic policy that turns any mobile robot into an effective indoor semantic navigator. Trained entirely in simulation, RING leverages large-scale randomization over robot embodiments to enable robust generalization to many real-world platforms.
arXiv Detail & Related papers (2024-12-18T23:15:41Z)
- CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation [43.12717215650305]
We present CrowdMoGen, the first zero-shot framework for collective motion generation. CrowdMoGen effectively groups individuals and generates event-aligned motion sequences from text prompts. As the first framework of collective motion generation, CrowdMoGen has the potential to advance applications in urban simulation, crowd planning, and other large-scale interactive environments.
arXiv Detail & Related papers (2024-07-08T17:59:36Z)
- RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis [102.1876259853457]
We propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX.
RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints.
To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning.
arXiv Detail & Related papers (2024-02-25T15:31:43Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models. Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Model Predictive Control for Fluid Human-to-Robot Handovers [50.72520769938633]
Motion planning that takes human comfort into account is typically not part of the human-robot handover process.
We propose to generate smooth motions via an efficient model-predictive control framework.
We conduct human-to-robot handover experiments on a diverse set of objects with several users.
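Model-predictive control in general works by optimizing a short trajectory, applying only the first control, and re-planning. A generic receding-horizon sketch follows; the cost terms and dynamics are placeholders, not the paper's handover formulation.

```python
import numpy as np

# Generic receding-horizon MPC loop. solve(x, horizon) is assumed to
# return an optimized control sequence (e.g., minimizing tracking plus
# smoothness costs); dynamics(x, u) returns the next state.
def mpc_control(x0, dynamics, solve, horizon=20, n_steps=100):
    x, traj = x0, [x0]
    for _ in range(n_steps):
        u_seq = solve(x, horizon)    # optimize over the full horizon
        x = dynamics(x, u_seq[0])    # apply only the first control, re-plan
        traj.append(x)
    return np.array(traj)
```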
arXiv Detail & Related papers (2022-03-31T23:08:20Z)
- On the Emergence of Whole-body Strategies from Humanoid Robot Push-recovery Learning [32.070068456106895]
We apply model-free Deep Reinforcement Learning for training a general and robust humanoid push-recovery policy in a simulation environment.
Our method targets high-dimensional whole-body humanoid control and is validated on the iCub humanoid.
arXiv Detail & Related papers (2021-04-29T17:49:20Z)
- Deep Imitation Learning for Bimanual Robotic Manipulation [70.56142804957187]
We present a deep imitation learning framework for robotic bimanual manipulation.
A core challenge is to generalize the manipulation skills to objects in different locations.
We propose to (i) decompose the multi-modal dynamics into elemental movement primitives, (ii) parameterize each primitive using a recurrent graph neural network to capture interactions, and (iii) integrate a high-level planner that composes primitives sequentially and a low-level controller to combine primitive dynamics and inverse kinematics control.
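The (i)-(iii) decomposition reads as a high-level planner that sequences primitives and a low-level controller that tracks each one. A toy sketch with illustrative names follows; the paper's recurrent graph-neural-network primitive dynamics and inverse kinematics are stubbed out as callables.

```python
# Toy sketch of the (i)-(iii) decomposition: a planner emits a primitive
# sequence; a low-level controller tracks each primitive's targets.
# Names are illustrative; GNN dynamics and IK are stubbed as callables.
from typing import Callable, Dict, List

def execute_plan(
    plan: List[str],                  # (iii) planner output: primitive names
    primitives: Dict[str, Callable],  # (i)/(ii) per-primitive target generators
    track: Callable,                  # low-level IK/controller step
    state,
    steps_per_primitive: int = 50,
):
    for name in plan:
        target_fn = primitives[name]
        for _ in range(steps_per_primitive):
            state = track(state, target_fn(state))  # follow primitive target
    return state
```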
arXiv Detail & Related papers (2020-10-11T01:40:03Z)