OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
- URL: http://arxiv.org/abs/2509.26633v2
- Date: Wed, 08 Oct 2025 23:16:20 GMT
- Title: OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
- Authors: Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C. Karen Liu, Rocky Duan, Guanya Shi,
- Abstract summary: A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning policies.<n>We introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh.<n>By minimizing the Laplacian deformation between the human and robot meshes, OmniRetarget generates kinematically feasible trajectories.
- Score: 76.44108003274955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8-hour trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
Related papers
- MeshMimic: Geometry-Aware Humanoid Motion Learning through 3D Scene Reconstruction [54.36564144414704]
MeshMimic is an innovative framework that bridges 3D scene reconstruction and embodied intelligence to enable humanoid robots to learn coupled "motion-terrain" interactions directly from video.<n>By leveraging state-of-the-art 3D vision models, our framework precisely segments and reconstructs both human trajectories and the underlying 3D geometry of terrains and objects.
arXiv Detail & Related papers (2026-02-17T17:09:45Z) - Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations [63.80827184637476]
We introduce D-STAR, a hierarchical policy that disentangles when to act from where to act.<n>We validate our framework through extensive and rigorous simulations.
arXiv Detail & Related papers (2026-01-14T14:37:06Z) - Decoupled Generative Modeling for Human-Object Interaction Synthesis [35.78156236836254]
Existing approaches often require manually specified intermediate waypoints and place all optimization objectives on a single network.<n>We propose Decoupled Generative Modeling for Human-Object Interaction Synthesis (DecHOI)<n>A trajectory generator first produces human and object trajectories without prescribed waypoints, and an action generator conditions on these paths to synthesize detailed motions.
arXiv Detail & Related papers (2025-12-22T05:33:59Z) - ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning [59.64325421657381]
Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks.<n>We introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data.<n>Results show substantial gains in task success, training efficiency, and robustness over strong baselines.
arXiv Detail & Related papers (2025-10-06T17:47:02Z) - SimGenHOI: Physically Realistic Whole-Body Humanoid-Object Interaction via Generative Modeling and Reinforcement Learning [6.255814224573073]
SimGenHOI is a unified framework that combines the strengths of generative modeling and reinforcement learning to produce controllable and physically plausible HOI.<n>Our HOI generative model, based on Diffusion Transformers (DiT), predicts a set of key actions conditioned on text prompts, object geometry, sparse object waypoints, and the initial humanoid pose.<n>To ensure physical realism, we design a contact-aware whole-body control policy trained with reinforcement learning, which tracks the generated motions while correcting artifacts such as penetration and foot sliding.
arXiv Detail & Related papers (2025-08-18T15:20:46Z) - 3HANDS Dataset: Learning from Humans for Generating Naturalistic Handovers with Supernumerary Robotic Limbs [64.99122701615151]
Supernumerary robotic limbs (SRLs) are robotic structures integrated closely with the user's body.<n>We present 3HANDS, a novel dataset of object handover interactions between a participant performing a daily activity and another participant enacting a hip-mounted SRL in a naturalistic manner.<n>We present three models that generate naturalistic handover trajectories, one that determines the appropriate handover endpoints, and a third that predicts the moment to initiate a handover.
arXiv Detail & Related papers (2025-03-06T17:23:55Z) - Human-Humanoid Robots Cross-Embodiment Behavior-Skill Transfer Using Decomposed Adversarial Learning from Demonstration [9.42179962375058]
We propose a transferable framework that reduces the data bottleneck by using a unified digital human model as a common prototype.<n>The model learns behavior primitives from human demonstrations through adversarial imitation, and complex robot structures are decomposed into functional components.<n>Our framework is validated on five humanoid robots with diverse configurations.
arXiv Detail & Related papers (2024-12-19T18:41:45Z) - Learning Multimodal Latent Dynamics for Human-Robot Interaction [18.68936554172693]
This article presents a method for learning well-coordinated Human-Robot Interaction (HRI) from Human-Human Interactions (HHI)<n>We devise a hybrid approach using Hidden Markov Models (HMMs) as the latent space priors for a Variational Autoencoder to model a joint distribution over the interacting agents.<n>We find that users perceive our method as more human-like, timely, and accurate and rank our method with a higher degree of preference over other baselines.
arXiv Detail & Related papers (2023-11-27T23:56:59Z) - InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions maintaining the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z) - Regularized Deep Signed Distance Fields for Reactive Motion Generation [30.792481441975585]
Distance-based constraints are fundamental for enabling robots to plan their actions and act safely.
We propose Regularized Deep Signed Distance Fields (ReDSDF), a single neural implicit function that can compute smooth distance fields at any scale.
We demonstrate the effectiveness of our approach in representative simulated tasks for whole-body control (WBC) and safe Human-Robot Interaction (HRI) in shared workspaces.
arXiv Detail & Related papers (2022-03-09T14:21:32Z) - ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for
Mobile Manipulation [99.2543521972137]
ReLMoGen is a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals.
Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments.
ReLMoGen shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.
arXiv Detail & Related papers (2020-08-18T08:05:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.