Related papers: Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations

URL: http://arxiv.org/abs/2601.09518v1
Date: Wed, 14 Jan 2026 14:37:06 GMT
Title: Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations
Authors: Wei-Jin Huang, Yue-Yi Zhang, Yi-Lin Wei, Zhi-Wei Xia, Juantao Tan, Yuan-Ming Li, Zhilin Zhao, Wei-Shi Zheng
Abstract summary: We introduce D-STAR, a hierarchical policy that disentangles when to act from where to act. We validate our framework through extensive and rigorous simulations.
Score: 63.80827184637476
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Enabling humanoid robots to physically interact with humans is a critical frontier, but progress is hindered by the scarcity of high-quality Human-Humanoid Interaction (HHoI) data. While leveraging abundant Human-Human Interaction (HHI) data presents a scalable alternative, we first demonstrate that standard retargeting fails by breaking the essential contacts. We address this with PAIR (Physics-Aware Interaction Retargeting), a contact-centric, two-stage pipeline that preserves contact semantics across morphology differences to generate physically consistent HHoI data. This high-quality data, however, exposes a second failure: conventional imitation learning policies merely mimic trajectories and lack interactive understanding. We therefore introduce D-STAR (Decoupled Spatio-Temporal Action Reasoner), a hierarchical policy that disentangles when to act from where to act. In D-STAR, Phase Attention (when) and a Multi-Scale Spatial module (where) are fused by the diffusion head to produce synchronized whole-body behaviors beyond mimicry. By decoupling these reasoning streams, our model learns robust temporal phases without being distracted by spatial noise, leading to responsive, synchronized collaboration. We validate our framework through extensive and rigorous simulations, demonstrating significant performance gains over baseline approaches and a complete, effective pipeline for learning complex whole-body interactions from HHI data.

Related papers

Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models [80.28579390566298]
We introduce Interact2Ar, a text-conditioned autoregressive diffusion model for generating full-body, human-human interactions. Hand kinematics are incorporated through dedicated parallel branches, enabling high-fidelity full-body generation. Our model enables a series of downstream applications, including temporal motion composition, real-time adaptation to disturbances, and extension beyond dyadic to multi-person scenarios.
arXiv Detail & Related papers (2025-12-22T18:59:50Z)
3D Human-Human Interaction Anomaly Detection [9.82406406771152]
Human-Human Interaction Anomaly Detection (H2IAD) aims to identify anomalous interactive behaviors within collaborative 3D human actions. IADNet outperforms existing Human-centric AD baselines in H2IAD.
arXiv Detail & Related papers (2025-12-15T17:17:55Z)
Incremental Human-Object Interaction Detection with Invariant Relation Representation Learning [24.134958315913888]
In open-world environments, human-object interactions (HOIs) evolve continuously, challenging conventional closed-world HOI detection models. Inspired by humans' ability to progressively acquire knowledge, we explore incremental HOI detection (IHOID) to develop agents capable of discerning human-object relations in such dynamic environments. We propose a novel exemplar-free incremental relation distillation (IRD) framework. IRD decouples the learning of objects and relations, and introduces two unique distillation losses for learning invariant relation features across different HOI combinations that share the same relation.
arXiv Detail & Related papers (2025-10-30T21:59:34Z)
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction [76.44108003274955]
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning policies. We introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh. By minimizing the Laplacian deformation between the human and robot meshes, OmniRetarget generates kinematically feasible trajectories.
arXiv Detail & Related papers (2025-09-30T17:59:02Z)
HHI-Assist: A Dataset and Benchmark of Human-Human Interaction in Physical Assistance Scenario [63.77482302352545]
HHI-Assist is a dataset comprising motion capture clips of human-human interactions in assistive tasks. Our work has the potential to significantly enhance robotic assistance policies.
arXiv Detail & Related papers (2025-09-12T09:38:17Z)
HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models [42.62823339416957]
We address the problem of generating realistic 3D human-object interactions (HOIs) driven by textual prompts. We first develop a dual-branch diffusion model (HOI-DM) to generate both human and object motions conditioned on the input text. We also develop an affordance prediction diffusion model (APDM) to predict the contacting area between the human and object.
arXiv Detail & Related papers (2023-12-11T17:41:17Z)
HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly. Considering that human features contribute more to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions. Our proposed method achieves competitive performance on both the V-COCO and the HICO-Det datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z)
RobustFusion: Robust Volumetric Performance Reconstruction under Human-object Interactions from Monocular RGBD Stream [27.600873320989276]
High-quality 4D reconstruction of human performance with complex interactions to various objects is essential in real-world scenarios. Recent advances still fail to provide reliable performance reconstruction. We propose RobustFusion, a robust volumetric performance reconstruction system for human-object interaction scenarios.
arXiv Detail & Related papers (2021-04-30T08:41:45Z)
Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs. Our network predicts interaction points, which directly localize and classify the interaction. Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)