InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
- URL: http://arxiv.org/abs/2602.06035v1
- Date: Thu, 05 Feb 2026 18:59:27 GMT
- Title: InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
- Authors: Sirui Xu, Samuel Schulter, Morteza Ziyadi, Xialin He, Xiaohan Fei, Yu-Xiong Wang, Liangyan Gui
- Abstract summary: Humans rarely plan whole-body interactions with objects at the level of explicit whole-body movements. Scaling such priors is key to enabling humanoids to compose and generalize loco-manipulation skills. We introduce InterPrior, a framework that learns a unified generative controller through large-scale imitation pretraining and post-training by reinforcement learning.
- Score: 58.329946838699044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans rarely plan whole-body interactions with objects at the level of explicit whole-body movements. High-level intentions, such as affordance, define the goal, while coordinated balance, contact, and manipulation can emerge naturally from underlying physical and motor priors. Scaling such priors is key to enabling humanoids to compose and generalize loco-manipulation skills across diverse contexts while maintaining physically coherent whole-body coordination. To this end, we introduce InterPrior, a scalable framework that learns a unified generative controller through large-scale imitation pretraining and post-training by reinforcement learning. InterPrior first distills a full-reference imitation expert into a versatile, goal-conditioned variational policy that reconstructs motion from multimodal observations and high-level intent. While the distilled policy reconstructs training behaviors, it does not generalize reliably due to the vast configuration space of large-scale human-object interactions. To address this, we apply data augmentation with physical perturbations, and then perform reinforcement learning finetuning to improve competence on unseen goals and initializations. Together, these steps consolidate the reconstructed latent skills into a valid manifold, yielding a motion prior that generalizes beyond the training data, e.g., it can incorporate new behaviors such as interactions with unseen objects. We further demonstrate its effectiveness for user-interactive control and its potential for real robot deployment.
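The two-stage recipe described in the abstract (distill a full-reference imitation expert into a goal-conditioned variational policy, then finetune with reinforcement learning) can be sketched in miniature. Everything below is illustrative: the dimensions, the linear encoder/decoder, and the zeroth-order RL update are placeholder assumptions, not the paper's actual architecture or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dimensions -- none of these values come from the paper.
OBS, GOAL, LATENT, ACT = 8, 4, 2, 6

def init_params():
    return {
        "enc": rng.normal(0.0, 0.1, (OBS + GOAL, LATENT)),  # (obs, goal) -> latent mean
        "dec": rng.normal(0.0, 0.1, (OBS + LATENT, ACT)),   # (obs, latent) -> action
    }

def policy(params, obs, goal, noise=0.0):
    """Goal-conditioned variational policy: encode intent to a latent skill, decode an action."""
    z = np.concatenate([obs, goal]) @ params["enc"]
    z = z + noise * rng.normal(size=LATENT)  # variational sampling around the latent mean
    return np.concatenate([obs, z]) @ params["dec"], z

def distill_step(params, expert, obs, goal, lr=1e-2):
    """Stage 1: behavior cloning against a full-reference imitation expert."""
    act, z = policy(params, obs, goal)
    err = act - expert(obs, goal)
    # Gradient of 0.5 * ||err||^2 w.r.t. the decoder weights only (simplified).
    params["dec"] -= lr * np.outer(np.concatenate([obs, z]), err)
    return float(np.mean(err ** 2))

def rl_finetune_step(params, reward_fn, obs, goal, sigma=0.05, lr=0.5):
    """Stage 2: a zeroth-order (ES-style) update toward higher-reward behavior."""
    eps = rng.normal(0.0, sigma, params["dec"].shape)
    r_plus = reward_fn(policy({**params, "dec": params["dec"] + eps}, obs, goal)[0])
    r_minus = reward_fn(policy({**params, "dec": params["dec"] - eps}, obs, goal)[0])
    params["dec"] += lr * (r_plus - r_minus) / (2.0 * sigma) * eps
```

After the distillation loop drives the cloning loss down, the finetuning stage perturbs the decoder and keeps moves that raise a task reward; in the paper this role is played by reinforcement-learning finetuning on unseen goals and initializations, preceded by data augmentation with physical perturbations.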
Related papers
- ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation [55.467742403416175]
We introduce a physics-driven neural algorithm that translates large-scale motion capture to humanoid embodiments. We learn a unified multimodal controller that supports both dense references and sparse task specifications. Results show that ULTRA generalizes to autonomous, goal-conditioned whole-body loco-manipulation from egocentric perception.
arXiv Detail & Related papers (2026-03-03T18:59:29Z)
- Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations [63.80827184637476]
We introduce D-STAR, a hierarchical policy that disentangles when to act from where to act. We validate our framework through extensive and rigorous simulations.
arXiv Detail & Related papers (2026-01-14T14:37:06Z)
- Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models [80.28579390566298]
We introduce Interact2Ar, a text-conditioned autoregressive diffusion model for generating full-body, human-human interactions. Hand kinematics are incorporated through dedicated parallel branches, enabling high-fidelity full-body generation. Our model enables a series of downstream applications, including temporal motion composition, real-time adaptation to disturbances, and extension beyond dyadic to multi-person scenarios.
arXiv Detail & Related papers (2025-12-22T18:59:50Z)
- InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation [1.7523719472700858]
We introduce InteracTalker, a novel framework that seamlessly integrates prompt-based object-aware interactions with co-speech gesture generation. Our framework utilizes a Generalized Motion Adaptation Module that enables independent training, adapting to the corresponding motion condition. InteracTalker successfully unifies these previously separate tasks, outperforming prior methods in both co-speech gesture generation and object-interaction synthesis.
arXiv Detail & Related papers (2025-12-14T12:29:49Z)
- OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction [76.44108003274955]
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning policies. We introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh. By minimizing the Laplacian deformation between the human and robot meshes, OmniRetarget generates kinematically feasible trajectories.
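The Laplacian-deformation idea in the OmniRetarget summary can be illustrated with a toy energy: compare the differential (Laplacian) coordinates of corresponding source and target meshes, which are invariant to translation but penalize shape distortion. The mesh representation and function names below are illustrative assumptions, not OmniRetarget's implementation.

```python
import numpy as np

def laplacian_coords(verts, neighbors):
    """Differential (Laplacian) coordinates: each vertex minus the mean of its neighbors."""
    return np.array([v - verts[nbrs].mean(axis=0) for v, nbrs in zip(verts, neighbors)])

def laplacian_deformation_energy(src, dst, neighbors):
    """Sum of squared differences between the two meshes' Laplacian coordinates.

    Assumes src and dst have vertex-to-vertex correspondence and share one
    neighbor structure (a list of neighbor-index lists per vertex).
    """
    d = laplacian_coords(src, neighbors) - laplacian_coords(dst, neighbors)
    return float(np.sum(d ** 2))
```

Because Laplacian coordinates subtract a local neighborhood mean, rigidly translating one mesh leaves the energy at zero, while stretching or distorting it relative to the other raises the energy; minimizing such a term keeps the retargeted robot mesh shaped like the human one.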
arXiv Detail & Related papers (2025-09-30T17:59:02Z)
- InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions [27.225777494300775]
We introduce InterMimic, a framework that enables a single policy to robustly learn from hours of imperfect MoCap data. Our experiments demonstrate that InterMimic produces realistic and diverse interactions across multiple HOI datasets.
arXiv Detail & Related papers (2025-02-27T18:59:12Z)
- Physically Plausible Full-Body Hand-Object Interaction Synthesis [32.83908152822006]
Existing methods often focus on isolated segments of the interaction process and rely on data-driven techniques that may introduce artifacts. We propose a physics-based method for synthesizing dexterous hand-object interactions in a full-body setting.
arXiv Detail & Related papers (2023-09-14T17:55:18Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show the proposed network that consumes dynamic descriptors can achieve state-of-the-art prediction results and help the network better generalize to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.