ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation
- URL: http://arxiv.org/abs/2503.13130v1
- Date: Mon, 17 Mar 2025 12:55:34 GMT
- Title: ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation
- Authors: Ling-An Zeng, Guohong Huang, Yi-Lin Wei, Shengbo Gu, Yu-Ming Tang, Jingke Meng, Wei-Shi Zheng,
- Abstract summary: ChainHOI is a novel approach for text-driven human-object interaction generation.<n>It explicitly models interactions at both the joint and kinetic chain levels.
- Score: 25.777159581915658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose ChainHOI, a novel approach for text-driven human-object interaction (HOI) generation that explicitly models interactions at both the joint and kinetic chain levels. Unlike existing methods that implicitly model interactions using full-body poses as tokens, we argue that explicitly modeling joint-level interactions is more natural and effective for generating realistic HOIs, as it directly captures the geometric and semantic relationships between joints, rather than modeling interactions in the latent pose space. To this end, ChainHOI introduces a novel joint graph to capture potential interactions with objects, and a Generative Spatiotemporal Graph Convolution Network to explicitly model interactions at the joint level. Furthermore, we propose a Kinematics-based Interaction Module that explicitly models interactions at the kinetic chain level, ensuring more realistic and biomechanically coherent motions. Evaluations on two public datasets demonstrate that ChainHOI significantly outperforms previous methods, generating more realistic, and semantically consistent HOIs. Code is available \href{https://github.com/qinghuannn/ChainHOI}{here}.
Related papers
- Efficient Explicit Joint-level Interaction Modeling with Mamba for Text-guided HOI Generation [25.770855154106453]
We introduce an Efficient Explicit Joint-level Interaction Model (EJIM) for generating text-guided human-object interactions.
EJIM features a Dual-branch HOI Mamba that separately and efficiently models human object and motions.
We show that EJIM surpasses previous works by a large margin while using only 5% of the inference time.
arXiv Detail & Related papers (2025-03-29T15:23:21Z) - Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation [14.707224594220264]
We propose a Text-Derived Graph Network (TRG-Net) to enhance both modeling and supervision.
For modeling, the Dynamic Spatio-Temporal Fusion Modeling (D) method incorporates Text-Derived Joint Graphs (JGT) with channel adaptation.
For supervision, the Absolute-Relative Inter-Class Supervision (ARIS) method employs contrastive learning between action features and text embeddings to regularize the absolute class.
arXiv Detail & Related papers (2025-03-19T11:38:14Z) - InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction [27.10256777126629]
This paper showcases the potential of generating human-object interactions without direct training on text-interaction pair data.
We introduce a world model designed to comprehend simple physics, modeling how human actions influence object motion.
By integrating these components, our novel framework, InterDreamer, is able to generate text-aligned 3D HOI sequences in a zero-shot manner.
arXiv Detail & Related papers (2024-03-28T17:59:30Z) - THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR)
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z) - Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z) - Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition [21.007782102151282]
We propose a mutual excitation graph convolutional network (me-GCN) by stacking mutual excitation graph convolution layers.<n>Me-GC learns mutual information in each layer and each stage of graph convolution operations.<n>Our proposed me-GC outperforms state-of-the-art GCN-based and Transformer-based methods.
arXiv Detail & Related papers (2024-02-04T10:00:00Z) - InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions maintaining the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z) - A Probabilistic Model Of Interaction Dynamics for Dyadic Face-to-Face
Settings [1.9544213396776275]
We develop a probabilistic model to capture the interaction dynamics between pairs of participants in a face-to-face setting.
This interaction encoding is then used to influence the generation when predicting one agent's future dynamics.
We show that our model successfully delineates between the modes, based on their interacting dynamics.
arXiv Detail & Related papers (2022-07-10T23:31:27Z) - Dynamic Representation Learning with Temporal Point Processes for
Higher-Order Interaction Forecasting [8.680676599607123]
This paper proposes a temporal point process model for hyperedge prediction to address these problems.
As far as our knowledge, this is the first work that uses the temporal point process to forecast hyperedges in dynamic networks.
arXiv Detail & Related papers (2021-12-19T14:24:37Z) - VIRT: Improving Representation-based Models for Text Matching through
Virtual Interaction [50.986371459817256]
We propose a novel textitVirtual InteRacTion mechanism, termed as VIRT, to enable full and deep interaction modeling in representation-based models.
VIRT asks representation-based encoders to conduct virtual interactions to mimic the behaviors as interaction-based models do.
arXiv Detail & Related papers (2021-12-08T09:49:28Z) - A Graph-based Interactive Reasoning for Human-Object Interaction
Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.