Guiding Human-Object Interactions with Rich Geometry and Relations
- URL: http://arxiv.org/abs/2503.20172v1
- Date: Wed, 26 Mar 2025 02:57:18 GMT
- Title: Guiding Human-Object Interactions with Rich Geometry and Relations
- Authors: Mengqing Xue, Yifei Liu, Ling Guo, Shaoli Huang, Changxing Ding
- Abstract summary: Existing methods often rely on simplified object representations, such as the object's centroid or the nearest point to a human, to achieve physically plausible motions. We introduce ROG, a novel framework that models the spatiotemporal relationships inherent in HOIs with rich geometric detail. We show that ROG significantly outperforms state-of-the-art methods in the realism and semantic accuracy of synthesized HOIs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-object interaction (HOI) synthesis is crucial for creating immersive and realistic experiences for applications such as virtual reality. Existing methods often rely on simplified object representations, such as the object's centroid or the nearest point to a human, to achieve physically plausible motions. However, these approaches may overlook geometric complexity, resulting in suboptimal interaction fidelity. To address this limitation, we introduce ROG, a novel diffusion-based framework that models the spatiotemporal relationships inherent in HOIs with rich geometric detail. For efficient object representation, we select boundary-focused and fine-detail key points from the object mesh, ensuring a comprehensive depiction of the object's geometry. This representation is used to construct an interactive distance field (IDF), capturing the robust HOI dynamics. Furthermore, we develop a diffusion-based relation model that integrates spatial and temporal attention mechanisms, enabling a better understanding of intricate HOI relationships. This relation model refines the generated motion's IDF, guiding the motion generation process to produce relation-aware and semantically aligned movements. Experimental evaluations demonstrate that ROG significantly outperforms state-of-the-art methods in the realism and semantic accuracy of synthesized HOIs.
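The abstract's central construct, the interactive distance field (IDF) built from object key points, can be made concrete with a short sketch. The code below is an illustrative reading, not the paper's implementation: farthest-point sampling stands in for the paper's boundary-focused, fine-detail key-point selection, the IDF is taken to be per-frame joint-to-key-point distances, and all function names, shapes, and the guidance loss are assumptions.

```python
# Illustrative sketch of an interactive distance field (IDF) between human
# joints and object key points. Assumptions, not the authors' exact method:
# farthest-point sampling replaces the paper's boundary-focused key-point
# selection, and the IDF is modeled as raw pairwise distances per frame.
import numpy as np

def select_key_points(vertices: np.ndarray, num_points: int = 64) -> np.ndarray:
    """Pick well-spread surface points via farthest-point sampling (a common
    stand-in for geometry-covering key-point selection)."""
    selected = [0]  # start from an arbitrary vertex
    dists = np.linalg.norm(vertices - vertices[0], axis=1)
    for _ in range(num_points - 1):
        idx = int(np.argmax(dists))  # vertex farthest from all chosen so far
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(vertices - vertices[idx], axis=1))
    return vertices[selected]

def interaction_distance_field(joints: np.ndarray, key_points: np.ndarray) -> np.ndarray:
    """Pairwise joint-to-key-point distances over a motion sequence.
    joints: (T, J, 3) human joint positions per frame.
    key_points: (K, 3) object key points (static here for simplicity).
    Returns: (T, J, K) distances capturing HOI geometry over time."""
    diff = joints[:, :, None, :] - key_points[None, None, :, :]  # (T, J, K, 3)
    return np.linalg.norm(diff, axis=-1)

def idf_guidance_loss(joints: np.ndarray, key_points: np.ndarray,
                      target_idf: np.ndarray) -> float:
    """L2 gap between a motion's IDF and a refined target IDF. In a diffusion
    sampler, the gradient of such a loss w.r.t. the motion could steer
    denoising toward relation-aware contacts (illustrative only)."""
    return float(np.mean((interaction_distance_field(joints, key_points) - target_idf) ** 2))

# Toy example: 120 frames, 22 joints, an object mesh of 500 vertices.
rng = np.random.default_rng(0)
key_points = select_key_points(rng.normal(size=(500, 3)), num_points=64)
motion = rng.normal(size=(120, 22, 3))
idf = interaction_distance_field(motion, key_points)
print(idf.shape)  # (120, 22, 64)
```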
Related papers
- OOD-HOI: Text-Driven 3D Whole-Body Human-Object Interactions Generation Beyond Training Domains (arXiv, 2024-11-27)
  Current methods tend to focus either on the body or the hands, which limits their ability to produce cohesive and realistic interactions. OOD-HOI is a text-driven framework for generating whole-body human-object interactions that generalize well to new objects and actions. It integrates a dual-branch reciprocal diffusion model to synthesize initial interaction poses, a contact-guided interaction refiner that improves physical accuracy based on predicted contact areas, and a dynamic adaptation mechanism combining semantic adjustment and geometry deformation to improve robustness.
- Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation (arXiv, 2024-11-14)
  ViTaM-D is a visual-tactile framework for dynamic hand-object interaction reconstruction, built on DF-Field, a distributed force-aware contact representation model. Results highlight the superior performance of ViTaM-D on both rigid and deformable object reconstruction.
- Visual-Geometric Collaborative Guidance for Affordance Learning (arXiv, 2024-10-15)
  Proposes a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues. The method outperforms representative models in both objective metrics and visual quality.
- Gaze-guided Hand-Object Interaction Synthesis: Dataset and Method (arXiv, 2024-03-24)
  GazeHOI is the first dataset to capture simultaneous 3D modeling of gaze, hand, and object interactions. The accompanying method is a stacked gaze-guided hand-object interaction diffusion model, GHO-Diffusion, with HOI-Manifold Guidance applied during sampling to enable fine-grained control over generated motions.
- Scaling Up Dynamic Human-Scene Interaction Modeling (arXiv, 2024-03-13)
  TRUMANS is the most comprehensive motion-captured HSI dataset currently available, intricately capturing whole-body human motions and part-level object dynamics. A diffusion-based autoregressive model efficiently generates HSI sequences of any length.
- Hand-Centric Motion Refinement for 3D Hand-Object Interaction via Hierarchical Spatial-Temporal Modeling (arXiv, 2024-01-29)
  A data-driven method for coarse hand motion refinement. A hand-centric representation describes the dynamic spatial-temporal relation between hands and objects, and a new architecture captures the dynamic clues of hand-object interaction.
- Controllable Human-Object Interaction Synthesis (arXiv, 2023-12-06)
  CHOIS generates synchronized object motion and human motion in 3D scenes, where language descriptions inform style and intent, and waypoints extracted from high-level planning ground the motion in the scene. The module integrates seamlessly with a path-planning module, enabling the generation of long-term interactions in 3D environments.
- CG-HOI: Contact-Guided 3D Human-Object Interaction Generation (arXiv, 2023-11-27)
  CG-HOI is the first method to generate dynamic 3D human-object interactions (HOIs) from text. It models the motion of human and object in an interdependent fashion, since semantically rich human motion rarely happens in isolation, and its joint contact-based approach generates realistic and physically plausible sequences.
- NeuralFusion: Neural Volumetric Rendering under Human-object Interactions (arXiv, 2022-02-25)
  A neural approach for volumetric human-object capture and rendering using sparse consumer RGBD sensors. Geometry modeling uses a neural implicit inference scheme with non-rigid key-volume fusion, and a layer-wise human-object texture rendering scheme combines volumetric and image-based rendering in both spatial and temporal domains.
- RobustFusion: Robust Volumetric Performance Reconstruction under Human-object Interactions from Monocular RGBD Stream (arXiv, 2021-04-30)
  High-quality 4D reconstruction of human performance under complex interactions with objects is essential in real-world scenarios, yet recent advances still fail to provide reliable reconstruction. RobustFusion is a robust volumetric performance reconstruction system for human-object interaction scenarios.