Towards Realistic Hand-Object Interaction with Gravity-Field Based Diffusion Bridge
- URL: http://arxiv.org/abs/2509.03114v1
- Date: Wed, 03 Sep 2025 08:15:19 GMT
- Title: Towards Realistic Hand-Object Interaction with Gravity-Field Based Diffusion Bridge
- Authors: Miao Xu, Xiangyu Zhu, Xusheng Liang, Zidu Wang, Jinlin Wu, Zhen Lei,
- Abstract summary: We formulate hand-object interaction as an attraction-driven process and propose a Gravity-Field Based Diffusion Bridge (GravityDB) to simulate interactions between a deformable hand surface and rigid objects.<n>Our approach effectively resolves the aforementioned issues by generating physically plausible interactions that are free of interpenetration, ensure stable grasping, and capture realistic hand deformations.
- Score: 23.484478357333856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing reconstruction or hand-object pose estimation methods are capable of producing coarse interaction states. However, due to the complex and diverse geometry of both human hands and objects, these approaches often suffer from interpenetration or leave noticeable gaps in regions that are supposed to be in contact. Moreover, the surface of a real human hand undergoes non-negligible deformations during interaction, which are difficult to capture and represent with previous methods. To tackle these challenges, we formulate hand-object interaction as an attraction-driven process and propose a Gravity-Field Based Diffusion Bridge (GravityDB) to simulate interactions between a deformable hand surface and rigid objects. Our approach effectively resolves the aforementioned issues by generating physically plausible interactions that are free of interpenetration, ensure stable grasping, and capture realistic hand deformations. Furthermore, we incorporate semantic information from textual descriptions to guide the construction of the gravitational field, enabling more semantically meaningful interaction regions. Extensive qualitative and quantitative experiments on multiple datasets demonstrate the effectiveness of our method.
Related papers
- Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models [80.28579390566298]
We introduce Interact2Ar, a text-conditioned autoregressive diffusion model for generating full-body, human-human interactions.<n>Hand kinematics are incorporated through dedicated parallel branches, enabling high-fidelity full-body generation.<n>Our model enables a series of downstream applications, including temporal motion composition, real-time adaptation to disturbances, and extension beyond dyadic to multi-person scenarios.
arXiv Detail & Related papers (2025-12-22T18:59:50Z) - Interaction-Aware 4D Gaussian Splatting for Dynamic Hand-Object Interaction Reconstruction [19.178735596766476]
This paper focuses on a challenging setting of simultaneously modeling geometry and appearance of hand-object interaction scenes without any object priors.<n>We present interaction-aware hand-object Gaussians with newly introduced optimizable parameters aiming to adopt piecewise linear hypothesis for clearer structural representation.<n> Experiments show that our approach surpasses existing dynamic 3D-GS-based methods and achieves state-of-the-art performance in reconstructing dynamic hand-object interaction.
arXiv Detail & Related papers (2025-11-18T14:44:04Z) - MoReact: Generating Reactive Motion from Textual Descriptions [57.642436102978245]
MoReact is a diffusion-based method designed to disentangle the generation of global trajectories and local motions sequentially.<n>Our experiments, utilizing data adapted from a two-person motion dataset, demonstrate the efficacy of our approach.
arXiv Detail & Related papers (2025-09-28T14:31:41Z) - Learning to Align and Refine: A Foundation-to-Diffusion Framework for Occlusion-Robust Two-Hand Reconstruction [50.952228546326516]
Two-hand reconstruction from monocular images faces persistent challenges due to complex and dynamic hand postures.<n>Existing approaches struggle with such alignment issues, often resulting in misalignment and penetration artifacts.<n>We propose a dual-stage Foundation-to-Diffusion framework that precisely align 2D prior guidance from vision foundation models.
arXiv Detail & Related papers (2025-03-22T14:42:27Z) - OOD-HOI: Text-Driven 3D Whole-Body Human-Object Interactions Generation Beyond Training Domains [66.62502882481373]
Current methods tend to focus either on the body or the hands, which limits their ability to produce cohesive and realistic interactions.<n>We propose OOD-HOI, a text-driven framework for generating whole-body human-object interactions that generalize well to new objects and actions.<n>Our approach integrates a dual-branch reciprocal diffusion model to synthesize initial interaction poses, a contact-guided interaction refiner to improve physical accuracy based on predicted contact areas, and a dynamic adaptation mechanism which includes semantic adjustment and geometry deformation to improve robustness.
arXiv Detail & Related papers (2024-11-27T10:13:35Z) - Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation [47.940270914254285]
ViTaM-D is a visual-tactile framework for reconstructing dynamic hand-object interaction with distributed tactile sensing.<n> DF-Field is a force-aware contact representation leveraging kinetic and potential energy in hand-object interactions.<n>ViTaM-D outperforms state-of-the-art methods in reconstruction accuracy for both rigid and deformable objects.
arXiv Detail & Related papers (2024-11-14T16:29:45Z) - Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z) - ManiDext: Hand-Object Manipulation Synthesis via Continuous Correspondence Embeddings and Residual-Guided Diffusion [36.9457697304841]
ManiDext is a unified hierarchical diffusion-based framework for generating hand manipulation and grasp poses.
Our key insight is that accurately modeling the contact correspondences between objects and hands during interactions is crucial.
Our framework first generates contact maps and correspondence embeddings on the object's surface.
Based on these fine-grained correspondences, we introduce a novel approach that integrates the iterative refinement process into the diffusion process.
arXiv Detail & Related papers (2024-09-14T04:28:44Z) - DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions [15.417836855005087]
We propose a novel method, dubbed DiffH2O, which can synthesize realistic, one or two-handed object interactions.<n>The method introduces three techniques that enable effective learning from limited data.
arXiv Detail & Related papers (2024-03-26T16:06:42Z) - Gaze-guided Hand-Object Interaction Synthesis: Dataset and Method [61.19028558470065]
We present GazeHOI, the first dataset to capture simultaneous 3D modeling of gaze, hand, and object interactions.<n>To tackle these issues, we propose a stacked gaze-guided hand-object interaction diffusion model, named GHO-Diffusion.<n>We also introduce HOI-Manifold Guidance during the sampling stage of GHO-Diffusion, enabling fine-grained control over generated motions.
arXiv Detail & Related papers (2024-03-24T14:24:13Z) - THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR)
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z) - Pairwise-Constrained Implicit Functions for 3D Human Heart Modelling [60.56741715207466]
We introduce a pairwise-constrained SDF approach that models the heart as a set of interdependent SDFs.<n>Our method significantly improves inner structure accuracy over single-SDF, UDF-based, voxel-based, and segmentation-based reconstructions.
arXiv Detail & Related papers (2023-07-16T10:07:15Z) - HMDO: Markerless Multi-view Hand Manipulation Capture with Deformable
Objects [8.711239906965893]
HMDO is the first markerless deformable interaction dataset recording interactive motions of the hands and deformable objects.
The proposed method can reconstruct interactive motions of hands and deformable objects with high quality.
arXiv Detail & Related papers (2023-01-18T16:55:15Z) - Physical Interaction: Reconstructing Hand-object Interactions with
Physics [17.90852804328213]
The paper proposes a physics-based method to better solve the ambiguities in the reconstruction.
It first proposes a force-based dynamic model of the in-hand object, which recovers the unobserved contacts and also solves for plausible contact forces.
Experiments show that the proposed technique reconstructs both physically plausible and more accurate hand-object interaction.
arXiv Detail & Related papers (2022-09-22T07:41:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.