Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation
- URL: http://arxiv.org/abs/2411.09572v2
- Date: Wed, 09 Jul 2025 07:53:30 GMT
- Title: Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation
- Authors: Zhenjun Yu, Wenqiang Xu, Pengfei Xie, Yutong Li, Brian W. Anthony, Zhuorui Zhang, Cewu Lu
- Abstract summary: ViTaM-D is a visual-tactile framework for reconstructing dynamic hand-object interaction with distributed tactile sensing. DF-Field is a force-aware contact representation leveraging kinetic and potential energy in hand-object interactions. ViTaM-D outperforms state-of-the-art methods in reconstruction accuracy for both rigid and deformable objects.
- Score: 47.940270914254285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present ViTaM-D, a novel visual-tactile framework for reconstructing dynamic hand-object interaction with distributed tactile sensing to enhance contact modeling. Existing methods, relying solely on visual inputs, often fail to capture occluded interactions and object deformation. To address this, we introduce DF-Field, a distributed force-aware contact representation leveraging kinetic and potential energy in hand-object interactions. ViTaM-D first reconstructs interactions using a visual network with contact constraint, then refines contact details through force-aware optimization, improving object deformation modeling. To evaluate deformable object reconstruction, we introduce the HOT dataset, featuring 600 hand-object interaction sequences in a high-precision simulation environment. Experiments on DexYCB and HOT datasets show that ViTaM-D outperforms state-of-the-art methods in reconstruction accuracy for both rigid and deformable objects. DF-Field also proves more effective in refining hand poses and enhancing contact modeling than previous refinement methods. The code, models, and datasets are available at https://sites.google.com/view/vitam-d/.
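The abstract does not give DF-Field's actual formulation, so the following is only a minimal PyTorch sketch of what a force-aware contact refinement combining potential and kinetic energy terms might look like. Every detail here (the SDF object representation, the spring-like penetration penalty, the temporal kinetic term, the weights, and the optimizer settings) is an illustrative assumption, not the paper's method:

```python
import torch

def potential_energy(hand_verts, obj_sdf, k_contact=1.0):
    """Spring-like penalty on hand vertices that penetrate the object.

    obj_sdf: callable mapping (N, 3) points to signed distances
             (negative inside the object). Assumed differentiable.
    """
    d = obj_sdf(hand_verts)                   # (N,) signed distances
    penetration = torch.clamp(-d, min=0.0)    # penetration depth per vertex
    return 0.5 * k_contact * (penetration ** 2).sum()

def kinetic_energy(hand_verts, hand_verts_prev, dt=1.0 / 30.0, mass=1.0):
    """Kinetic term from per-vertex velocities between adjacent frames;
    here it acts as a temporal smoothness penalty during refinement."""
    v = (hand_verts - hand_verts_prev) / dt
    return 0.5 * mass * (v ** 2).sum()

def refine(hand_verts, hand_verts_prev, obj_sdf, steps=100, lr=1e-3, w_kin=0.01):
    """Gradient-based refinement of hand vertices against the total energy."""
    verts = hand_verts.clone().requires_grad_(True)
    opt = torch.optim.Adam([verts], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        energy = (potential_energy(verts, obj_sdf)
                  + w_kin * kinetic_energy(verts, hand_verts_prev))
        energy.backward()
        opt.step()
    return verts.detach()

# Toy usage: a sphere of radius 0.1 at the origin stands in for the object.
sphere_sdf = lambda p: p.norm(dim=-1) - 0.1
hand = torch.randn(778, 3) * 0.05             # 778 vertices, MANO's vertex count
refined = refine(hand, hand.clone(), sphere_sdf)
```

In the actual system, the force-aware optimization presumably acts on hand pose parameters (e.g., MANO) and a learned object representation informed by the distributed tactile readings, rather than on free vertices; the sketch only conveys the energy-minimization structure.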
Related papers
- Guiding Human-Object Interactions with Rich Geometry and Relations [21.528466852204627]
Existing methods often rely on simplified object representations, such as the object's centroid or the nearest point to a human, to achieve physically plausible motions.
We introduce ROG, a novel framework that addresses relationships inherent in HOIs with rich geometric detail.
We show that ROG significantly outperforms state-of-the-art methods in both the realism and the semantic accuracy of synthesized HOIs.
arXiv Detail & Related papers (2025-03-26T02:57:18Z)
- Aligning Foundation Model Priors and Diffusion-Based Hand Interactions for Occlusion-Resistant Two-Hand Reconstruction [50.952228546326516]
Two-hand reconstruction from monocular images faces persistent challenges due to complex and dynamic hand postures and occlusions.
Existing approaches struggle to maintain alignment under these conditions, often producing misalignment and penetration artifacts.
We propose a novel framework that attempts to precisely align hand poses and interactions by integrating foundation model-driven 2D priors with diffusion-based interaction refinement.
arXiv Detail & Related papers (2025-03-22T14:42:27Z)
- Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model [72.90370736032115]
We present Re-HOLD, a novel video reenactment framework focusing on Human-Object Interaction (HOI) via an adaptive layout-instructed diffusion model.
Our key insight is to employ specialized layout representation for hands and objects, respectively.
To further improve the generation quality of HOI, we design an interactive textural enhancement module for both hands and objects.
arXiv Detail & Related papers (2025-03-21T08:40:35Z)
- ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision.
Existing methods often fail to effectively integrate information across different object states.
We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv Detail & Related papers (2025-02-26T10:25:32Z)
- HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation [29.766317710266765]
We propose a new 3D Gaussian Splatting based data augmentation framework for bimanual hand-object interaction.
We use mesh-based 3DGS to model objects and hands, and to address the rendering blur caused by the multi-resolution input images.
We extend the single-hand grasping pose optimization module to the bimanual setting to generate diverse bimanual hand-object interaction poses.
arXiv Detail & Related papers (2025-01-06T08:48:17Z)
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z)
- NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction [19.957593804898064]
We present a novel free-viewpoint rendering framework, Neural Contact Radiance Field (NCRF), to reconstruct hand-object interactions from a sparse set of videos.
We jointly learn these key components where they mutually help and regularize each other with visual and geometric constraints.
Our approach outperforms the current state-of-the-art in terms of both rendering quality and pose estimation accuracy.
arXiv Detail & Related papers (2024-02-08T10:09:12Z)
- Integrated Object Deformation and Contact Patch Estimation from Visuo-Tactile Feedback [8.420670642409219]
We propose a representation that jointly models object deformations and contact patches from visuo-tactile feedback.
We design a neural network architecture to learn an NDCF and train it using simulated data.
We demonstrate that the learned NDCF transfers directly to the real-world without the need for fine-tuning.
arXiv Detail & Related papers (2023-05-23T18:53:24Z)
- Visual-Tactile Sensing for In-Hand Object Reconstruction [38.42487660352112]
We propose a visual-tactile in-hand object reconstruction framework, VTacO, and extend it to VTacOH for hand-object reconstruction.
A simulation environment, VT-Sim, supports generating hand-object interaction for both rigid and deformable objects.
arXiv Detail & Related papers (2023-03-25T15:16:31Z)
- HMDO: Markerless Multi-view Hand Manipulation Capture with Deformable Objects [8.711239906965893]
HMDO is the first markerless deformable interaction dataset recording interactive motions of the hands and deformable objects.
The proposed method can reconstruct interactive motions of hands and deformable objects with high quality.
arXiv Detail & Related papers (2023-01-18T16:55:15Z)
- Stability-driven Contact Reconstruction From Monocular Color Images [7.427212296770506]
Physical contact provides additional constraints for hand-object state reconstruction.
Existing methods optimize hand-object contact driven by a distance threshold or by priors from contact-labeled datasets.
Our key idea is to reconstruct the contact pattern directly from monocular images, and then utilize the physical stability criterion in the simulation to optimize it.
arXiv Detail & Related papers (2022-05-02T12:23:06Z)
- ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation [135.10594078615952]
We introduce ACID, an action-conditional visual dynamics model for volumetric deformable objects.
The accompanying benchmark contains over 17,000 action trajectories spanning six types of plush toys and 78 variants.
Our model achieves the best performance in geometry, correspondence, and dynamics predictions.
arXiv Detail & Related papers (2022-03-14T04:56:55Z)
- Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step toward dynamics modeling of hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z)
- Learning Intuitive Physics with Multimodal Generative Models [24.342994226226786]
This paper presents a perception framework that fuses visual and tactile feedback to make predictions about the expected motion of objects in dynamic scenes.
We use a novel See-Through-your-Skin (STS) sensor that provides high-resolution multimodal sensing of contact surfaces.
We validate through simulated and real-world experiments in which the resting state of an object is predicted from given initial conditions.
arXiv Detail & Related papers (2021-01-12T12:55:53Z)
- Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of object reconstruction accuracy.
arXiv Detail & Related papers (2020-06-28T09:50:25Z)