Learning Explicit Contact for Implicit Reconstruction of Hand-held
Objects from Monocular Images
- URL: http://arxiv.org/abs/2305.20089v2
- Date: Tue, 16 Jan 2024 08:10:46 GMT
- Title: Learning Explicit Contact for Implicit Reconstruction of Hand-held
Objects from Monocular Images
- Authors: Junxing Hu, Hongwen Zhang, Zerui Chen, Mengcheng Li, Yunlong Wang,
Yebin Liu, Zhenan Sun
- Abstract summary: We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space.
- Score: 59.49985837246644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing hand-held objects from monocular RGB images is an appealing
yet challenging task. In this task, contacts between hands and objects provide
important cues for recovering the 3D geometry of the hand-held objects. Though
recent works have employed implicit functions to achieve impressive progress,
they do not formulate contacts in their frameworks, which results in
less realistic object meshes. In this work, we explore how to model
contacts in an explicit way to benefit the implicit reconstruction of hand-held
objects. Our method consists of two components: explicit contact prediction and
implicit shape reconstruction. In the first part, we propose a new subtask of
directly estimating 3D hand-object contacts from a single image. The part-level
and vertex-level graph-based transformers are cascaded and jointly learned in a
coarse-to-fine manner for more accurate contact probabilities. In the second
part, we introduce a novel method to diffuse estimated contact states from the
hand mesh surface to nearby 3D space and leverage diffused contact
probabilities to construct the implicit neural representation for the
manipulated object. Benefiting from estimating the interaction patterns between
the hand and the object, our method can reconstruct more realistic object
meshes, especially for object parts that are in contact with hands. Extensive
experiments on challenging benchmarks show that the proposed method outperforms
the current state of the art by a large margin. Our code is publicly available
at https://junxinghu.github.io/projects/hoi.html.
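To make the first component concrete, here is a minimal PyTorch sketch of the coarse-to-fine idea: a part-level stage predicts coarse contact probabilities, which are broadcast to the hand-mesh vertices and refined by a vertex-level stage. The module names, feature sizes, and the plain transformer encoders standing in for the paper's graph-based transformers are assumptions, not the authors' implementation.

import torch
import torch.nn as nn

DIM = 128  # assumed token width; MANO hand meshes have 778 vertices

class CoarseToFineContact(nn.Module):
    def __init__(self, part_of_vertex):
        super().__init__()
        # part_of_vertex: LongTensor (778,) mapping each vertex to one of 16 hand parts
        self.register_buffer("part_of_vertex", part_of_vertex)
        def stage():
            return nn.TransformerEncoder(
                nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True),
                num_layers=2)
        self.part_stage, self.vert_stage = stage(), stage()
        self.part_head = nn.Linear(DIM, 1)   # coarse contact logit per part
        self.vert_head = nn.Linear(DIM, 1)   # refined contact logit per vertex
        self.fuse = nn.Linear(DIM + 1, DIM)  # inject coarse prob into vertex tokens

    def forward(self, part_tokens, vert_tokens):
        # part_tokens: (B, 16, DIM) image features pooled over hand parts
        # vert_tokens: (B, 778, DIM) per-vertex image/geometry features
        part_prob = torch.sigmoid(self.part_head(self.part_stage(part_tokens)))
        coarse = part_prob[:, self.part_of_vertex]          # broadcast parts to vertices
        vert_in = self.fuse(torch.cat([vert_tokens, coarse], dim=-1))
        vert_prob = torch.sigmoid(self.vert_head(self.vert_stage(vert_in)))
        return part_prob, vert_prob  # both levels can be supervised, coarse to fine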
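For the second component, one plausible reading of diffusing contact states from the hand surface into nearby 3D space is a distance-weighted spread of per-vertex contact probabilities to arbitrary query points, which then condition the implicit decoder. The k-nearest-neighbor scheme and the sigma and k values below are illustrative assumptions, not the paper's exact operator.

import torch

def diffuse_contact(query, hand_verts, vert_prob, k=8, sigma=0.01):
    # query:      (B, N, 3)   3D points sampled around the hand-object scene
    # hand_verts: (B, 778, 3) posed hand mesh vertices
    # vert_prob:  (B, 778)    estimated per-vertex contact probabilities
    dist = torch.cdist(query, hand_verts)               # (B, N, 778)
    d, idx = dist.topk(k, dim=-1, largest=False)        # k nearest hand vertices
    w = torch.exp(-d.pow(2) / (2 * sigma ** 2))         # Gaussian falloff with distance
    p = vert_prob.gather(1, idx.flatten(1)).view_as(d)  # their contact probabilities
    return (w * p).sum(-1) / w.sum(-1).clamp_min(1e-8)  # (B, N) diffused contact

# The diffused value can then be concatenated with per-point image features
# before the occupancy/SDF MLP, e.g.:
#   occ = decoder(torch.cat([point_feat, diffuse_contact(q, v, p).unsqueeze(-1)], -1))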
Related papers
- Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction [8.253265795150401]
This paper introduces the first text-guided method for generating sequences of hand-object interaction in 3D.
For contact generation, a VAE-based network takes as input a text and an object mesh, and generates the probability of contacts between the surfaces of hands and the object.
For motion generation, a Transformer-based diffusion model utilizes this 3D contact map as a strong prior for generating physically plausible hand-object motion.
arXiv Detail & Related papers (2024-03-31T04:56:30Z)
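As a loose illustration of the contact stage described above, the sketch below shows the decoder side of a conditional-VAE-style network mapping a text embedding and sampled object surface points to per-point contact probabilities; the encoder and KL term are omitted, and every name and dimension is a hypothetical stand-in rather than the Text2HOI architecture.

import torch
import torch.nn as nn

class TextContactCVAE(nn.Module):
    def __init__(self, text_dim=512, z_dim=32, hid=256):
        super().__init__()
        self.z_dim = z_dim
        self.cond = nn.Linear(text_dim + 3, hid)      # per-point condition
        self.dec = nn.Sequential(nn.Linear(hid + z_dim, hid), nn.ReLU(),
                                 nn.Linear(hid, 1))   # contact logit per point

    def forward(self, text_emb, obj_points):
        # text_emb: (B, 512) sentence embedding; obj_points: (B, M, 3) mesh samples
        B, M, _ = obj_points.shape
        c = self.cond(torch.cat([text_emb[:, None].expand(-1, M, -1),
                                 obj_points], dim=-1))
        z = torch.randn(B, M, self.z_dim, device=obj_points.device)  # prior sample
        return torch.sigmoid(self.dec(torch.cat([c, z], dim=-1)))    # (B, M, 1)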
- In-Hand 3D Object Reconstruction from a Monocular RGB Video [17.31419675163019]
Our work aims to reconstruct a 3D object that is held and rotated by a hand in front of a static RGB camera.
Previous methods that use implicit neural representations to recover the geometry of a generic hand-held object from multi-view images achieved compelling results on the visible parts of the object.
arXiv Detail & Related papers (2023-12-27T06:19:25Z)
- ShapeGraFormer: GraFormer-Based Network for Hand-Object Reconstruction from a Single Depth Map [11.874184782686532]
We propose the first approach for realistic 3D hand-object shape and pose reconstruction from a single depth map.
Our pipeline additionally predicts voxelized hand-object shapes that have a one-to-one mapping to the input voxelized depth.
In addition, we show the impact of adding another GraFormer component that refines the reconstructed shapes based on the hand-object interactions.
arXiv Detail & Related papers (2023-10-18T09:05:57Z)
- HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB Image [41.580285338167315]
This paper presents a method to learn hand-object interaction prior for reconstructing a 3D hand-object scene from a single RGB image.
We use the hand shape to constrain the possible relative configuration of the hand and object geometry.
We show that HandNeRF is able to reconstruct hand-object scenes of novel grasp configurations more accurately than comparable methods.
arXiv Detail & Related papers (2023-09-14T17:42:08Z)
- S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning [70.72037296392642]
We propose a novel semi-supervised framework that allows us to learn contact from monocular images.
Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels.
We show the benefits of using a contact map that governs hand-object interactions to produce more accurate reconstructions.
arXiv Detail & Related papers (2022-08-01T14:05:23Z)
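A generic reading of this semi-supervised recipe: keep a predicted contact map as a pseudo-label only when it is consistent across augmented views and geometrically plausible. The thresholds and the 5 mm proximity test below are assumptions for illustration, not the S$^2$Contact implementation.

import torch

def make_pseudo_labels(model, image_aug1, image_aug2, hand_verts, obj_verts, tau=0.9):
    # Two augmented views of the same unlabeled image; keep predictions that agree.
    with torch.no_grad():
        p1 = model(image_aug1)             # (B, 778) contact probabilities, view 1
        p2 = model(image_aug2)             # (B, 778) contact probabilities, view 2
    agree = (p1 - p2).abs() < (1 - tau)    # visual consistency across augmentations
    confident = (p1 > tau) | (p1 < 1 - tau)
    # Geometric consistency: contact should only occur near the object surface.
    near = torch.cdist(hand_verts, obj_verts).min(-1).values < 0.005  # 5 mm, assumed
    keep = agree & confident & (near | (p1 < 0.5))
    return (p1 > 0.5).float(), keep        # pseudo-labels and a validity mask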
- What's in your hands? 3D Reconstruction of Generic Objects in Hands [49.12461675219253]
Our work aims to reconstruct hand-held objects given a single RGB image.
In contrast to prior works that typically assume known 3D templates and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates.
arXiv Detail & Related papers (2022-04-14T17:59:02Z)
- Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements [96.40125818594952]
We make the first attempt to reconstruct 3D interacting hands from monocular RGB images.
Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions.
arXiv Detail & Related papers (2021-11-01T08:24:10Z)
- Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds great potential for robotics and learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z)
- Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
arXiv Detail & Related papers (2020-06-28T09:50:25Z)
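The auxiliary-depth idea in this last entry can be pictured as follows: a monocular depth estimate is concatenated to the RGB input as a fourth channel before a shared encoder feeding the hand and object branches. The layers below are placeholders, not the paper's architecture.

import torch
import torch.nn as nn

class DepthAugmentedEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Placeholder monocular depth estimator and 4-channel backbone stem.
        self.depth_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                       nn.Conv2d(16, 1, 3, padding=1))
        self.backbone = nn.Conv2d(4, 64, 7, stride=2, padding=3)

    def forward(self, rgb):                 # rgb: (B, 3, H, W)
        depth = self.depth_net(rgb)         # (B, 1, H, W) estimated depth map
        x = torch.cat([rgb, depth], dim=1)  # augment input with a depth channel
        return self.backbone(x)             # shared features for both branches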
This list is automatically generated from the titles and abstracts of the papers on this site.