Joint Hand-object 3D Reconstruction from a Single Image with
Cross-branch Feature Fusion
- URL: http://arxiv.org/abs/2006.15561v2
- Date: Mon, 22 Mar 2021 07:38:29 GMT
- Authors: Yujin Chen, Zhigang Tu, Di Kang, Ruizhi Chen, Linchao Bao, Zhengyou
Zhang, Junsong Yuan
- Abstract summary: We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate 3D reconstruction of the hand and object shape from a hand-object
image is important for understanding human-object interaction as well as human
daily activities. Different from bare hand pose estimation, hand-object
interaction poses a strong constraint on both the hand and its manipulated
object, which suggests that hand configuration may be crucial contextual
information for the object, and vice versa. However, current approaches address
this task by training a two-branch network to reconstruct the hand and object
separately with little communication between the two branches. In this work, we
propose to consider hand and object jointly in feature space and explore the
reciprocity of the two branches. We extensively investigate cross-branch
feature fusion architectures with MLP or LSTM units. Among the investigated
architectures, a variant with LSTM units that enhances object feature with hand
feature shows the best performance gain. Moreover, we employ an auxiliary depth
estimation module to augment the input RGB image with the estimated depth map,
which further improves the reconstruction accuracy. Experiments conducted on
public datasets demonstrate that our approach significantly outperforms
existing approaches in terms of the reconstruction accuracy of objects.
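The core idea of the abstract, using an LSTM unit to enhance the object branch's feature with the hand branch's feature, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the gate layout, random toy weights, and the choice to treat the hand feature as the LSTM input and the object feature as the previous hidden state are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_fuse(obj_feat, hand_feat, params):
    """One LSTM step that enhances an object feature with a hand feature.

    Sketch assumption: the hand feature plays the role of the input x_t,
    the object feature plays the role of the previous hidden state h_{t-1},
    and the previous cell state is zero. The returned hidden state is the
    fused, hand-aware object feature.
    """
    W, U, b = params["W"], params["U"], params["b"]  # gates stacked: i, f, g, o
    d = obj_feat.shape[0]
    z = W @ hand_feat + U @ obj_feat + b             # shape (4d,)
    i = sigmoid(z[0:d])          # input gate
    f = sigmoid(z[d:2 * d])      # forget gate (unused: previous cell is zero)
    g = np.tanh(z[2 * d:3 * d])  # candidate cell state
    o = sigmoid(z[3 * d:4 * d])  # output gate
    c = i * g                    # new cell state (f * c_prev term is zero)
    h = o * np.tanh(c)           # enhanced object feature
    return h

rng = np.random.default_rng(0)
d = 8  # toy feature dimension
params = {
    "W": rng.standard_normal((4 * d, d)) * 0.1,
    "U": rng.standard_normal((4 * d, d)) * 0.1,
    "b": np.zeros(4 * d),
}
hand = rng.standard_normal(d)  # feature from the hand branch
obj = rng.standard_normal(d)   # feature from the object branch
fused = lstm_fuse(obj, hand, params)
print(fused.shape)  # (8,)
```

In the paper's best-performing variant the fusion is one-directional (hand enhancing object), which this sketch mirrors; a symmetric variant would simply apply a second fusion step with the roles of the two features swapped.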
Related papers
- HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and
Objects from Video [70.11702620562889]
HOLD is the first category-agnostic method that reconstructs an articulated hand and object jointly from a monocular interaction video.
We develop a compositional articulated implicit model that can disentangle the 3D hand and object from 2D images.
Our method does not rely on 3D hand-object annotations while outperforming fully-supervised baselines in both in-the-lab and challenging in-the-wild settings.
arXiv Detail & Related papers (2023-11-30T10:50:35Z) - SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction [13.417086460511696]
We introduce the SHOWMe dataset which consists of 96 videos, annotated with real and detailed hand-object 3D textured meshes.
We consider a rigid hand-object scenario, in which the pose of the hand with respect to the object remains constant during the whole video sequence.
This assumption allows us to register sub-millimetre-accurate ground-truth 3D scans to the image sequences in SHOWMe.
arXiv Detail & Related papers (2023-09-19T16:48:29Z) - Learning Explicit Contact for Implicit Reconstruction of Hand-held
Objects from Monocular Images [59.49985837246644]
We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space.
arXiv Detail & Related papers (2023-05-31T17:59:26Z) - Decoupled Iterative Refinement Framework for Interacting Hands
Reconstruction from a Single RGB Image [30.24438569170251]
We propose a decoupled iterative refinement framework to achieve pixel-aligned hand reconstruction.
Our method outperforms all existing two-hand reconstruction methods by a large margin on the InterHand2.6M dataset.
arXiv Detail & Related papers (2023-02-05T15:46:57Z) - HMDO: Markerless Multi-view Hand Manipulation Capture with Deformable
Objects [8.711239906965893]
HMDO is the first markerless deformable interaction dataset recording interactive motions of the hands and deformable objects.
The proposed method can reconstruct interactive motions of hands and deformable objects with high quality.
arXiv Detail & Related papers (2023-01-18T16:55:15Z) - Collaborative Learning for Hand and Object Reconstruction with
Attention-guided Graph Convolution [49.10497573378427]
Estimating the pose and shape of hands and objects under interaction finds numerous applications including augmented and virtual reality.
Our algorithm is agnostic to object models, and it learns the physical rules governing hand-object interaction.
Experiments on four widely used benchmarks show that our framework surpasses state-of-the-art accuracy in 3D pose estimation and recovers dense 3D hand and object shapes.
arXiv Detail & Related papers (2022-04-27T17:00:54Z) - What's in your hands? 3D Reconstruction of Generic Objects in Hands [49.12461675219253]
Our work aims to reconstruct hand-held objects given a single RGB image.
In contrast to prior works that typically assume known 3D templates and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates.
arXiv Detail & Related papers (2022-04-14T17:59:02Z) - Leveraging Photometric Consistency over Time for Sparsely Supervised
Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.