What's in your hands? 3D Reconstruction of Generic Objects in Hands
- URL: http://arxiv.org/abs/2204.07153v1
- Date: Thu, 14 Apr 2022 17:59:02 GMT
- Title: What's in your hands? 3D Reconstruction of Generic Objects in Hands
- Authors: Yufei Ye, Abhinav Gupta, Shubham Tulsiani
- Abstract summary: Our work aims to reconstruct hand-held objects given a single RGB image.
In contrast to prior works that typically assume known 3D templates and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates.
- Score: 49.12461675219253
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our work aims to reconstruct hand-held objects given a single RGB image. In
contrast to prior works that typically assume known 3D templates and reduce the
problem to 3D pose estimation, our work reconstructs generic hand-held objects
without knowing their 3D templates. Our key insight is that hand articulation
is highly predictive of the object shape, and we propose an approach that
conditionally reconstructs the object based on the articulation and the visual
input. Given an image depicting a hand-held object, we first use off-the-shelf
systems to estimate the underlying hand pose and then infer the object shape in
a normalized hand-centric coordinate frame. We parameterize the object by its
signed distance field, which is inferred by an implicit network that leverages
both visual features and articulation-aware coordinates to process each query
point. We perform experiments across three datasets and show
that our method consistently outperforms baselines and is able to reconstruct a
diverse set of objects. We analyze the benefits and robustness of explicit
articulation conditioning and also show that this allows the hand pose
estimation to further improve in test-time optimization.
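The abstract describes a per-point implicit decoder conditioned on pixel-aligned visual features and on the query point expressed in articulation-aware (joint-local) coordinates derived from the estimated hand pose. The sketch below is a minimal, purely illustrative PyTorch-style interpretation of that idea, not the authors' code; the class name, joint count, feature dimensions, and input conventions are all assumptions.

```python
# Illustrative sketch (not the authors' implementation) of an
# articulation-conditioned implicit SDF predictor.
import torch
import torch.nn as nn


class ArticulationConditionedSDF(nn.Module):
    def __init__(self, num_joints=16, feat_dim=256, hidden=512):
        super().__init__()
        # Per query point: a visual feature plus the point expressed in
        # every joint-local frame (3 coordinates per joint).
        in_dim = feat_dim + 3 * num_joints
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # signed distance to the object surface
        )

    def forward(self, visual_feat, points, joint_transforms):
        """
        visual_feat:      (B, N, feat_dim) pixel-aligned features for N queries
        points:           (B, N, 3) queries in the normalized hand-centric frame
        joint_transforms: (B, J, 4, 4) rigid transforms into each joint's local
                          frame, taken from an off-the-shelf hand pose estimate
        """
        B, N, _ = points.shape
        J = joint_transforms.shape[1]
        # Homogeneous coordinates, broadcast over joints: (B, J, N, 4)
        ones = torch.ones(B, N, 1, device=points.device)
        pts_h = torch.cat([points, ones], dim=-1)[:, None].expand(B, J, N, 4)
        # Articulation-aware coordinates: each query in every joint's frame.
        local = torch.einsum('bjmn,bjpn->bjpm', joint_transforms, pts_h)[..., :3]
        local = local.permute(0, 2, 1, 3).reshape(B, N, J * 3)
        return self.mlp(torch.cat([visual_feat, local], dim=-1))  # (B, N, 1)
```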
Related papers
- EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild [79.71523320368388]
Our work aims to reconstruct hand-object interactions from a single-view image.
We first design a novel pipeline to estimate the underlying hand pose and object shape.
With the initial reconstruction, we employ a prior-guided optimization scheme.
arXiv Detail & Related papers (2024-11-21T16:33:35Z)
- Reconstructing Hand-Held Objects in 3D from Images and Videos [53.277402172488735]
Given a monocular RGB video, we aim to reconstruct hand-held object geometry in 3D, over time.
We present MCC-Hand-Object (MCC-HO), which jointly reconstructs hand and object geometry given a single RGB image.
We then prompt a text-to-3D generative model using GPT-4(V) to retrieve a 3D object model that matches the object in the image.
arXiv Detail & Related papers (2024-04-09T17:55:41Z)
- HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video [70.11702620562889]
HOLD is the first category-agnostic method that reconstructs an articulated hand and object jointly from a monocular interaction video.
We develop a compositional articulated implicit model that can disentangle the 3D hand and object from 2D images.
Our method does not rely on 3D hand-object annotations while outperforming fully-supervised baselines in both in-the-lab and challenging in-the-wild settings.
arXiv Detail & Related papers (2023-11-30T10:50:35Z)
- ShapeGraFormer: GraFormer-Based Network for Hand-Object Reconstruction from a Single Depth Map [11.874184782686532]
We propose the first approach for realistic 3D hand-object shape and pose reconstruction from a single depth map.
Our pipeline additionally predicts voxelized hand-object shapes, having a one-to-one mapping to the input voxelized depth.
In addition, we show the impact of adding another GraFormer component that refines the reconstructed shapes based on the hand-object interactions.
arXiv Detail & Related papers (2023-10-18T09:05:57Z)
- SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction [13.417086460511696]
We introduce the SHOWMe dataset which consists of 96 videos, annotated with real and detailed hand-object 3D textured meshes.
We consider a rigid hand-object scenario, in which the pose of the hand with respect to the object remains constant during the whole video sequence.
This assumption allows us to register sub-millimetre-precise groundtruth 3D scans to the image sequences in SHOWMe.
arXiv Detail & Related papers (2023-09-19T16:48:29Z)
- Learning Explicit Contact for Implicit Reconstruction of Hand-held Objects from Monocular Images [59.49985837246644]
We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space (see the sketch after this list).
arXiv Detail & Related papers (2023-05-31T17:59:26Z)
- Object Wake-up: 3-D Object Reconstruction, Animation, and in-situ Rendering from a Single Image [58.69732754597448]
Given a picture of a chair, could we extract the 3-D shape of the chair, animate its plausible articulations and motions, and render in-situ in its original image space?
We devise an automated approach to extract and manipulate articulated objects in single images.
arXiv Detail & Related papers (2021-08-05T16:20:12Z)
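The "Learning Explicit Contact" entry above mentions diffusing estimated contact states from the hand mesh surface into nearby 3D space. One plausible, purely illustrative reading is a distance-weighted propagation of per-vertex contact probabilities to each query point; the sketch below is an assumption, not that paper's implementation, and every name and default value is hypothetical.

```python
# Hypothetical sketch: propagate per-vertex hand contact probabilities to
# arbitrary 3D query points with a Gaussian falloff over the k nearest
# hand-mesh vertices.
import torch


def diffuse_contact(query_pts, hand_verts, vert_contact, k=8, sigma=0.01):
    """
    query_pts:    (N, 3) points at which the implicit network is evaluated
    hand_verts:   (V, 3) hand mesh vertices
    vert_contact: (V,)   estimated per-vertex contact probability in [0, 1]
    Returns (N,) diffused contact values usable as extra per-point features.
    """
    # Pairwise distances between queries and hand vertices: (N, V)
    d = torch.cdist(query_pts, hand_verts)
    # Keep the k nearest vertices per query point.
    dist, idx = torch.topk(d, k, dim=1, largest=False)
    # Gaussian falloff so contact influence decays away from the hand surface.
    w = torch.exp(-dist ** 2 / (2 * sigma ** 2))
    w = w / (w.sum(dim=1, keepdim=True) + 1e-8)
    return (w * vert_contact[idx]).sum(dim=1)
```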