HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and
Objects from Video
- URL: http://arxiv.org/abs/2311.18448v1
- Date: Thu, 30 Nov 2023 10:50:35 GMT
- Title: HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and
Objects from Video
- Authors: Zicong Fan, Maria Parelli, Maria Eleni Kadoglou, Muhammed Kocabas, Xu
Chen, Michael J. Black, Otmar Hilliges
- Abstract summary: HOLD -- the first category-agnostic method that reconstructs an articulated hand and object jointly from a monocular interaction video.
We develop a compositional articulated implicit model that can reconstruct disentangled 3D hand and object from 2D images.
Our method does not rely on 3D hand-object annotations while outperforming fully-supervised baselines in both in-the-lab and challenging in-the-wild settings.
- Score: 70.11702620562889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since humans interact with diverse objects every day, the holistic 3D capture
of these interactions is important to understand and model human behaviour.
However, most existing methods for hand-object reconstruction from RGB either
assume pre-scanned object templates or heavily rely on limited 3D hand-object
data, restricting their ability to scale and generalize to more unconstrained
interaction settings. To this end, we introduce HOLD -- the first
category-agnostic method that reconstructs an articulated hand and object
jointly from a monocular interaction video. We develop a compositional
articulated implicit model that can reconstruct disentangled 3D hand and object
from 2D images. We further incorporate hand-object constraints to improve
hand-object poses and consequently the reconstruction quality. Our method does
not rely on 3D hand-object annotations while outperforming fully-supervised
baselines in both in-the-lab and challenging in-the-wild settings. Moreover, we
qualitatively show its robustness in reconstructing from in-the-wild videos.
Code: https://github.com/zc-alexfan/hold
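
To make the idea of a compositional articulated implicit model with hand-object constraints concrete, the snippet below gives a minimal, illustrative sketch; it is an assumption for exposition and not HOLD's actual code (see the repository linked above for that). Two small signed-distance MLPs, one for the hand and one for the object, are queried in their own canonical frames and composed by a per-point minimum so the two geometries stay disentangled, while a toy interpenetration term penalizes hand surface points that fall inside the object. All class and function names are hypothetical.

import torch
import torch.nn as nn


class SDFNet(nn.Module):
    """Small MLP mapping a 3D point to a signed distance (illustrative only)."""

    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),
        )

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        return self.mlp(pts).squeeze(-1)


class CompositionalSDF(nn.Module):
    """Hand and object fields kept separate, composed into one scene by a min (union)."""

    def __init__(self):
        super().__init__()
        self.hand_sdf = SDFNet()
        self.object_sdf = SDFNet()

    def forward(self, pts_hand: torch.Tensor, pts_obj: torch.Tensor):
        # pts_hand / pts_obj: the same world points warped into the hand's
        # articulated canonical space and the object's rigid canonical space
        # (the warping itself is omitted in this sketch).
        d_hand = self.hand_sdf(pts_hand)
        d_obj = self.object_sdf(pts_obj)
        d_scene = torch.minimum(d_hand, d_obj)  # union of the two shapes
        return d_scene, d_hand, d_obj


def interpenetration_loss(hand_surface_pts: torch.Tensor,
                          object_sdf: SDFNet) -> torch.Tensor:
    """Toy hand-object constraint: penalize hand points inside the object (negative SDF)."""
    d = object_sdf(hand_surface_pts)
    return torch.relu(-d).mean()


if __name__ == "__main__":
    model = CompositionalSDF()
    pts = torch.randn(1024, 3)
    d_scene, d_hand, d_obj = model(pts, pts)
    loss = interpenetration_loss(torch.randn(256, 3), model.object_sdf)
    print(d_scene.shape, loss.item())

Keeping the two fields separate is what makes the reconstruction "disentangled": each network can be extracted as its own mesh, while the composed field and the contact-style penalty couple their poses during optimization.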
Related papers
- EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild [79.71523320368388]
Our work aims to reconstruct hand-object interactions from a single-view image.
We first design a novel pipeline to estimate the underlying hand pose and object shape.
With the initial reconstruction, we employ a prior-guided optimization scheme.
arXiv Detail & Related papers (2024-11-21T16:33:35Z)
- Reconstructing Hand-Held Objects in 3D from Images and Videos [53.277402172488735]
Given a monocular RGB video, we aim to reconstruct hand-held object geometry in 3D, over time.
We present MCC-Hand-Object (MCC-HO), which jointly reconstructs hand and object geometry given a single RGB image.
We then prompt a text-to-3D generative model using GPT-4(V) to retrieve a 3D object model that matches the object in the image.
arXiv Detail & Related papers (2024-04-09T17:55:41Z)
- SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction [13.417086460511696]
We introduce the SHOWMe dataset which consists of 96 videos, annotated with real and detailed hand-object 3D textured meshes.
We consider a rigid hand-object scenario, in which the pose of the hand with respect to the object remains constant during the whole video sequence.
This assumption allows us to register sub-millimetre-precise groundtruth 3D scans to the image sequences in SHOWMe.
arXiv Detail & Related papers (2023-09-19T16:48:29Z)
- HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB Image [41.580285338167315]
This paper presents a method to learn a hand-object interaction prior for reconstructing a 3D hand-object scene from a single RGB image.
We use the hand shape to constrain the possible relative configuration of the hand and object geometry.
We show that HandNeRF is able to reconstruct hand-object scenes of novel grasp configurations more accurately than comparable methods.
arXiv Detail & Related papers (2023-09-14T17:42:08Z)
- What's in your hands? 3D Reconstruction of Generic Objects in Hands [49.12461675219253]
Our work aims to reconstruct hand-held objects given a single RGB image.
In contrast to prior works that typically assume known 3D templates and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates.
arXiv Detail & Related papers (2022-04-14T17:59:02Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds a great potential for robotics and learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z)