Reconstructing Hand-Object Interactions in the Wild
- URL: http://arxiv.org/abs/2012.09856v1
- Date: Thu, 17 Dec 2020 18:59:58 GMT
- Title: Reconstructing Hand-Object Interactions in the Wild
- Authors: Zhe Cao, Ilija Radosavovic, Angjoo Kanazawa, Jitendra Malik
- Abstract summary: We propose an optimization-based procedure which does not require direct 3D supervision.
We exploit all available related data (2D bounding boxes, 2D hand keypoints, 2D instance masks, 3D object models, 3D in-the-lab MoCap) to provide constraints for the 3D reconstruction.
Our method produces compelling reconstructions on the challenging in-the-wild data from the EPIC Kitchens and the 100 Days of Hands datasets.
- Score: 71.16013096764046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we explore reconstructing hand-object interactions in the wild.
The core challenge of this problem is the lack of appropriate 3D labeled data.
To overcome this issue, we propose an optimization-based procedure which does
not require direct 3D supervision. The general strategy we adopt is to exploit
all available related data (2D bounding boxes, 2D hand keypoints, 2D instance
masks, 3D object models, 3D in-the-lab MoCap) to provide constraints for the 3D
reconstruction. Rather than optimizing the hand and object individually, we
optimize them jointly, which allows us to impose additional constraints based on
hand-object contact, collision, and occlusion. Our method produces compelling
reconstructions on the challenging in-the-wild data from the EPIC Kitchens and
the 100 Days of Hands datasets, across a range of object categories.
Quantitatively, we demonstrate that our approach compares favorably to existing
approaches in lab settings where ground-truth 3D annotations are available.
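The joint optimization described above can be pictured as gradient descent over hand and object pose parameters under a weighted sum of data and interaction terms. The sketch below is a minimal, hypothetical PyTorch rendering of that structure; the parameterization (MANO-style hand pose, 6-DoF object pose), the loss stubs, and the weights are illustrative assumptions, not the authors' implementation.

```python
import torch

# Illustrative variables (assumptions, not the paper's exact parameterization):
# MANO-style hand pose coefficients and a 6-DoF object pose.
hand_pose = torch.zeros(48, requires_grad=True)      # global orient + articulation
obj_rot = torch.zeros(3, requires_grad=True)         # axis-angle rotation
obj_trans = torch.tensor([0.0, 0.0, 0.5], requires_grad=True)

def l_keypoints(hand_pose):
    # Placeholder for the 2D reprojection error between projected hand
    # joints and detected 2D hand keypoints.
    return hand_pose.pow(2).sum()

def l_mask(obj_rot, obj_trans):
    # Placeholder for a silhouette term: rendered 3D object model vs.
    # the 2D instance mask.
    return obj_rot.pow(2).sum() + (obj_trans - 0.5).pow(2).sum()

def l_contact(hand_pose, obj_trans):
    # Placeholder encouraging hand-object contact.
    return (hand_pose[:3] - obj_trans).pow(2).sum()

def l_collision(hand_pose, obj_trans):
    # Placeholder penalizing hand-object interpenetration.
    return torch.relu(0.05 - (hand_pose[:3] - obj_trans).norm()).pow(2)

opt = torch.optim.Adam([hand_pose, obj_rot, obj_trans], lr=1e-2)
for step in range(500):
    opt.zero_grad()
    loss = (l_keypoints(hand_pose) + l_mask(obj_rot, obj_trans)
            + 0.1 * l_contact(hand_pose, obj_trans)
            + 0.1 * l_collision(hand_pose, obj_trans))
    loss.backward()
    opt.step()
```

In the actual method, the stubs would be replaced by reprojection of hand joints against detected 2D keypoints, a differentiable silhouette term against the instance mask, and contact/collision penalties between the hand and the retrieved object model, as the abstract describes.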
Related papers
- 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement [6.858859328420893]
This work identifies and addresses a gap in the current state of the art in 3D Human Pose Estimation (HPE).
We introduce our novel BlendMimic3D dataset, designed to mimic real-world situations where occlusions occur.
We also propose a 3D pose refinement block that employs a Graph Convolutional Network (GCN) to enhance the pose representation over the skeleton graph.
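As a generic illustration of GCN-based pose refinement (a sketch under assumed shapes, not BlendMimic3D's actual block), a graph convolution mixes each joint's features with its skeleton neighbors' and predicts a residual 3D correction:

```python
import torch
import torch.nn as nn

class GCNRefine(nn.Module):
    """Generic residual graph-convolution refinement of a 3D pose.

    `adj` is a normalized joint adjacency matrix of the skeleton; the
    layer mixes each joint's features with its neighbors' and predicts
    a per-joint 3D correction.
    """
    def __init__(self, adj, hidden=64):
        super().__init__()
        self.register_buffer("adj", adj)             # (J, J)
        self.fc1 = nn.Linear(3, hidden)
        self.fc2 = nn.Linear(hidden, 3)

    def forward(self, pose3d):                       # pose3d: (B, J, 3)
        h = torch.relu(self.adj @ self.fc1(pose3d))  # neighborhood mixing
        delta = self.adj @ self.fc2(h)               # per-joint correction
        return pose3d + delta                        # residual refinement

# Usage with a toy 2-joint skeleton (self-loops included, row-normalized):
adj = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
refiner = GCNRefine(adj)
refined = refiner(torch.randn(1, 2, 3))
```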
arXiv Detail & Related papers (2024-04-24T18:49:37Z)
- HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video [70.11702620562889]
HOLD is the first category-agnostic method that reconstructs an articulated hand and an object jointly from a monocular interaction video.
We develop a compositional articulated implicit model that can disentangle the 3D hand and object from 2D images.
Our method does not rely on 3D hand-object annotations while outperforming fully-supervised baselines in both in-the-lab and challenging in-the-wild settings.
arXiv Detail & Related papers (2023-11-30T10:50:35Z)
- SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction [13.417086460511696]
We introduce the SHOWMe dataset, which consists of 96 videos annotated with real and detailed hand-object 3D textured meshes.
We consider a rigid hand-object scenario, in which the pose of the hand with respect to the object remains constant during the whole video sequence.
This assumption allows us to register sub-millimetre-precise ground-truth 3D scans to the image sequences in SHOWMe.
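The rigid assumption means each scan-to-frame registration is a single 6-DoF alignment. As a generic illustration (an assumption-laden sketch, not the SHOWMe pipeline), the Kabsch algorithm recovers the best-fit rotation and translation between corresponding 3D point sets:

```python
import torch

def kabsch(P, Q):
    """Best-fit rigid transform (R, t) such that R @ p + t ~= q.

    P, Q: (N, 3) corresponding 3D points (e.g. scan vertices and their
    observed counterparts). Returns rotation R (3, 3) and translation t (3,).
    """
    Pm, Qm = P.mean(0), Q.mean(0)
    H = (P - Pm).T @ (Q - Qm)                  # cross-covariance of centered sets
    U, S, Vt = torch.linalg.svd(H)
    d = torch.det(Vt.T @ U.T).sign().item()    # guard against reflections
    D = torch.diag(torch.tensor([1.0, 1.0, d]))
    R = Vt.T @ D @ U.T
    t = Qm - R @ Pm
    return R, t

# Sanity check: recover a known rotation about z plus a translation.
theta = torch.tensor(0.3)
R_true = torch.tensor([[theta.cos(), -theta.sin(), 0.0],
                       [theta.sin(),  theta.cos(), 0.0],
                       [0.0,          0.0,         1.0]])
P = torch.randn(50, 3)
Q = P @ R_true.T + torch.tensor([0.1, -0.2, 0.3])
R, t = kabsch(P, Q)
```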
arXiv Detail & Related papers (2023-09-19T16:48:29Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed the Homography Loss, is proposed to achieve this goal; it exploits both 2D and 3D information.
Our method outperforms other state-of-the-art approaches by a large margin on the KITTI 3D dataset.
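For intuition only, the core object here is a differentiable planar homography. The snippet below shows a generic 3x3 homography applied to 2D points in PyTorch so that gradients can flow into a loss; it is an illustrative sketch, not the paper's actual Homography Loss.

```python
import torch

def apply_homography(H, pts):
    """Map 2D points through a 3x3 homography (differentiable).

    H: (3, 3) matrix; pts: (N, 2) points. Returns (N, 2) mapped points.
    """
    ones = torch.ones(pts.shape[0], 1)
    homog = torch.cat([pts, ones], dim=1)   # lift to homogeneous coordinates
    mapped = homog @ H.T                    # projective transform
    return mapped[:, :2] / mapped[:, 2:3]   # back to Euclidean coordinates

# A loss could then compare homography-mapped points against 2D evidence,
# e.g. an L1 distance (illustrative only):
H = torch.eye(3, requires_grad=True)
pts = torch.rand(8, 2)
target = torch.rand(8, 2)
loss = (apply_homography(H, pts) - target).abs().mean()
loss.backward()   # gradients flow back to the homography parameters
```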
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds great potential for robotics and for learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.