H2O: Two Hands Manipulating Objects for First Person Interaction Recognition
- URL: http://arxiv.org/abs/2104.11181v1
- Date: Thu, 22 Apr 2021 17:10:42 GMT
- Title: H2O: Two Hands Manipulating Objects for First Person Interaction Recognition
- Authors: Taein Kwon, Bugra Tekin, Jan Stühmer, Federica Bogo, Marc Pollefeys
- Abstract summary: We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
- Score: 70.46638409156772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present, for the first time, a comprehensive framework for egocentric
interaction recognition using markerless 3D annotations of two hands
manipulating objects. To this end, we propose a method to create a unified
dataset for egocentric 3D interaction recognition. Our method produces
annotations of the 3D pose of two hands and the 6D pose of the manipulated
objects, along with their interaction labels for each frame. Our dataset,
called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D
images, interaction labels, object classes, ground-truth 3D poses for left &
right hands, 6D object poses, ground-truth camera poses, object meshes and
scene point clouds. To the best of our knowledge, this is the first benchmark
that enables the study of first-person actions with the use of the pose of both
left and right hands manipulating objects and presents an unprecedented level
of detail for egocentric 3D interaction recognition. We further propose the
first method to predict interaction classes by estimating the 3D pose of two
hands and the 6D pose of the manipulated objects, jointly from RGB images. Our
method models both inter- and intra-dependencies between both hands and objects
by learning the topology of a graph convolutional network that predicts
interactions. We show that our method facilitated by this dataset establishes a
strong baseline for joint hand-object pose estimation and achieves
state-of-the-art accuracy for first person interaction recognition.
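
As a rough illustration of the interaction-recognition idea described in the abstract, the sketch below shows a graph convolution with a learnable adjacency matrix over hand-joint and object-pose nodes, pooled into interaction logits. This is a minimal sketch, not the authors' released code: the node layout, feature dimension, hidden size, class count, and the use of PyTorch are all assumptions.

```python
# Minimal sketch (not the H2O implementation): a GCN whose adjacency
# ("topology") is learned end-to-end over hand and object nodes, so it can
# pick up intra-hand, inter-hand and hand-object dependencies.
import torch
import torch.nn as nn

NUM_NODES = 21 + 21 + 1   # assumed: left-hand joints + right-hand joints + one object node
FEAT_DIM = 3              # assumed: 3D position per node
NUM_CLASSES = 36          # hypothetical number of interaction labels

class LearnableTopologyGCN(nn.Module):
    def __init__(self, in_dim=FEAT_DIM, hidden=64,
                 num_nodes=NUM_NODES, num_classes=NUM_CLASSES):
        super().__init__()
        # Adjacency logits learned jointly with the rest of the network,
        # instead of a fixed skeleton graph.
        self.adj_logits = nn.Parameter(torch.zeros(num_nodes, num_nodes))
        self.gc1 = nn.Linear(in_dim, hidden)
        self.gc2 = nn.Linear(hidden, hidden)
        self.cls = nn.Linear(hidden, num_classes)

    def forward(self, x):                                 # x: (batch, num_nodes, in_dim)
        adj = torch.softmax(self.adj_logits, dim=-1)      # row-normalized learned topology
        x = torch.relu(adj @ self.gc1(x))                 # propagate features along learned edges
        x = torch.relu(adj @ self.gc2(x))
        return self.cls(x.mean(dim=1))                    # pool nodes -> interaction logits

poses = torch.randn(2, NUM_NODES, FEAT_DIM)               # dummy per-frame hand/object poses
logits = LearnableTopologyGCN()(poses)                    # (2, NUM_CLASSES)
```

The learnable adjacency is what lets such a model capture both inter- and intra-dependencies between the two hands and the object, rather than committing to a hand-designed graph.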
Related papers
- HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video [70.11702620562889]
HOLD is the first category-agnostic method that reconstructs an articulated hand and object jointly from a monocular interaction video.
We develop a compositional articulated implicit model that can disentangle the 3D hand and object from 2D images.
Our method does not rely on 3D hand-object annotations while outperforming fully-supervised baselines in both in-the-lab and challenging in-the-wild settings.
arXiv Detail & Related papers (2023-11-30T10:50:35Z)
- AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose [16.65196181081623]
We present AffordPose, a large-scale dataset of hand-object interactions with affordance-driven hand pose.
We collect a total of 26.7K hand-object interactions, each including the 3D object shape, the part-level affordance label, and the manually adjusted hand poses.
The comprehensive data analysis shows the common characteristics and diversity of hand-object interactions per affordance.
arXiv Detail & Related papers (2023-09-16T10:25:28Z)
- ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation [68.80339307258835]
ARCTIC is a dataset of two hands that dexterously manipulate objects.
It contains 2.1M video frames paired with accurate 3D hand meshes and detailed, dynamic contact information.
arXiv Detail & Related papers (2022-04-28T17:23:59Z)
- What's in your hands? 3D Reconstruction of Generic Objects in Hands [49.12461675219253]
Our work aims to reconstruct hand-held objects given a single RGB image.
In contrast to prior works that typically assume known 3D templates and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates.
arXiv Detail & Related papers (2022-04-14T17:59:02Z)
- Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands, is a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset.
arXiv Detail & Related papers (2021-07-01T13:28:02Z)
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
- Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time [22.574069344246052]
We propose a unified framework for estimating the 3D hand and object poses with semi-supervised learning.
We build a joint learning framework where a Transformer performs explicit contextual reasoning between hand and object representations (see the sketch after this list).
Our method not only improves hand pose estimation on a challenging real-world dataset, but also substantially improves object pose estimation, which has fewer ground truths per instance.
arXiv Detail & Related papers (2021-06-09T17:59:34Z)
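
For the semi-supervised entry above, one common way to realize "explicit contextual reasoning between hand and object representations" with a Transformer is cross-attention between the two token sets. The sketch below is an assumption-laden illustration, not that paper's architecture; the token counts and dimensions are made up.

```python
# Illustrative sketch only: hand tokens attend to object tokens and vice
# versa, producing context-enriched features that pose heads could consume.
import torch
import torch.nn as nn

class HandObjectCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.hand_from_obj = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.obj_from_hand = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hand_tokens, obj_tokens):
        # hand_tokens: (batch, n_hand, dim), obj_tokens: (batch, n_obj, dim)
        hand_ctx, _ = self.hand_from_obj(hand_tokens, obj_tokens, obj_tokens)
        obj_ctx, _ = self.obj_from_hand(obj_tokens, hand_tokens, hand_tokens)
        # Residual update keeps the original per-token features.
        return hand_tokens + hand_ctx, obj_tokens + obj_ctx

hands = torch.randn(1, 21, 256)   # dummy per-joint hand features (assumed shape)
obj = torch.randn(1, 8, 256)      # dummy object keypoint features (assumed shape)
h, o = HandObjectCrossAttention()(hands, obj)
```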