Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking
- URL: http://arxiv.org/abs/2406.09598v1
- Date: Thu, 13 Jun 2024 21:38:17 GMT
- Title: Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking
- Authors: Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Fan Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan
- Abstract summary: We introduce HOT3D, a dataset for egocentric hand and object tracking in 3D.
The dataset offers over 833 minutes of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects.
In addition to simple pick-up/observe/put-down actions, HOT3D contains scenarios resembling typical actions in a kitchen, office, and living room environment.
- Score: 7.443420525809604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground truth annotations including 3D poses of objects, hands, and cameras, and 3D models of hands and objects. In addition to simple pick-up/observe/put-down actions, HOT3D contains scenarios resembling typical actions in a kitchen, office, and living room environment. The dataset is recorded by two head-mounted devices from Meta: Project Aria, a research prototype of light-weight AR/AI glasses, and Quest 3, a production VR headset sold in millions of units. Ground-truth poses were obtained by a professional motion-capture system using small optical markers attached to hands and objects. Hand annotations are provided in the UmeTrack and MANO formats and objects are represented by 3D meshes with PBR materials obtained by an in-house scanner. We aim to accelerate research on egocentric hand-object interaction by making the HOT3D dataset publicly available and by co-organizing public challenges on the dataset at ECCV 2024. The dataset can be downloaded from the project website: https://facebookresearch.github.io/hot3d/.
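As a rough illustration of how per-frame ground truth from such a dataset might be consumed, the sketch below applies a rigid 3D object pose (rotation + translation) to a scanned object mesh and projects it into a camera with known pose and pinhole intrinsics. The frame dictionary, file layout, and helper names here are hypothetical, not HOT3D's actual API or data format; consult the project website for the real loaders.

```python
import numpy as np

# Hypothetical per-frame ground truth (HOT3D's real layout may differ;
# see https://facebookresearch.github.io/hot3d/ for the actual format).
frame = {
    "object_pose": {"R": np.eye(3), "t": np.array([0.0, 0.1, 0.5])},  # world <- object
    "camera_pose": {"R": np.eye(3), "t": np.array([0.0, 0.0, 0.0])},  # world <- camera
    "K": np.array([[450.0,   0.0, 320.0],   # pinhole intrinsics
                   [  0.0, 450.0, 240.0],
                   [  0.0,   0.0,   1.0]]),
}

def transform(R, t, pts):
    """Apply a rigid transform to an (N, 3) array of points."""
    return pts @ R.T + t

def project(K, pts_cam):
    """Project (N, 3) camera-space points to (N, 2) pixels (no distortion)."""
    uv = pts_cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

# Stand-in vertices of a scanned object mesh (meters, object frame).
verts_obj = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [0.0, 0.05, 0.0]])

# Object frame -> world frame using the ground-truth object pose.
verts_world = transform(frame["object_pose"]["R"], frame["object_pose"]["t"], verts_obj)

# World frame -> camera frame (invert the world <- camera pose).
R_wc, t_wc = frame["camera_pose"]["R"], frame["camera_pose"]["t"]
verts_cam = transform(R_wc.T, -R_wc.T @ t_wc, verts_world)

# Camera frame -> pixel coordinates, e.g. to overlay GT on an RGB frame.
print(project(frame["K"], verts_cam))
```

The same pattern extends to the hand annotations: a MANO or UmeTrack hand model posed by the ground-truth parameters can be transformed and projected identically for visualization or for computing reprojection-based evaluation metrics.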
Related papers
- HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction [16.363878619678367]
We introduce a new dataset named HO-Cap that can be used to study 3D reconstruction and pose tracking of hands and objects in videos.
We propose a semi-automatic method to obtain annotations of shape and pose of hands and objects in the collected videos.
Our data capture setup and annotation framework can be used by the community to reconstruct 3D shapes of objects and human hands and track their poses in videos.
arXiv Detail & Related papers (2024-06-10T23:25:19Z)
- Delving into Motion-Aware Matching for Monocular 3D Object Tracking [81.68608983602581]
We find that the motion cues of objects across different time frames are critical for 3D multi-object tracking.
We propose MoMA-M3T, a framework that mainly consists of three motion-aware components.
We conduct extensive experiments on the nuScenes and KITTI datasets to demonstrate our MoMA-M3T achieves competitive performance against state-of-the-art methods.
arXiv Detail & Related papers (2023-08-22T17:53:58Z)
- Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
- HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions [17.9178233068395]
We present the HANDAL dataset for category-level object pose estimation and affordance prediction.
The dataset consists of 308k annotated image frames from 2.2k videos of 212 real-world objects in 17 categories.
We outline the usefulness of our dataset for 6-DoF category-level pose+scale estimation and related tasks.
arXiv Detail & Related papers (2023-08-02T23:59:59Z)
- Tracking Objects with 3D Representation from Videos [57.641129788552675]
We propose a new 2D Multiple Object Tracking (MOT) paradigm, called P3DTrack, which learns 3D object representations from pseudo 3D object labels in monocular videos.
arXiv Detail & Related papers (2023-06-08T17:58:45Z)
- 3D Reconstruction of Objects in Hands without Real World 3D Supervision [12.70221786947807]
We propose modules that leverage indirect 3D supervision to scale up the learning of models for reconstructing hand-held objects.
Specifically, we extract multiview 2D mask supervision from videos and 3D shape priors from shape collections.
We use these indirect 3D cues to train occupancy networks that predict the 3D shape of objects from a single RGB image.
arXiv Detail & Related papers (2023-05-04T17:56:48Z)
- ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation [68.80339307258835]
ARCTIC is a dataset of two hands that dexterously manipulate objects.
It contains 2.1M video frames paired with accurate 3D hand meshes and detailed, dynamic contact information.
arXiv Detail & Related papers (2022-04-28T17:23:59Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations [0.0]
We introduce the Objectron dataset to advance the state of the art in 3D object detection.
The dataset contains object-centric short videos with pose annotations for nine categories and includes 4 million annotated images in 14,819 annotated videos.
arXiv Detail & Related papers (2020-12-18T00:34:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.