DexYCB: A Benchmark for Capturing Hand Grasping of Objects
- URL: http://arxiv.org/abs/2104.04631v1
- Date: Fri, 9 Apr 2021 22:54:21 GMT
- Title: DexYCB: A Benchmark for Capturing Hand Grasping of Objects
- Authors: Yu-Wei Chao and Wei Yang and Yu Xiang and Pavlo Molchanov and Ankur
Handa and Jonathan Tremblay and Yashraj S. Narang and Karl Van Wyk and Umar
Iqbal and Stan Birchfield and Jan Kautz and Dieter Fox
- Abstract summary: We introduce DexYCB, a new dataset for capturing hand grasping of objects.
We present a benchmark of state-of-the-art approaches on three relevant tasks.
We then evaluate a new robotics-relevant task: generating safe robot grasps in human-to-robot object handover.
- Score: 101.48808584983867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce DexYCB, a new dataset for capturing hand grasping of objects. We
first compare DexYCB with a related one through cross-dataset evaluation. We
then present a thorough benchmark of state-of-the-art approaches on three
relevant tasks: 2D object and keypoint detection, 6D object pose estimation,
and 3D hand pose estimation. Finally, we evaluate a new robotics-relevant task:
generating safe robot grasps in human-to-robot object handover. Dataset and
code are available at https://dex-ycb.github.io.
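The benchmark tasks above are typically scored with a small set of geometric error metrics. As a rough, unofficial illustration (the authoritative evaluation code and protocols are in the toolkit at the URL above), the sketch below computes mean per-joint position error (MPJPE) for 3D hand pose and the average distance metric (ADD) for 6D object pose. The function names and the 21-joint hand layout are assumptions made here for illustration, not part of the released API.

```python
import numpy as np

def mpjpe(pred_joints, gt_joints):
    """Mean per-joint position error between predicted and ground-truth 3D hand
    joints (e.g. arrays of shape (21, 3), in the same units as the inputs)."""
    return float(np.linalg.norm(pred_joints - gt_joints, axis=-1).mean())

def add_metric(model_points, R_pred, t_pred, R_gt, t_gt):
    """Average Distance (ADD) for 6D object pose: mean distance between object
    model points transformed by the predicted and by the ground-truth pose."""
    pred = model_points @ R_pred.T + t_pred   # (N, 3) points under predicted pose
    gt = model_points @ R_gt.T + t_gt         # (N, 3) points under ground-truth pose
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy usage with random data, purely to show the expected shapes.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(mpjpe(rng.normal(size=(21, 3)), rng.normal(size=(21, 3))))
    print(add_metric(rng.normal(size=(500, 3)),
                     np.eye(3), np.zeros(3), np.eye(3), np.ones(3) * 0.01))
```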
Related papers
- Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds [45.87961177297602]
This work aims to integrate recent methods into a comprehensive framework for robotic interaction and manipulation in human-centric environments.
Specifically, we leverage 3D reconstructions from a commodity 3D scanner for open-vocabulary instance segmentation.
We show the performance and robustness of our model in two sets of real-world experiments including dynamic object retrieval and drawer opening.
arXiv Detail & Related papers (2024-04-18T18:01:15Z)
- Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset [52.22758311559]
We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot.
The key novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors.
The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users.
arXiv Detail & Related papers (2024-03-21T14:53:50Z) - SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction [13.417086460511696]
We introduce the SHOWMe dataset which consists of 96 videos, annotated with real and detailed hand-object 3D textured meshes.
We consider a rigid hand-object scenario, in which the pose of the hand with respect to the object remains constant during the whole video sequence.
This assumption allows us to register sub-millimetre-precise groundtruth 3D scans to the image sequences in SHOWMe.
arXiv Detail & Related papers (2023-09-19T16:48:29Z) - Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondence network, our method accurately and efficiently finds corresponding points between an unseen object and a partial-view RGB-D image (see the pose-recovery sketch after this list).
arXiv Detail & Related papers (2022-06-23T16:29:53Z) - HandoverSim: A Simulation Framework and Benchmark for Human-to-Robot
Object Handovers [60.45158007016316]
"HandoverSim" is a simulation benchmark for human-to-robot object handovers.
We leverage a recent motion capture dataset of hand grasping of objects.
We create training and evaluation environments for the receiver with standardized protocols and metrics.
arXiv Detail & Related papers (2022-05-19T17:59:00Z)
- ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation [68.80339307258835]
ARCTIC is a dataset of two hands that dexterously manipulate objects.
It contains 2.1M video frames paired with accurate 3D hand meshes and detailed, dynamic contact information.
arXiv Detail & Related papers (2022-04-28T17:23:59Z)
- Indoor Semantic Scene Understanding using Multi-modality Fusion [0.0]
We present a semantic scene understanding pipeline that fuses 2D and 3D detection branches to generate a semantic map of the environment.
Unlike previous works that were evaluated on collected datasets, we test our pipeline in an active, photo-realistic robotic environment.
Our novelty includes rectification of 3D proposals using projected 2D detections and modality fusion based on object size.
arXiv Detail & Related papers (2021-08-17T13:30:02Z)
- H2O: Two Hands Manipulating Objects for First Person Interaction Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)
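The "Unseen Object 6D Pose Estimation" entry above predicts 3D-3D correspondences between an object model and a partial RGB-D view and then recovers a pose from them. One standard way to perform that last step is the Kabsch (SVD) least-squares alignment; the sketch below is a generic version of it, assuming already matched, outlier-free point pairs (in practice it is usually wrapped in a RANSAC loop). It is an illustration, not the paper's released code.

```python
import numpy as np

def pose_from_correspondences(src, dst):
    """Least-squares rigid transform (Kabsch/SVD): find R, t with R @ src_i + t ~ dst_i.

    src: (N, 3) points in the object/model frame.
    dst: (N, 3) matched points in the camera frame (e.g. back-projected RGB-D)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src - src.mean(axis=0)            # centre both point sets
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                       # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```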
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.