MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic
Grasping via Physics-based Metaverse Synthesis
- URL: http://arxiv.org/abs/2112.14663v2
- Date: Thu, 30 Dec 2021 18:05:26 GMT
- Title: MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic
Grasping via Physics-based Metaverse Synthesis
- Authors: Yuhao Chen, E. Zhixuan Zeng, Maximilian Gilles, Alexander Wong
- Abstract summary: We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
- Score: 78.26022688167133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been increasing interest in smart factories powered by robotics
systems to tackle repetitive, laborious tasks. One impactful yet challenging
task in robotics-powered smart factory applications is robotic grasping: using
robotic arms to grasp objects autonomously in different settings. Robotic
grasping requires a variety of computer vision tasks such as object detection,
segmentation, grasp prediction, pick planning, etc. While significant progress
has been made in leveraging machine learning for robotic grasping,
particularly with deep learning, a big challenge remains in the need for
large-scale, high-quality RGBD datasets that cover a wide diversity of
scenarios and permutations. To tackle this big, diverse data problem, we are
inspired by the recent rise of the metaverse concept, which has greatly
closed the gap between virtual worlds and the physical world. Metaverses allow
us to create digital twins of real-world manufacturing scenarios and to
virtually create different scenarios from which large volumes of data can be
generated for training models. In this paper, we present MetaGraspNet: a
large-scale benchmark dataset for vision-driven robotic grasping via
physics-based metaverse synthesis. The proposed dataset contains 100,000 images
and 25 different object types, and is split into 5 difficulty levels to evaluate
object detection and segmentation model performance in different grasping
scenarios. We also propose a new layout-weighted performance metric alongside
the dataset for evaluating object detection and segmentation performance in a
manner that is more appropriate for robotic grasp applications compared to
existing general-purpose performance metrics. Our benchmark dataset is
available open-source on Kaggle, with the first phase consisting of detailed
object detection, segmentation, layout annotations, and a layout-weighted
performance metric script.
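The abstract does not spell out the exact form of the layout-weighted metric, so the following is only a minimal sketch of the general idea: per-scene detection/segmentation scores are aggregated with weights tied to each scene's layout difficulty level (1 to 5). The function name, the (difficulty_level, AP) input format, and the proportional weighting scheme are illustrative assumptions, not the dataset's released evaluation script.

```python
# Minimal sketch of a layout-weighted aggregation, assuming per-scene AP
# scores and a 1-5 layout difficulty level per scene (abstract-level info only).
from collections import defaultdict

def layout_weighted_ap(scene_results, level_weights=None):
    """Aggregate per-scene AP into a single layout-weighted score.

    scene_results: iterable of (difficulty_level, ap) pairs, with
                   difficulty_level in {1, ..., 5} and ap in [0, 1].
    level_weights: optional dict mapping difficulty level to weight;
                   here harder layouts get larger weights (assumed scheme).
    """
    if level_weights is None:
        # Hypothetical default: weight proportional to difficulty level.
        level_weights = {lvl: lvl for lvl in range(1, 6)}

    per_level = defaultdict(list)
    for level, ap in scene_results:
        per_level[level].append(ap)

    weighted_sum, weight_total = 0.0, 0.0
    for level, aps in per_level.items():
        mean_ap = sum(aps) / len(aps)          # average AP within a difficulty level
        weighted_sum += level_weights[level] * mean_ap
        weight_total += level_weights[level]
    return weighted_sum / weight_total if weight_total else 0.0

# Usage: scene-level AP scores tagged with their difficulty level.
results = [(1, 0.92), (1, 0.89), (3, 0.74), (5, 0.51)]
print(f"layout-weighted AP: {layout_weighted_ap(results):.3f}")
```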
Related papers
- M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes [66.44171200767839]
We propose M3Bench, a new benchmark of whole-body motion generation for mobile manipulation tasks.
M3Bench requires an embodied agent to understand its configuration, environmental constraints and task objectives.
M3Bench features 30k object rearrangement tasks across 119 diverse scenes, providing expert demonstrations generated by our newly developed M3BenchMaker.
arXiv Detail & Related papers (2024-10-09T08:38:21Z)
- Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model [35.184607650708784]
Articulate-Anything automates the articulation of diverse, complex objects from many input modalities, including text, images, and videos.
Our system exploits existing 3D asset datasets via a mesh retrieval mechanism, along with an actor-critic system that iteratively proposes, evaluates, and refines solutions.
arXiv Detail & Related papers (2024-10-03T19:42:16Z)
- Tiny Robotics Dataset and Benchmark for Continual Object Detection [6.4036245876073234]
This work introduces a novel benchmark to evaluate the continual learning capabilities of object detection systems in tiny robotic platforms.
Our contributions include: (i) Tiny Robotics Object Detection (TiROD), a comprehensive dataset collected using a small mobile robot, designed to test the adaptability of object detectors across various domains and classes; (ii) an evaluation of state-of-the-art real-time object detectors combined with different continual learning strategies on this dataset; and (iii) a public release of the data and code to replicate the results, fostering continuous advancements in this field.
arXiv Detail & Related papers (2024-09-24T16:21:27Z)
- BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models.
BVS supports a large number of adjustable parameters at the scene level.
We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- HabitatDyn Dataset: Dynamic Object Detection to Kinematics Estimation [16.36110033895749]
We propose the dataset HabitatDyn, which contains synthetic RGB videos, semantic labels, and depth information, as well as kinetics information.
HabitatDyn was created from the perspective of a mobile robot with a moving camera, and contains 30 scenes featuring six different types of moving objects with varying velocities.
arXiv Detail & Related papers (2023-04-21T09:57:35Z)
- RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 difficulty levels and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
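As a reading aid for the annotation types listed in the entry above (object detection, amodal perception, keypoints, manipulation order, and ambidextrous grasp labels), here is a hypothetical per-image record in Python. The field names and types are illustrative assumptions only, not the published MetaGraspNet schema.

```python
# Hypothetical per-image annotation record mirroring the annotation types
# named in the abstract; not the dataset's actual file format.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SceneAnnotation:
    rgb_path: str                            # RGB image file
    depth_path: str                          # aligned depth map
    boxes: List[List[float]]                 # object detection boxes [x1, y1, x2, y2]
    amodal_masks: List[str]                  # paths to amodal segmentation masks
    keypoints: List[List[float]]             # per-object keypoints
    picking_order: List[int]                 # manipulation (picking) order indices
    parallel_jaw_grasps: List[List[float]]   # grasp labels for a parallel-jaw gripper
    vacuum_grasps: List[List[float]]         # grasp labels for a vacuum gripper
    difficulty_level: Optional[int] = None   # 1-5 layout difficulty (real test set)
```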