GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes
- URL: http://arxiv.org/abs/2504.06866v1
- Date: Wed, 09 Apr 2025 13:15:46 GMT
- Title: GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes
- Authors: Seunghyeok Back, Joosoon Lee, Kangmin Kim, Heeseon Rho, Geonhyup Lee, Raeyoung Kang, Sangbeom Lee, Sangjun Noh, Youngjin Lee, Taeyeop Lee, Kyoobin Lee
- Abstract summary: We present GraspClutter6D, a large-scale real-world grasping dataset featuring 1,000 cluttered scenes with dense arrangements. We benchmark state-of-the-art segmentation, object pose estimation, and grasping detection methods to provide key insights into challenges in cluttered environments. We validate the dataset's effectiveness as a training resource, demonstrating that grasping networks trained on GraspClutter6D significantly outperform those trained on existing datasets in both simulation and real-world experiments.
- Score: 5.289647064481469
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Robust grasping in cluttered environments remains an open challenge in robotics. While benchmark datasets have significantly advanced deep learning methods, they mainly focus on simplistic scenes with light occlusion and insufficient diversity, limiting their applicability to practical scenarios. We present GraspClutter6D, a large-scale real-world grasping dataset featuring: (1) 1,000 highly cluttered scenes with dense arrangements (14.1 objects/scene, 62.6% occlusion), (2) comprehensive coverage across 200 objects in 75 environment configurations (bins, shelves, and tables) captured using four RGB-D cameras from multiple viewpoints, and (3) rich annotations including 736K 6D object poses and 9.3B feasible robotic grasps for 52K RGB-D images. We benchmark state-of-the-art segmentation, object pose estimation, and grasping detection methods to provide key insights into challenges in cluttered environments. Additionally, we validate the dataset's effectiveness as a training resource, demonstrating that grasping networks trained on GraspClutter6D significantly outperform those trained on existing datasets in both simulation and real-world experiments. The dataset, toolkit, and annotation tools are publicly available on our project website: https://sites.google.com/view/graspclutter6d.
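As a concrete illustration of what a per-frame annotation record of this kind might look like, the minimal Python sketch below covers the 6D object poses and grasp annotations described in the abstract. All field names and the file layout are hypothetical assumptions for illustration; the actual schema is defined by the official toolkit on the project website.

```python
# Hypothetical sketch of one GraspClutter6D scene-frame annotation.
# Field names and layout are assumptions, not the toolkit's actual schema.
from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class ObjectPose6D:
    object_id: int          # index into the 200-object model set
    rotation: np.ndarray    # (3, 3) rotation matrix
    translation: np.ndarray # (3,) translation in meters
    visibility: float       # visible fraction (scenes average 62.6% occlusion)


@dataclass
class Grasp6D:
    rotation: np.ndarray    # (3, 3) gripper orientation
    translation: np.ndarray # (3,) gripper position in meters
    width: float            # gripper opening width in meters
    score: float            # annotated feasibility/quality score


@dataclass
class SceneFrame:
    rgb_path: str                    # one of ~52K RGB-D frames
    depth_path: str
    camera_intrinsics: np.ndarray    # (3, 3) pinhole intrinsics, one of four RGB-D cameras
    object_poses: List[ObjectPose6D] # ~14.1 objects per scene on average
    grasps: List[Grasp6D]            # dense feasible-grasp annotations
```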
Related papers
- Towards Cross-device and Training-free Robotic Grasping in 3D Open World [20.406334587479623]
This paper presents a novel pipeline capable of executing object grasping tasks in open-world scenarios without the need for training.
We propose a training-free binary clustering algorithm that improves segmentation precision and can cluster and localize unseen objects for grasping.
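The paper's exact clustering procedure is not detailed in this summary; as a rough sketch of the general idea only, the code below recursively applies two-way k-means to a point cloud until each cluster is spatially compact. It is an assumption-laden stand-in, not the authors' algorithm.

```python
# Generic illustration of training-free binary clustering for point-cloud
# segmentation: recursively split points in two until clusters are compact.
# NOT the paper's algorithm; thresholds and stopping rule are assumptions.
import numpy as np
from sklearn.cluster import KMeans


def binary_cluster_segment(points: np.ndarray, compact_thresh: float = 0.05,
                           min_points: int = 50) -> list:
    """Recursively split an (N, 3) point cloud with 2-way k-means."""
    # Stop when the cluster is small or spatially compact (std in meters).
    if len(points) < min_points or points.std(axis=0).max() < compact_thresh:
        return [points]
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(points)
    segments = []
    for k in (0, 1):
        segments.extend(binary_cluster_segment(points[labels == k],
                                               compact_thresh, min_points))
    return segments
```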
arXiv Detail & Related papers (2024-11-27T08:23:28Z)
- PickScan: Object discovery and reconstruction from handheld interactions [99.99566882133179]
We develop an interaction-guided and class-agnostic method to reconstruct 3D representations of scenes.
Our main contribution is a novel approach to detecting user-object interactions and extracting the masks of manipulated objects.
Compared to Co-Fusion, the only comparable interaction-based and class-agnostic baseline, this corresponds to a 73% reduction in Chamfer distance.
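For reference, one common (symmetric) form of the Chamfer distance between two point sets is sketched below; published definitions vary in whether distances are squared and how the two directions are averaged.

```python
# Symmetric Chamfer distance: for each point, find its nearest neighbor in
# the other set and average the distances in both directions.
import numpy as np
from scipy.spatial import cKDTree


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    d_ab, _ = cKDTree(b).query(a)  # nearest-neighbor distance from each a-point to b
    d_ba, _ = cKDTree(a).query(b)  # and from each b-point to a
    return d_ab.mean() + d_ba.mean()
```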
arXiv Detail & Related papers (2024-11-17T23:09:08Z)
- PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments [50.79058028754952]
PACE (Pose Annotations in Cluttered Environments) is a large-scale benchmark for pose estimation methods in cluttered scenarios.
The benchmark consists of 55K frames with 258K annotations across 300 videos, covering 238 objects from 43 categories.
PACE-Sim contains 100K photo-realistic simulated frames with 2.4M annotations across 931 objects.
arXiv Detail & Related papers (2023-12-23T01:38:41Z)
- HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions [17.9178233068395]
We present the HANDAL dataset for category-level object pose estimation and affordance prediction.
The dataset consists of 308k annotated image frames from 2.2k videos of 212 real-world objects in 17 categories.
We outline the usefulness of our dataset for 6-DoF category-level pose+scale estimation and related tasks.
arXiv Detail & Related papers (2023-08-02T23:59:59Z)
- Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation [70.82403156865057]
We investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find and navigate to objects.
Our experiments show that agents trained on our smaller-scale dataset can match or outperform agents trained on much larger datasets.
arXiv Detail & Related papers (2023-06-20T05:07:23Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to support the modeling of relationships among objects and grasps.
The dataset is collected in both 2D image and 3D point-cloud form.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
- Weakly Supervised 3D Object Detection from Lidar Point Cloud [182.67704224113862]
It is laborious to manually label point cloud data for training high-quality 3D object detectors.
This work proposes a weakly supervised approach for 3D object detection, only requiring a small set of weakly annotated scenes.
Using only 500 weakly annotated scenes and 534 precisely labeled vehicle instances, our method achieves 85-95% of the performance of current top-performing, fully supervised detectors.
arXiv Detail & Related papers (2020-07-23T10:12:46Z)
- Transferable Active Grasping and Real Embodied Dataset [48.887567134129306]
We show how to search for feasible grasping viewpoints using hand-mounted RGB-D cameras.
A practical three-stage transferable active grasping pipeline is developed that adapts to unseen cluttered scenes.
In our pipeline, we propose a novel mask-guided reward to overcome the sparse reward issue in grasping and ensure category-irrelevant behavior.
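The exact reward is specific to the paper, but the general idea of a mask-guided shaping term can be sketched as follows: a dense component rewards overlap between the predicted mask and the target object's mask, supplementing the sparse terminal success signal. Function and parameter names here are illustrative assumptions, not the paper's formulation.

```python
# Hedged sketch of a mask-guided reward: a dense IoU shaping term plus a
# sparse terminal success bonus. Illustrative only; the paper's reward differs.
import numpy as np


def mask_guided_reward(pred_mask: np.ndarray, target_mask: np.ndarray,
                       grasp_succeeded: bool, success_bonus: float = 10.0) -> float:
    """Dense IoU shaping term plus a sparse terminal success bonus."""
    intersection = np.logical_and(pred_mask, target_mask).sum()
    union = np.logical_or(pred_mask, target_mask).sum()
    iou = float(intersection) / union if union > 0 else 0.0
    return iou + (success_bonus if grasp_succeeded else 0.0)
```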
arXiv Detail & Related papers (2020-04-28T08:15:35Z)
- How to track your dragon: A Multi-Attentional Framework for real-time RGB-D 6-DOF Object Pose Tracking [35.21561169636035]
We present a novel multi-attentional convolutional architecture to tackle the problem of real-time RGB-D 6D object pose tracking.
We consider the special geometrical properties of both the object's 3D model and the pose space, and we use a more sophisticated approach for data augmentation during training.
arXiv Detail & Related papers (2020-04-21T23:00:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.