Sim-Suction: Learning a Suction Grasp Policy for Cluttered Environments
Using a Synthetic Benchmark
- URL: http://arxiv.org/abs/2305.16378v2
- Date: Mon, 27 Nov 2023 20:23:39 GMT
- Title: Sim-Suction: Learning a Suction Grasp Policy for Cluttered Environments
Using a Synthetic Benchmark
- Authors: Juncheng Li, David J. Cappelleri
- Abstract summary: Sim-Suction is a robust object-aware suction grasp policy for mobile manipulation platforms with dynamic camera viewpoints.
Sim-Suction-Dataset comprises 500 cluttered environments with 3.2 million annotated suction grasp poses.
Sim-Suction-Pointnet generates robust 6D suction grasp poses by learning point-wise affordances from the Sim-Suction-Dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents Sim-Suction, a robust object-aware suction grasp policy
for mobile manipulation platforms with dynamic camera viewpoints, designed to
pick up unknown objects from cluttered environments. Suction grasp policies
typically employ data-driven approaches, necessitating large-scale,
accurately-annotated suction grasp datasets. However, the generation of suction
grasp datasets in cluttered environments remains underexplored, leaving
uncertainties about the relationship between the object of interest and its
surroundings. To address this, we propose a benchmark synthetic dataset,
Sim-Suction-Dataset, comprising 500 cluttered environments with 3.2 million
annotated suction grasp poses. The efficient Sim-Suction-Dataset generation
process provides novel insights by combining analytical models with dynamic
physical simulations to create fast and accurate suction grasp pose
annotations. We introduce Sim-Suction-Pointnet to generate robust 6D suction
grasp poses by learning point-wise affordances from the Sim-Suction-Dataset,
leveraging the synergy of zero-shot text-to-segmentation. Real-world
experiments for picking up all objects demonstrate that Sim-Suction-Pointnet
achieves success rates of 96.76%, 94.23%, and 92.39% on cluttered level 1
objects (prismatic shape), cluttered level 2 objects (more complex geometry),
and cluttered mixed objects, respectively. The Sim-Suction policies outperform
the state-of-the-art methods tested by approximately 21% in cluttered mixed
scenes.
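The point-wise affordance idea described above can be illustrated with a minimal sketch: score every point in a cloud with a small per-point network, then pick the best-scoring point and its surface normal as a suction pose candidate. This is a toy NumPy illustration, not the paper's Sim-Suction-Pointnet architecture; the function names, feature layout, and MLP weights are all assumptions for demonstration.

```python
# Illustrative sketch of point-wise suction affordance scoring.
# NOT the actual Sim-Suction-Pointnet model; shapes and names are hypothetical.
import numpy as np

def score_affordances(points, normals, weights):
    """Score each point's suction affordance with a tiny per-point MLP.

    points:  (N, 3) point cloud positions
    normals: (N, 3) unit surface normals
    weights: dict with 'w1' (6, H), 'b1' (H,), 'w2' (H,), 'b2' scalar
    Returns (N,) scores in [0, 1].
    """
    feats = np.concatenate([points, normals], axis=1)                # (N, 6)
    hidden = np.maximum(feats @ weights["w1"] + weights["b1"], 0.0)  # ReLU
    logits = hidden @ weights["w2"] + weights["b2"]                  # (N,)
    return 1.0 / (1.0 + np.exp(-logits))                             # sigmoid

def best_suction_pose(points, normals, scores):
    """Return a suction pose candidate (contact point, approach axis)."""
    i = int(np.argmax(scores))
    # Approach along the inward surface normal at the best-scoring point.
    return points[i], -normals[i]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(128, 3))
    nrm = rng.normal(size=(128, 3))
    nrm /= np.linalg.norm(nrm, axis=1, keepdims=True)
    w = {"w1": rng.normal(size=(6, 16)), "b1": np.zeros(16),
         "w2": rng.normal(size=16), "b2": 0.0}
    s = score_affordances(pts, nrm, w)
    pos, approach = best_suction_pose(pts, nrm, s)
    print(pos.shape, approach.shape)
```

In the actual system, the per-point scores would come from a learned PointNet-style backbone and the pose would be refined with the segmentation and physics checks described in the abstract; this sketch only shows the score-then-select structure.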
Related papers
- SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding [64.86119288520419]
Multimodal language models struggle with spatial reasoning across time and space. We present SIMS-V, a systematic data-generation framework that leverages the privileged information of 3D simulators. Our approach demonstrates robust generalization, maintaining performance on general video understanding while showing substantial improvements on embodied and real-world spatial tasks.
arXiv Detail & Related papers (2025-11-06T18:53:31Z)
- GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes [5.289647064481469]
We present GraspClutter6D, a large-scale real-world grasping dataset featuring 1,000 cluttered scenes with dense arrangements.
We benchmark state-of-the-art segmentation, object pose estimation, and grasping detection methods to provide key insights into challenges in cluttered environments.
We validate the dataset's effectiveness as a training resource, demonstrating that grasping networks trained on GraspClutter6D significantly outperform those trained on existing datasets in both simulation and real-world experiments.
arXiv Detail & Related papers (2025-04-09T13:15:46Z)
- Diffusion Suction Grasping with Large-Scale Parcel Dataset [6.112197264635304]
We present Parcel-Suction-Dataset, a large-scale synthetic dataset containing 25 thousand cluttered scenes with 410 million precision-annotated suction grasp poses.
This dataset is generated through our novel geometric sampling algorithm that enables efficient generation of optimal suction grasps.
We also propose Diffusion-Suction, an innovative framework that reformulates suction grasp prediction as a conditional generation task.
arXiv Detail & Related papers (2025-02-11T04:09:11Z)
- DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation [54.02069690134526]
We propose DrivingSphere, a realistic and closed-loop simulation framework.
Its core idea is to build 4D world representation and generate real-life and controllable driving scenarios.
By providing a dynamic and realistic simulation environment, DrivingSphere enables comprehensive testing and validation of autonomous driving algorithms.
arXiv Detail & Related papers (2024-11-18T03:00:33Z)
- DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes [18.95051035812627]
We present a large-scale synthetic benchmark, encompassing 1319 objects, 8270 scenes, and 427 million grasps.
We also propose a novel two-stage grasping method that learns efficiently from data by using a diffusion model that conditions on local geometry.
With the aid of test-time-depth restoration, our method demonstrates zero-shot sim-to-real transfer, attaining 90.7% real-world dexterous grasping success rate in cluttered scenes.
arXiv Detail & Related papers (2024-10-30T13:30:39Z)
- Sim-Grasp: Learning 6-DOF Grasp Policies for Cluttered Environments Using a Synthetic Benchmark [6.7936188782093945]
We present Sim-Grasp, a robust 6-DOF two-finger grasping system that integrates advanced language models for enhanced object manipulation in cluttered environments.
We introduce the Sim-Grasp-Dataset, which includes 1,550 objects across 500 scenarios with 7.9 million annotated labels, and develop Sim-GraspNet to generate grasp poses from point clouds.
arXiv Detail & Related papers (2024-05-01T20:08:51Z)
- AGILE: Approach-based Grasp Inference Learned from Element Decomposition [2.812395851874055]
Humans can grasp objects by taking into account hand-object positioning information.
This work proposes a method to enable a robot manipulator to learn the same, grasping objects in the most optimal way.
arXiv Detail & Related papers (2024-02-02T10:47:08Z)
- PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments [50.79058028754952]
PACE (Pose Annotations in Cluttered Environments) is a large-scale benchmark for pose estimation methods in cluttered scenarios.
The benchmark consists of 55K frames with 258K annotations across 300 videos, covering 238 objects from 43 categories.
PACE-Sim contains 100K photo-realistic simulated frames with 2.4M annotations across 931 objects.
arXiv Detail & Related papers (2023-12-23T01:38:41Z)
- Benchmarking the Sim-to-Real Gap in Cloth Manipulation [10.530012817995656]
We present a benchmark dataset to evaluate the sim-to-real gap in cloth manipulation.
We use the dataset to evaluate the reality gap, computational time, and stability of four popular deformable object simulators.
arXiv Detail & Related papers (2023-10-14T09:36:01Z)
- Sim-MEES: Modular End-Effector System Grasping Dataset for Mobile Manipulators in Cluttered Environments [10.414347878456852]
We present a large-scale synthetic dataset that contains 1,550 objects with varying difficulty levels and physics properties, as well as 11 million grasp labels for mobile manipulators to plan grasps using different modalities in cluttered environments.
Our dataset generation process combines analytic models and dynamic simulations of the entire cluttered environment to provide accurate grasp labels.
arXiv Detail & Related papers (2023-05-17T21:40:26Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Point Cloud Based Reinforcement Learning for Sim-to-Real and Partial Observability in Visual Navigation [62.22058066456076]
Reinforcement Learning (RL) provides powerful tools for solving complex robotic tasks.
However, policies trained in simulation often fail when deployed directly in the real world, a challenge known as the sim-to-real transfer problem.
We propose a method that learns on an observation space constructed by point clouds and environment randomization.
arXiv Detail & Related papers (2020-07-27T17:46:59Z)
- Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)
- Transferable Active Grasping and Real Embodied Dataset [48.887567134129306]
We show how to search for feasible viewpoints for grasping by the use of hand-mounted RGB-D cameras.
A practical 3-stage transferable active grasping pipeline is developed that adapts to unseen cluttered scenes.
In our pipeline, we propose a novel mask-guided reward to overcome the sparse reward issue in grasping and ensure category-irrelevant behavior.
arXiv Detail & Related papers (2020-04-28T08:15:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.