PEGG-Net: Pixel-Wise Efficient Grasp Generation in Complex Scenes
- URL: http://arxiv.org/abs/2203.16301v3
- Date: Thu, 13 Jul 2023 09:52:04 GMT
- Title: PEGG-Net: Pixel-Wise Efficient Grasp Generation in Complex Scenes
- Authors: Haozhe Wang, Zhiyang Liu, Lei Zhou, Huan Yin, and Marcelo H Ang Jr
- Abstract summary: In this work, we study the existing planar grasp estimation algorithms and analyze the related challenges in complex scenes.
We design a Pixel-wise Efficient Grasp Generation Network (PEGG-Net) to tackle the problem of grasping in complex scenes.
PEGG-Net achieves improved state-of-the-art performance on the Cornell dataset (98.9%) and second-best performance on the Jacquard dataset (93.8%).
- Score: 7.907697609965681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-based grasp estimation is an essential part of robotic manipulation
tasks in the real world. Existing planar grasp estimation algorithms have been
demonstrated to work well in relatively simple scenes. But when it comes to
complex scenes, such as cluttered scenes with messy backgrounds and moving
objects, the algorithms from previous works are prone to generate inaccurate
and unstable grasping contact points. In this work, we first study the existing
planar grasp estimation algorithms and analyze the related challenges in
complex scenes. Secondly, we design a Pixel-wise Efficient Grasp Generation
Network (PEGG-Net) to tackle the problem of grasping in complex scenes.
PEGG-Net can achieve improved state-of-the-art performance on the Cornell
dataset (98.9%) and second-best performance on the Jacquard dataset (93.8%),
outperforming other existing algorithms without the introduction of complex
structures. Thirdly, PEGG-Net could operate in a closed-loop manner for added
robustness in dynamic environments using position-based visual servoing (PBVS).
Finally, we conduct real-world experiments on static, dynamic, and cluttered
objects in different complex scenes. The results show that our proposed network
achieves a high success rate in grasping irregular objects, household objects,
and workshop tools. To benefit the community, our trained model and
supplementary materials are available at https://github.com/HZWang96/PEGG-Net.
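As an illustration of what pixel-wise grasp generation means, the sketch below decodes a single planar grasp from per-pixel maps. It assumes, following the convention of earlier pixel-wise planar grasp networks, that the network outputs three equally sized maps (grasp quality, gripper angle, gripper width). The abstract does not describe PEGG-Net's output heads, so the function name `decode_best_grasp`, the map names, and the toy values are all hypothetical.

```python
import math


def decode_best_grasp(quality, angle, width):
    """Pick the pixel with the highest grasp quality (hypothetical helper).

    quality, angle, width: equally sized 2D lists (rows of floats).
    Returns (row, col, angle_rad, width_px, score).
    """
    best = None
    for r, row in enumerate(quality):
        for c, q in enumerate(row):
            # Keep the first pixel seen with the strictly highest score.
            if best is None or q > best[4]:
                best = (r, c, angle[r][c], width[r][c], q)
    if best is None:
        raise ValueError("empty quality map")
    return best


# Toy 3x3 maps (made-up values, not model output).
quality = [[0.1, 0.2, 0.1],
           [0.3, 0.9, 0.4],
           [0.0, 0.2, 0.1]]
angle = [[0.0, 0.0, 0.0],
         [0.0, math.pi / 4, 0.0],
         [0.0, 0.0, 0.0]]
width = [[10.0, 10.0, 10.0],
         [10.0, 42.0, 10.0],
         [10.0, 10.0, 10.0]]

grasp = decode_best_grasp(quality, angle, width)
print(grasp)
```

In a closed-loop setting such as the one the abstract mentions, a decoding step like this would be repeated on every new frame so that the target grasp follows a moving object.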
Related papers
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.
By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously only used for static scenes, to dynamic scenes.
We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z)
- ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition [8.654140442734354]
Task-oriented grasping of unfamiliar objects is a necessary skill for robots in dynamic in-home environments.
We present a novel zero-shot task-oriented grasping method leveraging a geometric decomposition of the target object into simple convex shapes.
Our approach employs minimal essential information - the object's name and the intended task - to facilitate zero-shot task-oriented grasping.
arXiv Detail & Related papers (2024-03-26T19:26:53Z)
- ICGNet: A Unified Approach for Instance-Centric Grasping [42.92991092305974]
We introduce an end-to-end architecture for object-centric grasping.
We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets.
arXiv Detail & Related papers (2024-01-18T12:41:41Z)
- GraNet: A Multi-Level Graph Network for 6-DoF Grasp Pose Generation in Cluttered Scenes [0.5755004576310334]
GraNet is a graph-based grasp pose generation framework that translates a point cloud scene into multi-level graphs.
Our pipeline can thus characterize the spatial distribution of grasps in cluttered scenes, leading to a higher rate of effective grasping.
Our method achieves state-of-the-art performance on the large-scale GraspNet-1Billion benchmark, especially in grasping unseen objects.
arXiv Detail & Related papers (2023-12-06T08:36:29Z)
- Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter [14.489086924126253]
This work focuses on the task of referring grasp synthesis, which predicts a grasp pose for an object referred through natural language in cluttered scenes.
Existing approaches often employ multi-stage pipelines that first segment the referred object and then propose a suitable grasp, and are evaluated in private datasets or simulators that do not capture the complexity of natural indoor scenes.
We propose a novel end-to-end model (CROG) that leverages the visual grounding capabilities of CLIP to learn grasp synthesis directly from image-text pairs.
arXiv Detail & Related papers (2023-11-09T22:55:10Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 difficulty levels and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects [25.448657318818764]
ICG is a novel probabilistic tracker that fuses region and depth information and only requires the object geometry.
Our method deploys correspondence lines and points to iteratively refine the pose.
Experiments on the YCB-Video, OPT, and Choi datasets demonstrate that, even for textured objects, our approach outperforms the current state of the art.
arXiv Detail & Related papers (2022-03-10T12:30:50Z)
- RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of instance segmentation methods by utilizing a graph-based representation of instance masks.
We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
arXiv Detail & Related papers (2021-06-29T20:29:29Z)
- Analysis of voxel-based 3D object detection methods efficiency for real-time embedded systems [93.73198973454944]
Two popular voxel-based 3D object detection methods are studied in this paper.
Our experiments show that these methods mostly fail to detect distant small objects due to the sparsity of the input point clouds at large distances.
Our findings suggest that a considerable part of the computations of existing methods is focused on locations of the scene that do not contribute to successful detection.
arXiv Detail & Related papers (2021-05-21T12:40:59Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to sustain the modeling of relationships among objects and grasps.
Our dataset is collected in both forms of 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.