MonoGraspNet: 6-DoF Grasping with a Single RGB Image
- URL: http://arxiv.org/abs/2209.13036v1
- Date: Mon, 26 Sep 2022 21:29:50 GMT
- Title: MonoGraspNet: 6-DoF Grasping with a Single RGB Image
- Authors: Guangyao Zhai, Dianye Huang, Shun-Cheng Wu, Hyunjun Jung, Yan Di,
Fabian Manhardt, Federico Tombari, Nassir Navab and Benjamin Busam
- Abstract summary: 6-DoF robotic grasping is a long-standing but unsolved problem.
Recent methods utilize strong 3D networks to extract geometric grasping representations from depth sensors.
We propose the first RGB-only 6-DoF grasping pipeline called MonoGraspNet.
- Score: 73.96707595661867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 6-DoF robotic grasping is a long-standing but unsolved problem. Recent methods
utilize strong 3D networks to extract geometric grasping representations from
depth sensors, demonstrating superior accuracy on common objects but performing
unsatisfactorily on photometrically challenging objects, e.g., objects made of
transparent or reflective materials. The bottleneck is that the surfaces of such
objects do not return accurate depth, owing to the absorption or refraction of
light. In this paper, rather than exploiting the inaccurate depth data, we
propose the first RGB-only 6-DoF grasping pipeline, MonoGraspNet, which utilizes
stable 2D features to simultaneously handle arbitrary object grasping and
overcome the problems induced by photometrically challenging objects.
MonoGraspNet leverages a keypoint heatmap and a normal map to recover 6-DoF
grasping poses in our novel representation, parameterized by 2D keypoints with
corresponding depth, grasping direction, grasping width, and angle. Extensive
experiments in real scenes demonstrate that our method achieves competitive
results in grasping common objects and surpasses the depth-based competitor by a
large margin in grasping photometrically challenging objects. To further
stimulate robotic manipulation research, we additionally annotate and
open-source a multi-view, multi-scene real-world grasping dataset containing 120
objects of mixed photometric complexity with 20M accurate grasping labels.
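The grasp parameterization above is compact enough to sketch in code. Below is a minimal NumPy illustration, not the authors' implementation, of how a 2D keypoint with its depth, a grasping direction (e.g., derived from the predicted normal map), an in-plane angle, and a width could be assembled into a 6-DoF gripper pose; the function names, the frame construction, and the choice of the grasping direction as the gripper's approach (z) axis are assumptions for illustration.

```python
import numpy as np

def backproject(u, v, depth, K):
    """Lift a 2D keypoint (u, v) with metric depth to a 3D point in the camera frame."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

def grasp_to_pose(keypoint_uv, depth, approach_dir, angle, width, K):
    """Assemble a 6-DoF grasp (4x4 pose plus gripper width) from the 2D
    parameterization: keypoint + depth fix the translation; the grasping
    direction and the in-plane angle fix the rotation."""
    t = backproject(*keypoint_uv, depth, K)
    # Build an orthonormal frame whose z-axis is the approach direction.
    z = approach_dir / np.linalg.norm(approach_dir)
    ref = np.array([0.0, 0.0, 1.0]) if abs(z[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
    x = np.cross(ref, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    # Rotate the gripper about its approach axis by the predicted angle.
    c, s = np.cos(angle), np.sin(angle)
    R_in_plane = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T = np.eye(4)
    T[:3, :3] = np.stack([x, y, z], axis=1) @ R_in_plane
    T[:3, 3] = t
    return T, width
```

For example, a keypoint at the image center with depth 0.5 m and an approach direction along the optical axis yields a grasp pose 0.5 m straight in front of the camera.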
Related papers
- Diffusion-Based Depth Inpainting for Transparent and Reflective Objects [6.571006663689738] (2024-10-11)
We propose DITR, a diffusion-based depth inpainting framework designed specifically for transparent and reflective objects.
DITR is highly effective at inpainting the depth of transparent and reflective objects and shows robust adaptability.
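The summary does not specify the sampler, but a common way to adapt a diffusion model to inpainting is RePaint-style masking: denoise the full depth map at each step, then re-impose a forward-noised copy of the valid sensor depth so known regions stay anchored. A minimal sketch under that assumption; the model interface, the deterministic DDIM-style step, and the masking scheme are assumptions, not DITR's published method:

```python
import torch

def inpaint_step(x_t, observed_depth, mask, model, t, alphas_cumprod):
    """One reverse-diffusion step for depth inpainting (RePaint-style sketch).
    mask == 1 where the raw sensor depth is valid; holes are denoised,
    valid pixels are overwritten at the matching noise level."""
    a_t = alphas_cumprod[t]
    eps = model(x_t, t)  # predicted noise
    x0_hat = (x_t - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)
    a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    x_prev = torch.sqrt(a_prev) * x0_hat + torch.sqrt(1 - a_prev) * eps
    # Forward-noise the observation to step t-1 and re-impose it where valid.
    known = torch.sqrt(a_prev) * observed_depth + torch.sqrt(1 - a_prev) * torch.randn_like(observed_depth)
    return mask * known + (1 - mask) * x_prev
```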
- OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection [102.0744303467713] (2024-07-15)
We propose a new multi-view 3D object detector named OPEN.
Our main idea is to effectively inject object-wise depth information into the network through our proposed object-wise position embedding.
OPEN achieves a new state-of-the-art performance with 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.
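As summarized, the mechanism amounts to turning each object's predicted depth into a 3D position and encoding it as an additive embedding. A self-contained sketch of that idea (the MLP shape and where the embedding is injected are assumptions; OPEN's actual design may differ):

```python
import torch
import torch.nn as nn

class ObjectPositionEmbedding(nn.Module):
    """Backproject object centers with predicted object-wise depth to 3D
    and add an MLP encoding of the position to each object's features."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU(),
                                 nn.Linear(feat_dim, feat_dim))

    def forward(self, centers_uv, depths, K, obj_feats):
        fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
        x = (centers_uv[:, 0] - cx) * depths / fx
        y = (centers_uv[:, 1] - cy) * depths / fy
        xyz = torch.stack([x, y, depths], dim=1)  # (N, 3) camera-frame points
        return obj_feats + self.mlp(xyz)
```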
- ASGrasp: Generalizable Transparent Object Reconstruction and Grasping from RGB-D Active Stereo Camera [9.212504138203222] (2024-05-09)
We propose ASGrasp, a 6-DoF grasp detection network that uses an RGB-D active stereo camera.
Our system distinguishes itself by its ability to directly utilize raw IR and RGB images for transparent object geometry reconstruction.
Our experiments demonstrate that ASGrasp achieves a success rate of over 90% for generalizable transparent object grasping.
- MoGDE: Boosting Mobile Monocular 3D Object Detection with Ground Depth Estimation [20.697822444708237] (2023-03-23)
We propose a novel Mono3D framework, called MoGDE, which continually estimates the ground depth corresponding to each image.
MoGDE outperforms state-of-the-art methods by a large margin and ranks first on the KITTI 3D benchmark.
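For intuition, under a flat-ground assumption with a level camera at height h above the road, ground depth follows directly from the pinhole model; a relation of this kind (not necessarily MoGDE's exact formulation) is:

```latex
% Depth of a ground pixel at image row v (below the principal row c_y),
% for focal length f and camera height h above a flat ground plane:
z(v) = \frac{f \, h}{v - c_y}, \qquad v > c_y .
```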
- Grasping the Inconspicuous [15.274311118568715] (2022-11-15)
We study deep-learning-based 6D pose estimation from RGB images alone for transparent object grasping.
Experiments demonstrate the effectiveness of RGB image space for grasping transparent objects.
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853] (2022-07-26)
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
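The geometric core is two-view triangulation with the baseline supplied by ego-motion rather than a stereo rig; schematically (a textbook relation, not DfM's full learned pipeline):

```latex
% For focal length f, ego-motion baseline b between two frames, and
% feature disparity d across those frames, the triangulated depth is:
z = \frac{f \, b}{d} .
```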
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662] (2022-03-09)
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contours at the same time.
- TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and Grasping [46.6058840385155] (2022-02-17)
We contribute a large-scale real-world dataset for transparent object depth completion.
Our dataset contains 57,715 RGB-D images from 130 different scenes.
We propose an end-to-end depth completion network, which takes the RGB image and the inaccurate depth map as inputs and outputs a refined depth map.
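To illustrate the stated interface, RGB plus inaccurate depth in, refined depth out, here is a minimal encoder-decoder sketch; it is not the network proposed in the paper, and the layer choices are placeholders:

```python
import torch
import torch.nn as nn

class DepthCompletionNet(nn.Module):
    """Concatenate RGB (3 ch) with raw, possibly-invalid depth (1 ch)
    and regress a refined depth map at the same resolution."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb, raw_depth):
        x = torch.cat([rgb, raw_depth], dim=1)  # (B, 4, H, W)
        return self.decoder(self.encoder(x))    # (B, 1, H, W) refined depth
```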
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916] (2021-07-29)
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular method by 2.80% on the moderate test setting, without using extra data.
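"Projective modeling" here builds on the standard pinhole relation between an object's physical height, its image height, and its depth; the basic identity (the paper's full formulation adds more than this) is:

```latex
% Pinhole projection: an object of 3D height H at depth z appears with
% image height h under focal length f, so depth follows from the ratio:
h = \frac{f \, H}{z} \quad\Longrightarrow\quad z = \frac{f \, H}{h} .
```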