RGB Matters: Learning 7-DoF Grasp Poses on Monocular RGBD Images
- URL: http://arxiv.org/abs/2103.02184v1
- Date: Wed, 3 Mar 2021 05:12:20 GMT
- Title: RGB Matters: Learning 7-DoF Grasp Poses on Monocular RGBD Images
- Authors: Minghao Gou, Hao-Shu Fang, Zhanda Zhu, Sheng Xu, Chenxi Wang, Cewu Lu
- Abstract summary: General object grasping is an important yet unsolved problem in the field of robotics.
We propose RGBD-Grasp, a pipeline that solves this problem by decoupling 7-DoF grasp detection into two sub-tasks.
We achieve state-of-the-art results on the GraspNet-1Billion dataset.
- Score: 42.68340286459079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: General object grasping is an important yet unsolved problem in the field of
robotics. Most current methods either generate grasp poses with few DoF,
failing to cover most successful grasps, or take only the unstable depth
image or point cloud as input, which may lead to poor results in some cases.
In this paper, we propose RGBD-Grasp, a pipeline that solves this problem by
decoupling 7-DoF grasp detection into two sub-tasks in which RGB and depth
information are processed separately. In the first stage, an encoder-decoder-style
convolutional neural network, Angle-View Net (AVN), is proposed to predict
the SO(3) orientation of the gripper at every location of the image.
Subsequently, a Fast Analytic Searching (FAS) module calculates the opening
width and the distance of the gripper from the grasp point. By decoupling the
grasp detection problem and introducing the stable RGB modality, our pipeline
relaxes the requirement for high-quality depth images and is robust to
depth sensor noise. We achieve state-of-the-art results on the GraspNet-1Billion
dataset compared with several baselines. Real-robot experiments on a UR5 robot
with an Intel RealSense camera and a Robotiq two-finger gripper show high
success rates for both single-object scenes and cluttered scenes. Our code and
trained model will be made publicly available.
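To make the two-stage decoupling concrete, below is a minimal PyTorch sketch. The channel counts, the view/angle discretization of SO(3), and the toy collision test in the FAS stand-in are illustrative assumptions, not the architecture or search procedure from the paper's released code.

```python
# Minimal sketch of the decoupled pipeline described above. The channel
# counts, the 120x6 view/angle discretization, and the collision test are
# assumptions for illustration only.
import torch
import torch.nn as nn

NUM_VIEWS, NUM_ANGLES = 120, 6  # assumed discretization of SO(3)

class AngleViewNet(nn.Module):
    """Stage 1: encoder-decoder CNN mapping an RGB image to per-pixel
    scores over discretized gripper orientations (view x in-plane angle)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, NUM_VIEWS * NUM_ANGLES, 4, stride=2, padding=1),
        )

    def forward(self, rgb):                     # rgb: (B, 3, H, W)
        return self.decoder(self.encoder(rgb))  # (B, V*A, H, W) scores

def fast_analytic_search(points, widths=(0.02, 0.04, 0.06, 0.08)):
    """Stage 2 (toy stand-in for FAS): given scene points expressed in the
    chosen gripper frame (x = closing axis, z = approach axis, meters),
    return the smallest opening width whose finger volumes contain no
    points. The real module also searches the approach distance."""
    for w in widths:
        in_finger = ((points[:, 0].abs() > w / 2) &
                     (points[:, 0].abs() < w / 2 + 0.01) &  # finger thickness
                     (points[:, 2] < 0.04))                 # finger depth
        if not in_finger.any():
            return w
    return None  # no collision-free width among the candidates
```

Per the abstract, the real FAS stage also computes the gripper's distance from the grasp point by searching along the approach direction; the toy above fixes that distance for brevity.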
Related papers
- ASGrasp: Generalizable Transparent Object Reconstruction and Grasping from RGB-D Active Stereo Camera [9.212504138203222]
We propose ASGrasp, a 6-DoF grasp detection network that uses an RGB-D active stereo camera.
Our system is distinguished by its ability to use raw IR and RGB images directly for transparent object geometry reconstruction.
Our experiments demonstrate that ASGrasp can achieve over 90% success rate for generalizable transparent object grasping.
arXiv Detail & Related papers (2024-05-09T09:44:51Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- Pyramid Deep Fusion Network for Two-Hand Reconstruction from RGB-D Images [11.100398985633754]
We propose an end-to-end framework for recovering dense meshes for both hands.
Our framework employs ResNet50 and PointNet++ to derive features from the RGB image and the point cloud.
We also introduce a novel pyramid deep fusion network (PDFNet) to aggregate features at different scales.
arXiv Detail & Related papers (2023-07-12T09:33:21Z)
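The PDFNet entry above names its ingredients: ResNet50 for RGB, PointNet++ for points, and pyramid fusion across scales. The sketch below illustrates that pattern only; the small CNN and shared point MLP are stand-ins for those backbones, and the tile-and-concatenate fusion is an assumed rule, not the paper's design.

```python
# Illustrative pyramid-style fusion of image and point-cloud features.
# Backbones and the fusion rule are simplified assumptions.
import torch
import torch.nn as nn

class PyramidFusion(nn.Module):
    def __init__(self, scales=(64, 128, 256)):
        super().__init__()
        self.rgb_stages = nn.ModuleList()
        in_ch = 3
        for ch in scales:  # stand-in for a ResNet50 feature pyramid
            self.rgb_stages.append(nn.Sequential(
                nn.Conv2d(in_ch, ch, 3, stride=2, padding=1), nn.ReLU()))
            in_ch = ch
        # Shared point MLP (stand-in for PointNet++ set abstraction).
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128))
        self.fuse = nn.ModuleList(
            nn.Conv2d(ch + 128, ch, 1) for ch in scales)

    def forward(self, rgb, points):        # rgb: (B,3,H,W), points: (B,N,3)
        g = self.point_mlp(points).max(dim=1).values  # (B,128) global code
        feats, x = [], rgb
        for stage, fuse in zip(self.rgb_stages, self.fuse):
            x = stage(x)
            B, _, H, W = x.shape
            tiled = g[:, :, None, None].expand(B, 128, H, W)
            x = fuse(torch.cat([x, tiled], dim=1))    # fuse at this scale
            feats.append(x)
        return feats  # multi-scale fused features for a mesh decoder
```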
- ViDaS Video Depth-aware Saliency Network [40.08270905030302]
We introduce ViDaS, a two-stream, fully convolutional Video, Depth-Aware Saliency network.
It addresses the problem of attention modeling "in-the-wild" via saliency prediction in videos.
The network consists of two visual streams, one for the RGB frames and one for the depth frames.
It is trained end-to-end and is evaluated in a variety of different databases with eye-tracking data.
arXiv Detail & Related papers (2023-05-19T15:04:49Z)
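As a rough illustration of the two-stream design described in the ViDaS entry above, the sketch below runs 3D convolutions over an RGB clip and a depth clip and fuses them into per-frame saliency logits; the channel sizes and the late-fusion head are assumptions.

```python
# Two-stream video saliency sketch: one stream per modality, late fusion.
import torch
import torch.nn as nn

class TwoStreamVideoSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv3d(in_ch, 16, 3, padding=1), nn.ReLU(),
                nn.Conv3d(16, 32, 3, padding=1), nn.ReLU())
        self.rgb_stream = stream(3)      # input: (B, 3, T, H, W)
        self.depth_stream = stream(1)    # input: (B, 1, T, H, W)
        self.fuse = nn.Conv3d(64, 1, 1)  # late fusion -> per-frame saliency

    def forward(self, rgb_clip, depth_clip):
        f = torch.cat([self.rgb_stream(rgb_clip),
                       self.depth_stream(depth_clip)], dim=1)
        return self.fuse(f)              # (B, 1, T, H, W) saliency logits
```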
- One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
arXiv Detail & Related papers (2022-10-21T17:33:14Z)
- MonoGraspNet: 6-DoF Grasping with a Single RGB Image [73.96707595661867]
6-DoF robotic grasping is a long-standing but unsolved problem.
Recent methods utilize strong 3D networks to extract geometric grasping representations from depth sensors.
We propose the first RGB-only 6-DoF grasping pipeline called MonoGraspNet.
arXiv Detail & Related papers (2022-09-26T21:29:50Z)
- SPSN: Superpixel Prototype Sampling Network for RGB-D Salient Object Detection [5.2134203335146925]
RGB-D salient object detection (SOD) has been in the spotlight recently because it is an important preprocessing operation for various vision tasks.
Despite advances in deep learning-based methods, RGB-D SOD remains challenging due to the large domain gap between RGB images and depth maps, and due to low-quality depth maps.
We propose a novel superpixel prototype sampling network architecture to solve this problem.
arXiv Detail & Related papers (2022-07-16T10:43:14Z)
- RGB-D Local Implicit Function for Depth Completion of Transparent Objects [43.238923881620494]
The majority of perception methods in robotics require depth information provided by RGB-D cameras.
Standard 3D sensors fail to capture depth of transparent objects due to refraction and absorption of light.
We present a novel framework that can complete missing depth given noisy RGB-D input.
arXiv Detail & Related papers (2021-04-01T17:00:04Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both depth prediction and depth completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
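The SAN entry above describes one monodepth network that handles both prediction (RGB only) and completion (RGB plus sparse depth). A minimal sketch of that interface follows; the masked side branch is an assumed fusion mechanism, not the paper's sparse auxiliary design.

```python
# One network, two modes: dense depth from RGB alone, or from RGB plus an
# optional sparse depth map. The side-branch fusion below is an assumption.
import torch
import torch.nn as nn

class MonoDepthWithOptionalSparse(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        # Side branch over (sparse depth, validity mask); skipped when absent.
        self.sparse_enc = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, rgb, sparse_depth=None):
        x = self.rgb_enc(rgb)
        if sparse_depth is not None:    # completion mode
            mask = (sparse_depth > 0).float()
            x = x + self.sparse_enc(torch.cat([sparse_depth, mask], dim=1))
        return self.head(x)             # dense depth map, (B, 1, H, W)
```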
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt at realizing a unified depth-aware framework that uses only RGB information as input at inference time.
Our method not only surpasses state-of-the-art performance on five public RGB SOD benchmarks, but also outperforms RGB-D-based methods on five benchmarks by a large margin.
arXiv Detail & Related papers (2020-05-30T13:40:03Z)
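The last entry above implies that depth can supervise training while inference remains RGB-only. A minimal sketch under that assumption: an auxiliary depth head shares the trunk during training and is simply ignored at test time.

```python
# Depth as training-time-only supervision for saliency; the shared trunk
# and the loss weighting are assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthAwareSOD(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.saliency_head = nn.Conv2d(32, 1, 1)
        self.depth_head = nn.Conv2d(32, 1, 1)  # used only while training

    def forward(self, rgb):
        feat = self.trunk(rgb)
        return self.saliency_head(feat), self.depth_head(feat)

def training_loss(model, rgb, sal_gt, depth_gt):
    sal, depth = model(rgb)
    return (F.binary_cross_entropy_with_logits(sal, sal_gt)
            + F.l1_loss(depth, depth_gt))  # depth as auxiliary supervision

# At inference, only the saliency output is used; no depth input is needed:
#   saliency = torch.sigmoid(model(rgb)[0])
```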
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.