RGB-D Grasp Detection via Depth Guided Learning with Cross-modal
Attention
- URL: http://arxiv.org/abs/2302.14264v1
- Date: Tue, 28 Feb 2023 02:41:27 GMT
- Authors: Ran Qin, Haoxiang Ma, Boyang Gao, Di Huang
- Abstract summary: This paper proposes a novel learning based approach to RGB-D grasp detection, namely Depth Guided Cross-modal Attention Network (DGCAN).
To better leverage the geometry information recorded in the depth channel, a complete 6-dimensional rectangle representation is adopted with the grasp depth dedicatedly considered.
The prediction of the extra grasp depth substantially strengthens feature learning, thereby leading to more accurate results.
- Score: 14.790193023912973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planar grasp detection is one of the most fundamental tasks to robotic
manipulation, and the recent progress of consumer-grade RGB-D sensors enables
delivering more comprehensive features from both the texture and shape
modalities. However, depth maps are generally of a relatively lower quality
with much stronger noise compared to RGB images, making it challenging to
acquire grasp depth and fuse multi-modal clues. To address the two issues, this
paper proposes a novel learning based approach to RGB-D grasp detection, namely
Depth Guided Cross-modal Attention Network (DGCAN). To better leverage the
geometry information recorded in the depth channel, a complete 6-dimensional
rectangle representation is adopted with the grasp depth dedicatedly considered
in addition to those defined in the common 5-dimensional one. The prediction of
the extra grasp depth substantially strengthens feature learning, thereby
leading to more accurate results. Moreover, to reduce the negative impact
caused by the discrepancy of data quality in two modalities, a Local
Cross-modal Attention (LCA) module is designed, where the depth features are
refined according to cross-modal relations and concatenated to the RGB ones for
more sufficient fusion. Extensive simulation and physical evaluations are
conducted and the experimental results highlight the superiority of the
proposed approach.
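
As a rough illustration of the 6-dimensional rectangle representation mentioned in the abstract, the sketch below extends the common 5-dimensional grasp rectangle (center, width, height, in-plane angle) with a grasp depth. The class name `Grasp6D`, the field names, and the units are assumptions made for illustration; the abstract does not specify the exact parameterization or how DGCAN encodes it.

```python
import math
from dataclasses import dataclass


@dataclass
class Grasp6D:
    """Hypothetical container for a planar grasp rectangle plus grasp depth."""
    x: float      # rectangle center, image column (pixels, assumed)
    y: float      # rectangle center, image row (pixels, assumed)
    w: float      # rectangle width, e.g. gripper opening (pixels, assumed)
    h: float      # rectangle height, e.g. jaw size (pixels, assumed)
    theta: float  # in-plane rotation (radians, assumed)
    z: float      # grasp depth, the extra sixth dimension (units assumed)

    def as_5d(self):
        """Drop the depth to recover the common 5-dimensional rectangle."""
        return (self.x, self.y, self.w, self.h, self.theta)

    def corners(self):
        """Return the four image-plane corners of the rotated rectangle."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        offsets = [
            (-self.w / 2, -self.h / 2),
            (self.w / 2, -self.h / 2),
            (self.w / 2, self.h / 2),
            (-self.w / 2, self.h / 2),
        ]
        # Rotate each offset by theta, then translate to the center.
        return [
            (self.x + dx * c - dy * s, self.y + dx * s + dy * c)
            for dx, dy in offsets
        ]
```

For example, `Grasp6D(10, 20, 4, 2, 0.0, 0.5).corners()` gives an axis-aligned rectangle around the point (10, 20), while `z` carries the depth that a 5-dimensional rectangle would leave out.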
Related papers
- Symmetric Uncertainty-Aware Feature Transmission for Depth
Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR.
Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z)
- Robust RGB-D Fusion for Saliency Detection [13.705088021517568]
We propose a robust RGB-D fusion method that benefits from layer-wise and trident spatial attention mechanisms.
Our experiments on five benchmark datasets demonstrate that the proposed fusion method performs consistently better than the state-of-the-art fusion alternatives.
arXiv Detail & Related papers (2022-08-02T21:23:00Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour
Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- RGB-D Salient Object Detection with Ubiquitous Target Awareness [37.6726410843724]
We make the first attempt to solve the RGB-D salient object detection problem with a novel depth-awareness framework.
We propose a Ubiquitous Target Awareness (UTA) network to solve three important challenges in RGB-D SOD task.
Our proposed UTA network is depth-free for inference and runs in real-time with 43 FPS.
arXiv Detail & Related papers (2021-09-08T04:27:29Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Deep RGB-D Saliency Detection with Depth-Sensitive Attention and
Automatic Multi-Modal Fusion [15.033234579900657]
RGB-D salient object detection (SOD) is usually formulated as a problem of classification or regression over two modalities, i.e., RGB and depth.
We propose a depth-sensitive RGB feature modeling scheme using the depth-wise geometric prior of salient objects.
Experiments on seven standard benchmarks demonstrate the effectiveness of the proposed approach against the state-of-the-art.
arXiv Detail & Related papers (2021-03-22T13:28:45Z)
- Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-D saliency detection shows impressive ability in some challenging scenarios.
We propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way.
arXiv Detail & Related papers (2020-07-23T04:33:36Z)
- DPANet: Depth Potentiality-Aware Gated Attention Network for RGB-D
Salient Object Detection [107.96418568008644]
We propose a novel network named DPANet to explicitly model the potentiality of the depth map and effectively integrate the cross-modal complementarity.
By introducing the depth potentiality perception, the network can perceive the potentiality of depth information in a learning-based manner.
arXiv Detail & Related papers (2020-03-19T07:27:54Z)