Robust RGB-D Fusion for Saliency Detection
- URL: http://arxiv.org/abs/2208.01762v1
- Date: Tue, 2 Aug 2022 21:23:00 GMT
- Title: Robust RGB-D Fusion for Saliency Detection
- Authors: Zongwei Wu, Shriarulmozhivarman Gobichettipalayam, Brahim Tamadazte, Guillaume Allibert, Danda Pani Paudel, Cédric Demonceaux
- Abstract summary: We propose a robust RGB-D fusion method that benefits from layer-wise and trident spatial attention mechanisms.
Our experiments on five benchmark datasets demonstrate that the proposed fusion method performs consistently better than the state-of-the-art fusion alternatives.
- Score: 13.705088021517568
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficiently exploiting multi-modal inputs for accurate RGB-D saliency
detection is a topic of high interest. Most existing works leverage cross-modal
interactions to fuse the two streams of RGB-D for intermediate features'
enhancement. In this process, a practical concern, the low quality of the
available depth maps, has not been fully considered. In this work, we aim for
RGB-D saliency detection that is robust to low-quality depth, which primarily
appears in two forms: inaccuracy due to noise and misalignment with RGB. To
this end, we propose a robust RGB-D fusion method that benefits from (1)
layer-wise and (2) trident spatial attention mechanisms. On the one hand,
layer-wise attention (LWA) learns the trade-off between early and late fusion
of RGB and depth features, depending upon the depth accuracy. On the other
hand, trident spatial attention (TSA) aggregates the features from a wider
spatial context to address the depth misalignment problem. The proposed LWA and
TSA mechanisms allow us to efficiently exploit the multi-modal inputs for
saliency detection while being robust against low-quality depths. Our
experiments on five benchmark datasets demonstrate that the proposed fusion
method performs consistently better than the state-of-the-art fusion
alternatives.
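To make the two mechanisms concrete, here is a minimal PyTorch sketch of one plausible reading of the abstract; the module structure, channel sizes, and dilation rates are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LayerWiseAttention(nn.Module):
    """Sketch of LWA: learn per-stream weights that trade off how much
    depth is fused at a given layer, conditioned on both modalities."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                    # global context
            nn.Conv2d(2 * channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 2, 1),             # one weight per stream
            nn.Softmax(dim=1),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([rgb, depth], dim=1))   # (B, 2, 1, 1)
        return w[:, :1] * rgb + w[:, 1:] * depth        # soft fusion trade-off

class TridentSpatialAttention(nn.Module):
    """Sketch of TSA: three parallel dilated branches aggregate depth
    features from a wider spatial context to tolerate RGB-depth
    misalignment."""
    def __init__(self, channels: int, dilations=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
             for d in dilations]
        )
        self.attn = nn.Conv2d(len(dilations) * channels, channels, 1)

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        multi = torch.cat([b(depth) for b in self.branches], dim=1)
        return depth * torch.sigmoid(self.attn(multi))  # spatial gating
```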
Related papers
- HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection [4.007827908611563]
RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information.
Most RGB-D SOD methods apply the same type of backbones and fusion modules to learn multi-modality and multi-stage features identically.
In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD.
arXiv Detail & Related papers (2023-07-03T11:56:21Z)
- RGB-D Grasp Detection via Depth Guided Learning with Cross-modal Attention [14.790193023912973]
This paper proposes a novel learning-based approach to RGB-D grasp detection, namely the Depth Guided Cross-modal Attention Network (DGCAN).
To better leverage the geometry information recorded in the depth channel, a complete 6-dimensional rectangle representation is adopted, with the grasp depth explicitly considered.
The prediction of the extra grasp depth substantially strengthens feature learning, thereby leading to more accurate results.
arXiv Detail & Related papers (2023-02-28T02:41:27Z)
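As a concrete illustration of the 6-dimensional rectangle mentioned above, the sketch below extends the common 5-D planar grasp (center, size, in-plane rotation) with a grasp depth; the field names and units are assumptions, not DGCAN's actual encoding.

```python
from dataclasses import dataclass

@dataclass
class GraspRect6D:
    """Hypothetical 6-D grasp rectangle: the usual 5-D planar grasp
    plus a grasp depth recovered from the depth channel."""
    x: float       # rectangle center, image x (pixels)
    y: float       # rectangle center, image y (pixels)
    width: float   # gripper opening width (pixels)
    height: float  # finger size along the closing axis (pixels)
    theta: float   # in-plane rotation (radians)
    depth: float   # grasp depth along the camera axis (meters)
```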
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
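The multi-task setup described above can be sketched as one shared feature map feeding three lightweight heads; this is a generic multi-task pattern under assumed shapes, not MMFT's filtered-transformer design.

```python
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Sketch: one shared feature map, three task heads. The auxiliary
    depth and contour tasks push the shared features to be task-aware."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.saliency = nn.Conv2d(channels, 1, 1)
        self.depth = nn.Conv2d(channels, 1, 1)
        self.contour = nn.Conv2d(channels, 1, 1)

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, H, W) shared features from any backbone
        return self.saliency(feats), self.depth(feats), self.contour(feats)
```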
- RGB-D Salient Object Detection with Ubiquitous Target Awareness [37.6726410843724]
We make the first attempt to solve the RGB-D salient object detection problem with a novel depth-awareness framework.
We propose a Ubiquitous Target Awareness (UTA) network to solve three important challenges in RGB-D SOD task.
Our proposed UTA network is depth-free at inference and runs in real time at 43 FPS.
arXiv Detail & Related papers (2021-09-08T04:27:29Z)
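Depth-free inference of this kind is commonly obtained by supervising an auxiliary depth branch only at training time; the sketch below illustrates that pattern under assumed module names, not UTA's actual architecture.

```python
import torch
import torch.nn as nn

class DepthFreeSOD(nn.Module):
    """Sketch: depth supervises an auxiliary head during training, so
    inference needs only the RGB input."""
    def __init__(self, backbone: nn.Module, channels: int = 64):
        super().__init__()
        self.backbone = backbone                     # RGB-only encoder
        self.saliency_head = nn.Conv2d(channels, 1, 1)
        self.depth_head = nn.Conv2d(channels, 1, 1)  # train-time only

    def forward(self, rgb: torch.Tensor):
        feats = self.backbone(rgb)
        saliency = self.saliency_head(feats)
        if self.training:        # auxiliary depth prediction for the loss
            return saliency, self.depth_head(feats)
        return saliency          # inference: RGB in, saliency out
```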
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion [15.033234579900657]
RGB-D salient object detection (SOD) is usually formulated as a problem of classification or regression over two modalities, i.e., RGB and depth.
We propose a depth-sensitive RGB feature modeling scheme using the depth-wise geometric prior of salient objects.
Experiments on seven standard benchmarks demonstrate the effectiveness of the proposed approach against the state-of-the-art.
arXiv Detail & Related papers (2021-03-22T13:28:45Z)
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models use a feature fusion strategy but are limited by low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
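Mutual attention across modalities is typically realized by exchanging queries and keys/values between the two streams; below is a minimal single-head sketch under that assumption (the token shapes and residual fusion are illustrative choices, not the paper's design).

```python
import torch
import torch.nn as nn

class MutualCrossAttention(nn.Module):
    """Sketch: RGB tokens query depth tokens and vice versa, so each
    modality borrows context from the other before fusion."""
    def __init__(self, dim: int):
        super().__init__()
        self.rgb_from_depth = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.depth_from_rgb = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        # rgb, depth: (B, N, dim) token sequences (flattened feature maps)
        rgb_ctx, _ = self.rgb_from_depth(query=rgb, key=depth, value=depth)
        depth_ctx, _ = self.depth_from_rgb(query=depth, key=rgb, value=rgb)
        return rgb + rgb_ctx, depth + depth_ctx    # residual fusion
```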
- Depth Quality Aware Salient Object Detection [52.618404186447165]
This paper integrates a novel depth-quality-aware module into the classic bi-stream structure, aiming to assess depth quality before conducting selective RGB-D fusion.
Compared with SOTA bi-stream methods, the major highlight of our method is its ability to lessen the importance of low-quality, no-contribution, or even negative-contribution depth regions during RGB-D fusion.
arXiv Detail & Related papers (2020-08-07T09:54:39Z)
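Depth-quality-aware fusion of this flavor can be sketched as a learned scalar gate on the depth stream; the gating network below is an illustrative assumption, not the paper's module.

```python
import torch
import torch.nn as nn

class QualityGatedFusion(nn.Module):
    """Sketch: estimate a per-image depth-quality score and scale the
    depth contribution accordingly, down-weighting unreliable depth."""
    def __init__(self, channels: int):
        super().__init__()
        self.quality = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # global pooling
            nn.Conv2d(2 * channels, 1, 1),
            nn.Sigmoid(),                            # score in (0, 1)
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        q = self.quality(torch.cat([rgb, depth], dim=1))  # (B, 1, 1, 1)
        return rgb + q * depth    # low quality -> depth contributes less
```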
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt at realizing a unified depth-aware framework with only RGB information as input for inference.
It not only surpasses state-of-the-art performance on five public RGB SOD benchmarks, but also surpasses RGB-D-based methods on five benchmarks by a large margin.
arXiv Detail & Related papers (2020-05-30T13:40:03Z)