Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection
- URL: http://arxiv.org/abs/2010.05537v1
- Date: Mon, 12 Oct 2020 08:50:10 GMT
- Title: Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection
- Authors: Nian Liu, Ni Zhang, Ling Shao, Junwei Han
- Abstract summary: How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models use a feature fusion strategy but are limited by low-order, point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
- Score: 145.4919781325014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection. Early fusion and result fusion schemes fuse RGB and depth information at the input and output stages, respectively, and hence incur the problems of distribution gap or information loss. Many models instead use a feature fusion strategy, but are limited by low-order, point-to-point fusion methods. In this paper, we propose a novel mutual attention model that fuses attention and contexts from different modalities. We use the non-local attention of one modality to propagate long-range contextual dependencies for the other modality, thus leveraging complementary attention cues to perform high-order, trilinear cross-modal interaction. We also propose to induce contrast inference from the mutual attention, obtaining a unified model. Considering that low-quality depth data may degrade model performance, we further propose selective attention to reweight the added depth cues. We embed the proposed modules in a two-stream CNN for RGB-D SOD. Experimental results demonstrate the effectiveness of the proposed model. Moreover, we construct a new challenging large-scale RGB-D SOD dataset of high quality, which can promote both the training and evaluation of deep models.
Related papers
- Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection [95.84616822805664]
We introduce a CNN-assisted Transformer architecture and propose a novel RGB-D SOD network with Point-aware Interaction and CNN-induced Refinement.
To alleviate the block effect and detail destruction problems naturally brought by the Transformer, we design a CNN-induced refinement (CNNR) unit for content refinement and supplementation.
arXiv Detail & Related papers (2023-08-17T11:57:49Z)
- Attentive Multimodal Fusion for Optical and Scene Flow [24.08052492109655]
Existing methods typically rely solely on RGB images or fuse the modalities at later stages.
We propose a novel deep neural network approach named FusionRAFT, which enables early-stage information fusion between sensor modalities.
Our approach exhibits improved robustness in the presence of noise and low-lighting conditions that affect the RGB images.
arXiv Detail & Related papers (2023-07-28T04:36:07Z)
- HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness [2.341385717236931]
We propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection.
Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies.
Our HiDAnet outperforms the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2023-01-18T10:00:59Z)
- Interactive Context-Aware Network for RGB-T Salient Object Detection [7.544240329265388]
We propose a novel network called the Interactive Context-Aware Network (ICANet).
ICANet contains three modules that effectively perform cross-modal and cross-scale fusion.
Experiments show that our network performs favorably against the state-of-the-art RGB-T SOD methods.
arXiv Detail & Related papers (2022-11-11T10:04:36Z)
- Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer [53.413305467674434]
We introduce open-source RGB data to support spike depth estimation, leveraging its annotations and spatial information.
We propose a cross-modality cross-domain (BiCross) framework to realize unsupervised spike depth estimation.
Our method achieves state-of-the-art (SOTA) performance compared with RGB-oriented unsupervised depth estimation methods.
arXiv Detail & Related papers (2022-08-26T09:35:20Z)
- Robust RGB-D Fusion for Saliency Detection [13.705088021517568]
We propose a robust RGB-D fusion method that benefits from layer-wise and trident spatial attention mechanisms.
Our experiments on five benchmark datasets demonstrate that the proposed fusion method performs consistently better than the state-of-the-art fusion alternatives.
arXiv Detail & Related papers (2022-08-02T21:23:00Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both the RGB and depth modalities to model long-range dependencies in the visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.
arXiv Detail & Related papers (2021-09-15T12:31:27Z)
- DPANet: Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection [107.96418568008644]
We propose a novel network named DPANet to explicitly model the potentiality of the depth map and effectively integrate the cross-modal complementarity.
By introducing the depth potentiality perception, the network can perceive the potentiality of depth information in a learning-based manner.
arXiv Detail & Related papers (2020-03-19T07:27:54Z)