Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection
- URL: http://arxiv.org/abs/2108.01971v1
- Date: Wed, 4 Aug 2021 11:24:42 GMT
- Title: Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection
- Authors: Chen Zhang, Runmin Cong, Qinwei Lin, Lin Ma, Feng Li, Yao Zhao, Sam
Kwong
- Abstract summary: We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
- Score: 78.47767202232298
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The popularity and promotion of depth maps have brought new vigor and
vitality into salient object detection (SOD), and a large number of RGB-D SOD
algorithms have been proposed, mainly concentrating on how to better integrate
cross-modality features from the RGB image and the depth map. For the
cross-modality interaction in the feature encoder, existing methods either treat
the RGB and depth modalities indiscriminately, or habitually utilize depth cues
only as auxiliary information for the RGB branch. Different from them, we
reconsider the status of the two modalities and propose a novel Cross-modality
Discrepant Interaction Network (CDINet) for RGB-D SOD, which differentially
models the dependence of the two modalities according to the feature
representations of different layers. To this end, two components are designed to
implement effective cross-modality interaction: 1) the RGB-induced Detail
Enhancement (RDE) module leverages the RGB modality to enhance the details of
the depth features in the low-level encoder stages, and 2) the Depth-induced
Semantic Enhancement (DSE) module transfers the object positioning and internal
consistency of the depth features to the RGB branch in the high-level encoder
stages. Furthermore, we also design a Dense Decoding Reconstruction (DDR)
structure, which constructs a semantic block by combining multi-level encoder
features to upgrade the skip connections in the feature decoding. Extensive
experiments on five benchmark datasets demonstrate that our network outperforms
15 state-of-the-art methods both quantitatively and qualitatively. Our code is
publicly available at:
https://rmcong.github.io/proj_CDINet.html.
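As a concrete illustration of the discrepant interaction described above, the following is a minimal PyTorch sketch of RDE- and DSE-style blocks. It reflects only the directional roles stated in the abstract (RGB sharpens low-level depth details; depth recalibrates high-level RGB semantics); the attention designs, channel sizes, and class names are illustrative assumptions, not the released CDINet implementation, which is available at the project page above.

import torch
import torch.nn as nn

class RGBInducedDetailEnhancement(nn.Module):
    # Hypothetical RDE-style block: RGB features gate and sharpen low-level
    # depth features (module name follows the abstract; internals are assumed).
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb_feat, depth_feat):
        attn = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        # Residual enhancement: depth details sharpened by RGB-derived attention.
        return depth_feat + self.refine(attn * rgb_feat)

class DepthInducedSemanticEnhancement(nn.Module):
    # Hypothetical DSE-style block: depth features provide channel and spatial
    # cues that recalibrate high-level RGB features.
    def __init__(self, channels):
        super().__init__()
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_attn = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        rgb_feat = rgb_feat * self.channel_attn(depth_feat)
        # Depth-derived spatial map transfers object positioning cues (residual form).
        return rgb_feat + rgb_feat * self.spatial_attn(depth_feat)

if __name__ == "__main__":
    rgb = torch.randn(1, 64, 88, 88)  # stand-in for one stage of a two-stream encoder
    dep = torch.randn(1, 64, 88, 88)
    print(RGBInducedDetailEnhancement(64)(rgb, dep).shape)    # torch.Size([1, 64, 88, 88])
    print(DepthInducedSemanticEnhancement(64)(rgb, dep).shape)

Both blocks are written in residual form so the enhanced branch keeps its original signal, which would make such a sketch drop-in compatible with a standard two-stream VGG/ResNet encoder; this is a design assumption, not a detail taken from the paper.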
Related papers
- The Devil is in the Details: Boosting Guided Depth Super-Resolution via
Rethinking Cross-Modal Alignment and Aggregation [41.12790340577986]
Guided depth super-resolution (GDSR) involves restoring missing depth details using the high-resolution RGB image of the same scene.
Previous approaches have struggled with the heterogeneity and complementarity of the multi-modal inputs, and neglected the issues of modal misalignment, geometrical misalignment, and feature selection.
arXiv Detail & Related papers (2024-01-16T05:37:08Z)
- HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection [4.007827908611563]
RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information.
Most RGB-D SOD methods apply the same types of backbones and fusion modules to learn the multi-modality and multi-stage features identically.
In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD.
arXiv Detail & Related papers (2023-07-03T11:56:21Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt the Swin-Transformer as the feature extractor for both the RGB and depth modalities to model long-range dependencies in the visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly designed lightweight triple-stream network is then applied to these reformulated data to achieve optimal channel-wise complementary fusion between the RGB and D modalities.
arXiv Detail & Related papers (2020-08-07T10:13:05Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Cross-Modal Weighting Network for RGB-D Salient Object Detection [76.0965123893641]
We propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD.
Specifically, three RGB-depth interaction modules, named CMW-L, CMW-M and CMW-H, are developed to handle low-, middle- and high-level cross-modal information fusion, respectively.
CMWNet consistently outperforms 15 state-of-the-art RGB-D SOD methods on seven popular benchmarks.
arXiv Detail & Related papers (2020-07-09T16:01:44Z)