Siamese Network for RGB-D Salient Object Detection and Beyond
- URL: http://arxiv.org/abs/2008.12134v2
- Date: Fri, 16 Apr 2021 05:52:03 GMT
- Title: Siamese Network for RGB-D Salient Object Detection and Beyond
- Authors: Keren Fu, Deng-Ping Fan, Ge-Peng Ji, Qijun Zhao, Jianbing Shen, Ce Zhu
- Abstract summary: A novel framework is proposed to learn from both RGB and depth inputs through a shared network backbone.
Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector.
We also link JL-DCF to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models.
- Score: 113.30063105890041
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing RGB-D salient object detection (SOD) models usually treat RGB and
depth as independent information and design separate networks for feature
extraction from each. Such schemes can easily be constrained by a limited
amount of training data or over-reliance on an elaborately designed training
process. Inspired by the observation that RGB and depth modalities actually
present certain commonality in distinguishing salient objects, a novel joint
learning and densely cooperative fusion (JL-DCF) architecture is designed to
learn from both RGB and depth inputs through a shared network backbone, known
as the Siamese architecture. In this paper, we propose two effective
components: joint learning (JL), and densely cooperative fusion (DCF). The JL
module provides robust saliency feature learning by exploiting cross-modal
commonality via a Siamese network, while the DCF module is introduced for
complementary feature discovery. Comprehensive experiments using five popular
metrics show that the designed framework yields a robust RGB-D saliency
detector with good generalization. As a result, JL-DCF significantly advances
the state-of-the-art models by an average of ~2.0% (max F-measure) across seven
challenging datasets. In addition, we show that JL-DCF is readily applicable to
other related multi-modal detection tasks, including RGB-T (thermal infrared)
SOD and video SOD, achieving comparable or even better performance against
state-of-the-art methods. We also link JL-DCF to the RGB-D semantic
segmentation field, showing its capability of outperforming several semantic
segmentation models on the task of RGB-D SOD. These facts further confirm that
the proposed framework could offer a potential solution for various
applications and provide more insight into the cross-modal complementarity
task.
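To make the shared-backbone idea described above concrete, here is a minimal PyTorch sketch of a Siamese RGB-D saliency model. It is not the authors' implementation: the ResNet-50 backbone, the depth-channel replication, the feature dimensions, and the simple concatenation-based fusion (a stand-in for the DCF module) are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class SiameseRGBDSaliency(nn.Module):
    """Illustrative sketch: one shared backbone for both modalities."""
    def __init__(self):
        super().__init__()
        # Shared backbone (the "Siamese" part): the same weights extract
        # features from both RGB and depth. ResNet-50 is an assumption here.
        resnet = models.resnet50()
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        # Toy stand-in for densely cooperative fusion: concatenate the two
        # feature maps and predict a single-channel saliency map.
        self.fuse = nn.Conv2d(2048 * 2, 256, kernel_size=3, padding=1)
        self.head = nn.Conv2d(256, 1, kernel_size=1)

    def forward(self, rgb, depth):
        # Replicate depth to 3 channels so the shared backbone accepts it.
        depth3 = depth.repeat(1, 3, 1, 1)
        f_rgb = self.backbone(rgb)     # joint learning: identical weights...
        f_dep = self.backbone(depth3)  # ...applied to both inputs
        fused = F.relu(self.fuse(torch.cat([f_rgb, f_dep], dim=1)))
        logits = self.head(fused)
        # Upsample back to input resolution and squash to [0, 1].
        return torch.sigmoid(F.interpolate(
            logits, size=rgb.shape[-2:], mode="bilinear", align_corners=False))

model = SiameseRGBDSaliency()
rgb = torch.randn(2, 3, 320, 320)     # batch of RGB images
depth = torch.randn(2, 1, 320, 320)   # corresponding depth maps
print(model(rgb, depth).shape)        # torch.Size([2, 1, 320, 320])
```

The weight sharing is what distinguishes this design from the two-stream models criticized in the abstract: both modalities train the same parameters, so the limited RGB-D data is used twice per example rather than split across two separate feature extractors.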
Related papers
- HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection [4.007827908611563]
RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information.
Most RGB-D SOD methods apply the same types of backbones and fusion modules to learn multi-modality and multi-stage features in an identical manner.
In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD.
arXiv Detail & Related papers (2023-07-03T11:56:21Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use self-supervised representation learning to design two pretext tasks: cross-modal auto-encoding and depth-contour estimation.
Our pretext tasks require only a small number of unlabeled RGB-D datasets for pre-training, which enables the network to capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- A Unified Structure for Efficient RGB and RGB-D Salient Object Detection [15.715143016999695]
We propose a unified structure with a cross-attention context extraction (CRACE) module to address both RGB and RGB-D SOD efficiently.
The proposed CRACE module receives and appropriately fuses two (for RGB SOD) or three (for RGB-D SOD) inputs.
The simple unified feature pyramid network (FPN)-like structure with CRACE modules conveys and refines the results under multi-level supervision of saliency and boundaries.
arXiv Detail & Related papers (2020-12-01T12:12:03Z)
- Multi-level Cross-modal Interaction Network for RGB-D Salient Object Detection [3.581367375462018]
We propose a novel Multi-level Cross-modal Interaction Network (MCINet) for RGB-D based salient object detection (SOD).
Our MCINet includes two key components: 1) a cross-modal feature learning network, which learns high-level features from the RGB images and depth cues, effectively enabling the correlations between the two sources to be exploited; and 2) a multi-level interactive integration network, which integrates multi-level cross-modal features to boost the SOD performance.
arXiv Detail & Related papers (2020-07-10T02:21:02Z)
- Cross-Modal Weighting Network for RGB-D Salient Object Detection [76.0965123893641]
We propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD.
Specifically, three RGB-depth interaction modules, named CMW-L, CMW-M and CMW-H, are developed to handle low-, middle- and high-level cross-modal information fusion, respectively.
CMWNet consistently outperforms 15 state-of-the-art RGB-D SOD methods on seven popular benchmarks.
arXiv Detail & Related papers (2020-07-09T16:01:44Z)
- JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection [39.125777418630136]
This paper proposes a novel joint learning and densely-cooperative fusion (JL-DCF) architecture for RGB-D salient object detection.
Our JL-DCF learns from both RGB and depth inputs through a Siamese network.
Experiments show that the designed framework yields a robust RGB-D saliency detector with good generalization.
arXiv Detail & Related papers (2020-04-18T03:22:40Z)