A Unified Structure for Efficient RGB and RGB-D Salient Object Detection
- URL: http://arxiv.org/abs/2012.00437v1
- Date: Tue, 1 Dec 2020 12:12:03 GMT
- Title: A Unified Structure for Efficient RGB and RGB-D Salient Object Detection
- Authors: Peng Peng, Yong-Jie Li
- Abstract summary: We propose a unified structure with a cross-attention context extraction (CRACE) module to address both tasks of SOD efficiently.
The proposed CRACE module receives and appropriately fuses two (for RGB SOD) or three (for RGB-D SOD) inputs.
The simple unified feature pyramid network (FPN)-like structure with CRACE modules conveys and refines the results under multi-level supervision of saliency and boundaries.
- Score: 15.715143016999695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Salient object detection (SOD) has been well studied in recent years,
especially using deep neural networks. However, SOD with RGB and RGB-D images
is usually treated as two different tasks with different network structures
that need to be designed specifically. In this paper, we propose a unified and
efficient structure with a cross-attention context extraction (CRACE) module to
address both tasks of SOD efficiently. The proposed CRACE module receives and
appropriately fuses two (for RGB SOD) or three (for RGB-D SOD) inputs. The
simple unified feature pyramid network (FPN)-like structure with CRACE modules
conveys and refines the results under multi-level supervision of saliency
and boundaries. The proposed structure is simple yet effective; the rich
context information of RGB and depth can be appropriately extracted and fused
by the proposed structure efficiently. Experimental results show that our
method outperforms other state-of-the-art methods in both RGB and RGB-D SOD
tasks on various datasets and in terms of most metrics.
Related papers
- HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection [4.007827908611563]
RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information.
Most RGB-D SOD methods apply the same type of backbones and fusion modules to identically learn the multimodality and multistage features.
In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD.
arXiv Detail & Related papers (2023-07-03T11:56:21Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Learning Scene Structure Guidance via Cross-Task Knowledge Transfer for Single Depth Super-Resolution [35.21324004883027]
Existing color-guided depth super-resolution (DSR) approaches require paired RGB-D data as training samples where the RGB image is used as structural guidance to recover the degraded depth map due to their geometrical similarity.
For the first time, we explore learning cross-modality knowledge at the training stage, where both RGB and depth modalities are available, while testing on the target dataset, where only the depth modality exists.
Specifically, we construct an auxiliary depth estimation (DE) task that takes an RGB image as input to estimate a depth map, and train both DSR task and DE task collaboratively to boost the performance of
arXiv Detail & Related papers (2021-03-24T03:08:25Z)
- Siamese Network for RGB-D Salient Object Detection and Beyond [113.30063105890041]
A novel framework is proposed to learn from both RGB and depth inputs through a shared network backbone.
Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector.
We also link JL-DCF to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models.
arXiv Detail & Related papers (2020-08-26T06:01:05Z)
- Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly designed lightweight triple-stream network is applied to the recombined data to achieve an optimal channel-wise complementary fusion between RGB and D.
arXiv Detail & Related papers (2020-08-07T10:13:05Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as a cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Cross-Modal Weighting Network for RGB-D Salient Object Detection [76.0965123893641]
We propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD.
Specifically, three RGB-depth interaction modules, named CMW-L, CMW-M and CMW-H, are developed to handle low-, middle- and high-level cross-modal information fusion, respectively.
CMWNet consistently outperforms 15 state-of-the-art RGB-D SOD methods on seven popular benchmarks.
arXiv Detail & Related papers (2020-07-09T16:01:44Z)
- Synergistic saliency and depth prediction for RGB-D saliency detection [76.27406945671379]
Existing RGB-D saliency datasets are small, which may lead to overfitting and limited generalization for diverse scenarios.
We propose a semi-supervised system for RGB-D saliency detection that can be trained on smaller RGB-D saliency datasets without saliency ground truth.
arXiv Detail & Related papers (2020-07-03T14:24:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.