RGB-D Salient Object Detection via 3D Convolutional Neural Networks
- URL: http://arxiv.org/abs/2101.10241v1
- Date: Mon, 25 Jan 2021 17:03:02 GMT
- Title: RGB-D Salient Object Detection via 3D Convolutional Neural Networks
- Authors: Qian Chen, Ze Liu, Yi Zhang, Keren Fu, Qijun Zhao, Hongwei Du
- Abstract summary: We make the first attempt to address RGB-D SOD through 3D convolutional neural networks.
The proposed model, named RD3D, aims at pre-fusion in the encoder stage and in-depth fusion in the decoder stage.
We show that RD3D performs favorably against 14 state-of-the-art RGB-D SOD approaches in terms of four key evaluation metrics.
- Score: 19.20231385522917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGB-D salient object detection (SOD) recently has attracted increasing
research interest and many deep learning methods based on encoder-decoder
architectures have emerged. However, most existing RGB-D SOD models conduct
feature fusion in either the encoder or the decoder stage alone, which hardly
guarantees sufficient cross-modal fusion. In this paper, we make the first
attempt to address RGB-D SOD through 3D convolutional neural networks.
The proposed model, named RD3D, aims at pre-fusion in the encoder stage and
in-depth fusion in the decoder stage to effectively promote the full
integration of RGB and depth streams. Specifically, RD3D first conducts
pre-fusion across RGB and depth modalities through an inflated 3D encoder, and
later provides in-depth feature fusion by designing a 3D decoder equipped with
rich back-projection paths (RBPP) for leveraging the extensive aggregation
ability of 3D convolutions. With such a progressive fusion strategy involving
both the encoder and decoder, effective and thorough interaction between the
two modalities can be exploited to boost the detection accuracy. Extensive
experiments on six widely used benchmark datasets demonstrate that RD3D
performs favorably against 14 state-of-the-art RGB-D SOD approaches in terms of
four key evaluation metrics. Our code will be made publicly available:
https://github.com/PPOLYpubki/RD3D.
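To make the encoder-stage pre-fusion idea concrete, below is a minimal PyTorch sketch of the two ingredients the abstract names: inflating a 2D convolution into a 3D one, and stacking the RGB and depth inputs along a new modality axis so that 3D convolutions mix the two streams. The I3D-style inflation (repeat the 2D kernel along the new axis and rescale), the `inflate_conv2d` helper, and all tensor shapes are illustrative assumptions, not the authors' code; RD3D's exact inflation scheme and the RBPP decoder are in the repository linked above.

```python
import torch
import torch.nn as nn

def inflate_conv2d(conv2d: nn.Conv2d, extra_dim: int = 3) -> nn.Conv3d:
    """Inflate a (possibly pretrained) 2D conv into a 3D conv, I3D-style:
    the 2D kernel is repeated along the new axis and rescaled so the layer
    initially responds like the 2D original on replicated input."""
    conv3d = nn.Conv3d(
        conv2d.in_channels, conv2d.out_channels,
        kernel_size=(extra_dim, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(extra_dim // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        w = conv2d.weight.unsqueeze(2).repeat(1, 1, extra_dim, 1, 1) / extra_dim
        conv3d.weight.copy_(w)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

# Pre-fusion: stack RGB and (3-channel replicated) depth along a new
# "modality" axis, then let 3D convolutions mix the two streams.
rgb   = torch.randn(1, 3, 224, 224)              # B x C x H x W
depth = torch.randn(1, 1, 224, 224).repeat(1, 3, 1, 1)
x = torch.stack([rgb, depth], dim=2)             # B x C x M x H x W, M = 2 modalities

conv3d = inflate_conv2d(nn.Conv2d(3, 64, 3, padding=1))
fused = conv3d(x)                                # B x 64 x 2 x 224 x 224
print(fused.shape)
```

Because the kernel is repeated and rescaled along the new axis, the inflated layer initially behaves like the pretrained 2D layer on each modality slice while remaining free to learn cross-modal mixing during training.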
Related papers
- Salient Object Detection in RGB-D Videos [11.805682025734551]
This paper makes two primary contributions: the dataset and the model.
We construct the RDVS dataset, a new RGB-D VSOD dataset with realistic depth.
We introduce DCTNet+, a three-stream network tailored for RGB-D VSOD.
arXiv Detail & Related papers (2023-10-24T03:18:07Z)
- DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation [76.81628995237058]
DFormer is a novel framework to learn transferable representations for RGB-D segmentation tasks.
It pretrains the backbone using image-depth pairs from ImageNet-1K.
DFormer achieves new state-of-the-art performance on two popular RGB-D tasks.
arXiv Detail & Related papers (2023-09-18T11:09:11Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Pyramidal Attention for Saliency Detection [30.554118525502115]
This paper exploits only RGB images, estimates depth from RGB, and leverages the intermediate depth features.
We employ a pyramidal attention structure to extract multi-level convolutional-transformer features for processing initial-stage representations.
We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets, respectively.
arXiv Detail & Related papers (2022-04-14T06:57:46Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly designed lightweight triple-stream network is applied to these newly formulated data to achieve optimal channel-wise complementary fusion between RGB and D.
arXiv Detail & Related papers (2020-08-07T10:13:05Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt to realize a unified depth-aware framework with only RGB information as input for inference.
It not only surpasses state-of-the-art performance on five public RGB SOD benchmarks, but also surpasses RGB-D-based methods on five benchmarks by a large margin.
arXiv Detail & Related papers (2020-05-30T13:40:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.