BTS-Net: Bi-directional Transfer-and-Selection Network For RGB-D Salient
Object Detection
- URL: http://arxiv.org/abs/2104.01784v1
- Date: Mon, 5 Apr 2021 05:58:43 GMT
- Title: BTS-Net: Bi-directional Transfer-and-Selection Network For RGB-D Salient
Object Detection
- Authors: Wenbo Zhang, Yao Jiang, Keren Fu, Qijun Zhao
- Abstract summary: Depth maps used in RGB-D salient object detection (SOD) often suffer from low quality and inaccuracy.
Most existing RGB-D SOD models have no cross-modal interactions or only have unidirectional interactions from depth to RGB in their encoder stages.
We propose a novel bi-directional transfer-and-selection network named BTS-Net, which adopts a set of bi-directional transfer-and-selection modules to purify features during encoding.
- Score: 16.87553302005972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth information has proven beneficial in RGB-D salient object
detection (SOD). However, the depth maps acquired in practice often suffer from low quality and
inaccuracy. Most existing RGB-D SOD models have no cross-modal interactions or
only have unidirectional interactions from depth to RGB in their encoder
stages, which may lead to inaccurate encoder features when facing low-quality
depth. To address this limitation, we propose to conduct progressive
bi-directional interactions as early as the encoder stage, yielding a novel
bi-directional transfer-and-selection network named BTS-Net, which adopts a set
of bi-directional transfer-and-selection (BTS) modules to purify features
during encoding. Based on the resulting robust encoder features, we also design
an effective light-weight group decoder to achieve accurate final saliency
prediction. Comprehensive experiments on six widely used datasets demonstrate
that BTS-Net surpasses 16 recent state-of-the-art approaches in terms of four
key metrics.
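As a concrete illustration of the idea, here is a minimal sketch of one bi-directional transfer-and-selection stage, assuming a simple sigmoid-gated exchange between the RGB and depth streams. The module name, gating form, and channel layout are illustrative assumptions, not BTS-Net's released implementation.

```python
# Hypothetical sketch of one bi-directional transfer-and-selection (BTS)
# stage: each modality receives a gated copy of the other modality's
# features (transfer), with the gate acting as the selection step.
import torch
import torch.nn as nn

class BTSModule(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs produce a single-channel selection map per direction
        self.rgb_to_depth_gate = nn.Conv2d(channels, 1, kernel_size=1)
        self.depth_to_rgb_gate = nn.Conv2d(channels, 1, kernel_size=1)
        self.fuse_rgb = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.fuse_depth = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, rgb_feat, depth_feat):
        # Selection maps: where each stream trusts the other modality
        gate_for_rgb = torch.sigmoid(self.depth_to_rgb_gate(depth_feat))
        gate_for_depth = torch.sigmoid(self.rgb_to_depth_gate(rgb_feat))
        # Transfer: concatenate each stream with the gated other stream
        rgb_out = self.fuse_rgb(
            torch.cat([rgb_feat, depth_feat * gate_for_rgb], dim=1))
        depth_out = self.fuse_depth(
            torch.cat([depth_feat, rgb_feat * gate_for_depth], dim=1))
        return rgb_out, depth_out  # purified features for the next stage

rgb, depth = torch.randn(2, 64, 44, 44), torch.randn(2, 64, 44, 44)
rgb_refined, depth_refined = BTSModule(64)(rgb, depth)  # both (2, 64, 44, 44)
```

Chaining one such module per encoder stage would give the progressive bi-directional interaction the abstract refers to, with both streams kept separate for the decoder.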
Related papers
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on a novel cross-modality interaction and refinement scheme.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
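As a rough sketch of the dual-backbone setup described above, the snippet below instantiates two Swin-Transformer encoders, one per modality, using torchvision's swin_t as an assumed stand-in; DTMINet's actual backbone variant and interaction modules are not specified in this summary.

```python
# Sketch only: independent Swin backbones for RGB and depth.
import torch
from torchvision.models import swin_t

rgb_backbone = swin_t(weights=None)    # one encoder per modality
depth_backbone = swin_t(weights=None)

rgb = torch.randn(1, 3, 224, 224)
depth = torch.randn(1, 1, 224, 224).repeat(1, 3, 1, 1)  # tile depth to 3 channels

# torchvision's SwinTransformer exposes its stage stack as `.features`;
# the outputs are channels-last (B, H/32, W/32, C) feature maps that a
# mutual-interaction module could then fuse.
rgb_feats = rgb_backbone.features(rgb)
depth_feats = depth_backbone.features(depth)
```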
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- RGB-D Salient Object Detection with Ubiquitous Target Awareness [37.6726410843724]
We make the first attempt to solve the RGB-D salient object detection problem with a novel depth-awareness framework.
We propose a Ubiquitous Target Awareness (UTA) network to address three important challenges in the RGB-D SOD task.
Our proposed UTA network is depth-free at inference and runs in real time at 43 FPS.
arXiv Detail & Related papers (2021-09-08T04:27:29Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly designed lightweight triple-stream network is applied to these newly formulated data to achieve optimal channel-wise complementary fusion between RGB and D.
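The summary leaves the recombination itself unspecified; below is one plausible, purely illustrative way to fuse depth into the input data before any feature extraction, yielding one 3-channel input per stream of a triple-stream network. The specific channel mixes are hypothetical, not the paper's scheme.

```python
import torch

def recombine(rgb: torch.Tensor, depth: torch.Tensor):
    """Hypothetical data-level recombination.
    rgb: (B, 3, H, W), depth: (B, 1, H, W) -> three 3-channel inputs."""
    ddd = depth.repeat(1, 3, 1, 1)               # depth-only stream
    rgd = torch.cat([rgb[:, :2], depth], dim=1)  # blue channel swapped for depth
    return rgb, ddd, rgd                         # one input per stream
```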
arXiv Detail & Related papers (2020-08-07T10:13:05Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages, aggregating the two recalibrated representations alternately.
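As a loose sketch of the recalibrate-then-aggregate idea in this summary (not the paper's actual SA-Gate design), each modality can reweight the other's features with a learned gate before a soft aggregation fuses the two streams:

```python
import torch
import torch.nn as nn

class RecalibrateAndAggregate(nn.Module):
    """Illustrative stand-in for a cross-modality gate; all details
    (gate shape, aggregation weights) are assumptions."""
    def __init__(self, channels: int):
        super().__init__()
        self.rgb_gate = nn.Conv2d(channels, 1, 1)    # driven by depth
        self.depth_gate = nn.Conv2d(channels, 1, 1)  # driven by RGB
        self.mix = nn.Conv2d(2 * channels, 2, 1)     # per-pixel aggregation weights

    def forward(self, rgb, depth):
        rgb_rc = rgb * torch.sigmoid(self.rgb_gate(depth))       # recalibrate RGB
        depth_rc = depth * torch.sigmoid(self.depth_gate(rgb))   # recalibrate depth
        w = torch.softmax(self.mix(torch.cat([rgb_rc, depth_rc], dim=1)), dim=1)
        return w[:, :1] * rgb_rc + w[:, 1:] * depth_rc           # fused representation
```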
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt at realizing a unified depth-aware framework that uses only RGB information as input at inference.
Our method not only surpasses state-of-the-art performance on five public RGB SOD benchmarks, but also surpasses RGB-D-based methods on five benchmarks by a large margin.
arXiv Detail & Related papers (2020-05-30T13:40:03Z)