SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection
with Depth Image Classification
- URL: http://arxiv.org/abs/2207.04224v1
- Date: Sat, 9 Jul 2022 08:22:12 GMT
- Authors: Xingzhao Jia and Dongye Changlei and Yanjun Peng
- Abstract summary: A novel RGB-D salient object detection model (SiaTrans) is proposed in this paper.
SiaTrans allows training on depth image quality classification at the same time as training on RGB-D saliency maps.
Experiments on nine RGB-D SOD benchmark datasets show that SiaTrans has the best overall performance and the least computation compared with recent state-of-the-art methods.
- Score: 2.578242050187029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGB-D SOD uses depth information to handle challenging scenes and obtain
high-quality saliency maps. Existing state-of-the-art RGB-D saliency detection
methods overwhelmingly rely on the strategy of directly fusing depth
information. Although these methods improve the accuracy of saliency prediction
through various cross-modality fusion strategies, misinformation provided by
some poor-quality depth images can affect the saliency prediction result. To
address this issue, a novel RGB-D salient object detection model (SiaTrans) is
proposed in this paper, which allows training on depth image quality
classification at the same time as training on SOD. In light of the common
information between RGB and depth images on salient objects, SiaTrans uses a
Siamese transformer network with shared weight parameters as the encoder and
extracts RGB and depth features concatenated on the batch dimension, saving
space resources without compromising performance. SiaTrans uses the Class token
in the backbone network (T2T-ViT) to classify the quality of depth images
while still allowing the token sequence to proceed with the saliency detection
task. A Transformer-based cross-modality fusion module (CMF) effectively fuses
RGB and depth information. During testing, CMF chooses either to fuse
cross-modality information or to enhance RGB information alone, according to the
quality classification signal of the depth image. The greatest benefit of our designed
CMF and decoder is that they maintain the consistency of RGB and RGB-D
information decoding: SiaTrans decodes RGB-D or RGB information under the same
model parameters according to the classification signal during testing.
Comprehensive experiments on nine RGB-D SOD benchmark datasets show that
SiaTrans has the best overall performance and the least computation compared
with recent state-of-the-art methods.
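The two mechanisms the abstract describes can be illustrated with a minimal sketch (not the authors' code): a toy linear "encoder" stands in for the T2T-ViT backbone to show (1) how a shared-weight Siamese encoder processes RGB and depth in one pass by concatenating them along the batch dimension, and (2) how a depth-quality signal switches between cross-modality fusion and RGB-only decoding at test time. All shapes, names, and the fusion rule here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))  # shared encoder weights (toy stand-in for T2T-ViT)

def encoder(x):
    # One shared projection applied to every item in the batch:
    # a single copy of the parameters serves both modalities.
    return x @ W

def forward(rgb, depth, depth_is_good):
    # rgb, depth: (B, 16) toy feature vectors for a batch of B images.
    B = rgb.shape[0]
    both = np.concatenate([rgb, depth], axis=0)  # (2B, 16): batch-dim concat
    feats = encoder(both)                        # single shared-weight forward pass
    f_rgb, f_depth = feats[:B], feats[B:]
    if depth_is_good:
        # stand-in for the cross-modality fusion module (CMF)
        return f_rgb + f_depth
    # poor-quality depth: decode RGB information only, under the same parameters
    return 2.0 * f_rgb  # stand-in for "enhance RGB information"

rgb = rng.standard_normal((4, 16))
depth = rng.standard_normal((4, 16))
fused = forward(rgb, depth, depth_is_good=True)
rgb_only = forward(rgb, depth, depth_is_good=False)
print(fused.shape, rgb_only.shape)  # (4, 8) (4, 8)
```

Note that both branches run through the same `encoder` weights, which mirrors the paper's claim that RGB-D and RGB-only decoding stay consistent under one set of model parameters.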
Related papers
- Attentive Multimodal Fusion for Optical and Scene Flow [24.08052492109655]
Existing methods typically rely solely on RGB images or fuse the modalities at later stages.
We propose a novel deep neural network approach named FusionRAFT, which enables early-stage information fusion between sensor modalities.
Our approach exhibits improved robustness in the presence of noise and low-lighting conditions that affect the RGB images.
arXiv Detail & Related papers (2023-07-28T04:36:07Z)
- Symmetric Uncertainty-Aware Feature Transmission for Depth Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR.
Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z)
- Pyramidal Attention for Saliency Detection [30.554118525502115]
This paper exploits only RGB images, estimates depth from RGB, and leverages the intermediate depth features.
We employ a pyramidal attention structure to extract multi-level convolutional-transformer features to process initial stage representations.
We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets.
arXiv Detail & Related papers (2022-04-14T06:57:46Z)
- Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images [89.81919625224103]
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images.
We present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection.
arXiv Detail & Related papers (2022-01-01T03:02:27Z)
- MTFNet: Mutual-Transformer Fusion Network for RGB-D Salient Object Detection [15.371153771528093]
We propose a novel Mutual-Transformer Fusion Network (MTFNet) for RGB-D SOD.
MTFNet contains two main modules, i.e., the Focal Feature Extractor (FFE) and Mutual-Transformer Fusion (MTF).
Comprehensive experimental results on six public benchmarks demonstrate the superiority of our proposed MTFNet.
arXiv Detail & Related papers (2021-12-02T12:48:37Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly designed lightweight triple-stream network is applied to the reformulated data to achieve an optimal channel-wise complementary fusion between RGB and D.
arXiv Detail & Related papers (2020-08-07T10:13:05Z)
- Cross-Modal Weighting Network for RGB-D Salient Object Detection [76.0965123893641]
We propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD.
Specifically, three RGB-depth interaction modules, named CMW-L, CMW-M and CMW-H, are developed to deal with respectively low-, middle- and high-level cross-modal information fusion.
CMWNet consistently outperforms 15 state-of-the-art RGB-D SOD methods on seven popular benchmarks.
arXiv Detail & Related papers (2020-07-09T16:01:44Z)
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt to realize a unified depth-aware framework with only RGB information as input for inference.
It not only surpasses state-of-the-art performance on five public RGB SOD benchmarks, but also outperforms RGB-D-based methods on five benchmarks by a large margin.
arXiv Detail & Related papers (2020-05-30T13:40:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.