SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient
object detection
- URL: http://arxiv.org/abs/2204.05585v1
- Date: Tue, 12 Apr 2022 07:37:39 GMT
- Title: SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient
object detection
- Authors: Zhengyi Liu, Yacheng Tan, Qian He, Yun Xiao
- Abstract summary: We propose a cross-modality fusion model SwinNet for RGB-D and RGB-T salient object detection.
The proposed model outperforms the state-of-the-art models on RGB-D and RGB-T datasets.
- Score: 12.126413875108993
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Convolutional neural networks (CNNs) are good at extracting contextual
features within certain receptive fields, while transformers can model
global long-range dependencies. By absorbing the advantages of the
transformer and the merits of the CNN, the Swin Transformer shows strong feature
representation ability. Based on it, we propose a cross-modality fusion model,
SwinNet, for RGB-D and RGB-T salient object detection. It is driven by the Swin
Transformer to extract hierarchical features, boosted by an attention
mechanism to bridge the gap between the two modalities, and guided by edge
information to sharpen the contours of salient objects. Specifically, a two-stream
Swin Transformer encoder first extracts multi-modality features, and then
a spatial alignment and channel re-calibration module is presented to optimize
intra-level cross-modality features. To clarify fuzzy boundaries, an edge-guided
decoder achieves inter-level cross-modality fusion under the guidance of edge
features. The proposed model outperforms state-of-the-art models on RGB-D
and RGB-T datasets, showing that it provides more insight into the
cross-modality complementarity task.
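To make the described architecture concrete, below is a minimal PyTorch-style sketch of the two-stream fusion idea: two encoders, a spatial-alignment and channel-re-calibration step applied to same-level features of both modalities, and an edge-guided head that sharpens the saliency prediction. All module names, channel sizes, and the tiny convolutional stand-in encoders are illustrative assumptions; the paper itself uses hierarchical Swin Transformer backbones and a more elaborate multi-level decoder.

```python
# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn as nn


class ChannelRecalibration(nn.Module):
    """Squeeze-and-excitation style channel weighting (assumed form)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)


class SpatialAlignment(nn.Module):
    """Single-channel spatial attention used to align the two modalities."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        return x * self.attn(x)


class CrossModalityFusion(nn.Module):
    """Fuse same-level RGB and depth/thermal features (weights shared for brevity)."""
    def __init__(self, channels):
        super().__init__()
        self.spatial = SpatialAlignment(channels)
        self.channel = ChannelRecalibration(channels)

    def forward(self, f_rgb, f_aux):
        f_rgb = self.channel(self.spatial(f_rgb))
        f_aux = self.channel(self.spatial(f_aux))
        return f_rgb + f_aux


class EdgeGuidedHead(nn.Module):
    """Predict an edge map and use it to emphasize boundaries in the saliency map."""
    def __init__(self, channels):
        super().__init__()
        self.edge = nn.Conv2d(channels, 1, 3, padding=1)
        self.sal = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, fused):
        edge = torch.sigmoid(self.edge(fused))
        sal = self.sal(fused * (1 + edge))  # edge cues sharpen the contour
        return sal, edge


class TwoStreamSODSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Tiny convolutional stand-ins for the two Swin Transformer streams.
        self.enc_rgb = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.enc_aux = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.fuse = CrossModalityFusion(channels)
        self.head = EdgeGuidedHead(channels)

    def forward(self, rgb, aux):
        fused = self.fuse(self.enc_rgb(rgb), self.enc_aux(aux))
        return self.head(fused)


if __name__ == "__main__":
    model = TwoStreamSODSketch()
    rgb = torch.randn(1, 3, 224, 224)
    depth = torch.randn(1, 3, 224, 224)  # depth/thermal map replicated to 3 channels
    saliency, edge = model(rgb, depth)
    print(saliency.shape, edge.shape)  # torch.Size([1, 1, 224, 224]) for both
```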
Related papers
- Point-aware Interaction and CNN-induced Refinement Network for RGB-D
Salient Object Detection [95.84616822805664]
We introduce a CNN-assisted Transformer architecture and propose a novel RGB-D SOD network with Point-aware Interaction and CNN-induced Refinement.
To alleviate the block effect and detail destruction problems naturally brought by the Transformer, we design a CNN-induced refinement (CNNR) unit for content refinement and supplementation.
arXiv Detail & Related papers (2023-08-17T11:57:49Z) - TANet: Transformer-based Asymmetric Network for RGB-D Salient Object
Detection [13.126051625000605]
RGB-D SOD methods mainly rely on a symmetric two-stream CNN-based network to extract RGB and depth channel features separately.
We propose a Transformer-based asymmetric network (TANet) to tackle the issues mentioned above.
Our method achieves superior performance over 14 state-of-the-art RGB-D methods on six public datasets.
arXiv Detail & Related papers (2022-07-04T03:06:59Z) - Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z) - Transformer-Guided Convolutional Neural Network for Cross-View
Geolocalization [20.435023745201878]
We propose a novel Transformer-guided convolutional neural network (TransGCNN) architecture.
Our TransGCNN consists of a CNN backbone that extracts a feature map from the input image and a Transformer head that models global context.
Experiments on popular benchmark datasets demonstrate that our model achieves top-1 accuracy of 94.12% and 84.92% on CVUSA and CVACT_val, respectively.
arXiv Detail & Related papers (2022-04-21T08:46:41Z) - TransCMD: Cross-Modal Decoder Equipped with Transformer for RGB-D
Salient Object Detection [86.94578023985677]
In this work, we rethink this task from the perspective of global information alignment and transformation.
Specifically, the proposed method (TransCMD) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path.
Experimental results on seven RGB-D SOD benchmark datasets demonstrate that a simple two-stream encoder-decoder framework can surpass the state-of-the-art purely CNN-based methods.
arXiv Detail & Related papers (2021-12-04T15:45:34Z) - Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that presents great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, the Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)