Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection
- URL: http://arxiv.org/abs/2206.03105v1
- Date: Tue, 7 Jun 2022 08:35:41 GMT
- Title: Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection
- Authors: Chao Zeng and Sam Kwong
- Abstract summary: In this work, we propose the Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt the Swin-Transformer as the feature extractor for both the RGB and depth modalities to model long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
- Score: 67.33924278729903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Salient Object Detection is the task of predicting the human-attended region
in a given scene. Fusing depth information has been proven effective in this
task. The main challenge is how to aggregate the complementary information from
the RGB and depth modalities. However, conventional deep models rely heavily on
CNN feature extractors, and long-range contextual dependencies are usually
ignored. In this work, we propose the Dual Swin-Transformer based Mutual
Interactive Network (DTMINet). We adopt the Swin-Transformer as the feature
extractor for both the RGB and depth modalities to model long-range
dependencies in the visual inputs. Before fusing the two branches of features into
one, attention-based modules are applied to enhance features from each
modality. We design a self-attention-based cross-modality interaction module
and a gated modality attention module to leverage the complementary information
between the two modalities. For saliency decoding, we build decoding stages
enhanced with dense connections that keep a decoding memory while the
multi-level encoder features are considered simultaneously. To address the
issue of inaccurate depth maps, we feed the early-stage RGB features into a
skip convolution module so that the RGB modality provides additional guidance
to the final saliency prediction. In addition, we add edge supervision to regularize the
feature learning process. Comprehensive experiments on five standard RGB-D SOD
benchmark datasets over four evaluation metrics demonstrate the superiority of
the proposed DTMINet method.
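The abstract names two fusion components: a self-attention-based cross-modality interaction module and a gated modality attention module. No code accompanies this listing, so the following is only a minimal PyTorch-style sketch of those two ideas under assumed feature shapes; the class names CrossModalityInteraction and GatedModalityAttention are illustrative, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the two fusion ideas named in the
# abstract: cross-modality interaction via self-attention, and a gated
# modality attention that weighs RGB against (possibly unreliable) depth.
import torch
import torch.nn as nn


class CrossModalityInteraction(nn.Module):
    """RGB tokens attend to depth tokens to absorb complementary depth context."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, depth_tokens):
        # Query from RGB, key/value from depth: the self-attention machinery is
        # applied across modalities rather than within a single one.
        out, _ = self.attn(rgb_tokens, depth_tokens, depth_tokens)
        return self.norm(rgb_tokens + out)


class GatedModalityAttention(nn.Module):
    """Pixel-wise gate deciding how much each modality contributes to the fusion."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        g = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        # Depth is down-weighted wherever the learned gate judges it unreliable,
        # one plausible way to cope with inaccurate depth maps.
        return g * rgb_feat + (1.0 - g) * depth_feat


if __name__ == "__main__":
    b, c, h, w = 2, 96, 28, 28  # assumed shape of one encoder stage's features
    rgb, depth = torch.randn(b, c, h, w), torch.randn(b, c, h, w)

    fused = GatedModalityAttention(c)(rgb, depth)        # (2, 96, 28, 28)
    to_tokens = lambda x: x.flatten(2).transpose(1, 2)   # B x HW x C
    mixed = CrossModalityInteraction(c)(to_tokens(rgb), to_tokens(depth))
    print(fused.shape, mixed.shape)
```

In the actual DTMINet, such operations would sit between the two Swin-Transformer branches at each encoding stage, ahead of the densely connected decoder; the sketch only illustrates the single-stage interaction pattern.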
Related papers
- HODINet: High-Order Discrepant Interaction Network for RGB-D Salient
Object Detection [4.007827908611563]
RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information.
Most RGB-D SOD methods apply the same type of backbones and fusion modules to identically learn the multimodality and multistage features.
In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD.
arXiv Detail & Related papers (2023-07-03T11:56:21Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient
Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that is highly effective for both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- RGB-D Salient Object Detection with Cross-Modality Modulation and
Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD).
The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from the RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)