HODINet: High-Order Discrepant Interaction Network for RGB-D Salient
Object Detection
- URL: http://arxiv.org/abs/2307.00954v1
- Date: Mon, 3 Jul 2023 11:56:21 GMT
- Title: HODINet: High-Order Discrepant Interaction Network for RGB-D Salient
Object Detection
- Authors: Kang Yi, Jing Xu, Xiao Jin, Fu Guo, Yan-Feng Wu
- Abstract summary: RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information.
Most RGB-D SOD methods apply the same type of backbones and fusion modules to identically learn the multimodality and multistage features.
In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD.
- Score: 4.007827908611563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: RGB-D salient object detection (SOD) aims to detect the prominent regions by
jointly modeling RGB and depth information. Most RGB-D SOD methods apply the
same type of backbones and fusion modules to identically learn the
multimodality and multistage features. However, these features contribute
differently to the final saliency results, which raises two issues: 1) how to
model discrepant characteristics of RGB images and depth maps; 2) how to fuse
these cross-modality features in different stages. In this paper, we propose a
high-order discrepant interaction network (HODINet) for RGB-D SOD. Concretely,
we first employ transformer-based and CNN-based architectures as backbones to
encode RGB and depth features, respectively. Then, the high-order
representations are delicately extracted and embedded into spatial and channel
attentions for cross-modality feature fusion in different stages. Specifically,
we design a high-order spatial fusion (HOSF) module and a high-order channel
fusion (HOCF) module to fuse features of the first two and the last two stages,
respectively. Besides, a cascaded pyramid reconstruction network is adopted to
progressively decode the fused features in a top-down pathway. Extensive
experiments are conducted on seven widely used datasets to demonstrate the
effectiveness of the proposed approach. We achieve competitive performance
against 24 state-of-the-art methods under four evaluation metrics.
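The abstract does not give the exact formulation of the HOSF/HOCF modules, but it does say that "high-order representations" are embedded into channel attention for cross-modality fusion. As an illustrative sketch only (the function name, shapes, and the choice of channel covariance as the high-order statistic are all assumptions, not the paper's actual design), a second-order channel fusion could look like this in plain NumPy:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def high_order_channel_fusion(rgb, depth):
    """Toy second-order channel fusion (assumed design, not HODINet's):
    derive a channel attention from the covariance of each modality's
    features, a second-order ("high-order") statistic, then reweight
    and sum the two streams. rgb, depth: arrays of shape (C, H, W)."""
    C, H, W = rgb.shape
    attentions = []
    for feat in (rgb, depth):
        x = feat.reshape(C, H * W)              # flatten spatial dims
        x = x - x.mean(axis=1, keepdims=True)   # center each channel
        cov = x @ x.T / (H * W - 1)             # (C, C) channel covariance
        attentions.append(sigmoid(cov.mean(axis=1)))  # (C,) weights
    a_rgb, a_depth = attentions
    return a_rgb[:, None, None] * rgb + a_depth[:, None, None] * depth

rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 4, 4))
depth = rng.standard_normal((8, 4, 4))
out = high_order_channel_fusion(rgb, depth)
print(out.shape)  # (8, 4, 4)
```

The spatial counterpart (HOSF) would presumably compute an analogous statistic over spatial positions rather than channels; the paper itself should be consulted for the real modules.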
Related papers
- Interactive Context-Aware Network for RGB-T Salient Object Detection [7.544240329265388]
We propose a novel network called Interactive Context-Aware Network (ICANet).
ICANet contains three modules that can effectively perform the cross-modal and cross-scale fusions.
Experiments prove that our network performs favorably against the state-of-the-art RGB-T SOD methods.
arXiv Detail & Related papers (2022-11-11T10:04:36Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
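The summary above credits the Swin-Transformer backbone with modeling long-range dependencies; the mechanism underneath that claim is token-to-token self-attention. A minimal, framework-free sketch with toy shapes (this is generic scaled dot-product attention, not the Swin implementation, which uses shifted local windows):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Minimal self-attention: every token attends to every other token,
    which is how transformers capture long-range dependencies."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (N, N) affinities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)              # softmax over tokens
    return w @ v

tokens = np.random.default_rng(1).standard_normal((16, 32))  # 16 tokens
out = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape)  # (16, 32)
```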
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that transformer is a uniform operation which presents great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Siamese Network for RGB-D Salient Object Detection and Beyond [113.30063105890041]
A novel framework is proposed to learn from both RGB and depth inputs through a shared network backbone.
Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector.
We also link JL-DCF to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models.
arXiv Detail & Related papers (2020-08-26T06:01:05Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as a cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- RGB-D Salient Object Detection with Cross-Modality Modulation and Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD).
The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.