HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness
- URL: http://arxiv.org/abs/2301.07405v1
- Date: Wed, 18 Jan 2023 10:00:59 GMT
- Title: HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness
- Authors: Zongwei Wu, Guillaume Allibert, Fabrice Meriaudeau, Chao Ma, and
Cédric Demonceaux
- Abstract summary: We propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection.
Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies.
Our HiDAnet performs favorably over the state-of-the-art methods by large margins.
- Score: 2.341385717236931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: RGB-D saliency detection aims to fuse multi-modal cues to accurately localize
salient regions. Existing works often adopt attention modules for feature
modeling, with few methods explicitly leveraging fine-grained details to merge
with semantic cues. Thus, despite the auxiliary depth information, it is still
challenging for existing models to distinguish objects with similar appearances
but at distinct camera distances. In this paper, from a new perspective, we
propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D
saliency detection. Our motivation comes from the observation that the
multi-granularity properties of geometric priors correlate well with the neural
network hierarchies. To realize multi-modal and multi-level fusion, we first
use a granularity-based attention scheme to strengthen the discriminatory power
of RGB and depth features separately. Then we introduce a unified cross
dual-attention module for multi-modal and multi-level fusion in a
coarse-to-fine manner. The encoded multi-modal features are gradually
aggregated into a shared decoder. Further, we exploit a multi-scale loss to
take full advantage of the hierarchical information. Extensive experiments on
challenging benchmark datasets demonstrate that our HiDAnet performs favorably
over the state-of-the-art methods by large margins.
Related papers
- AMANet: Advancing SAR Ship Detection with Adaptive Multi-Hierarchical
Attention Network [0.5437298646956507]
A novel adaptive multi-hierarchical attention module (AMAM) is proposed to learn multi-scale features and adaptively aggregate salient features from various feature layers.
We first fuse information from adjacent feature layers to enhance the detection of smaller targets, thereby achieving multi-scale feature enhancement.
Thirdly, we present a novel adaptive multi-hierarchical attention network (AMANet) by embedding the AMAM between the backbone network and the feature pyramid network.
arXiv Detail & Related papers (2024-01-24T03:56:33Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Weakly Aligned Feature Fusion for Multimodal Object Detection [52.15436349488198]
Multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned.
This problem makes it difficult to fuse multimodal features and complicates convolutional neural network (CNN) training.
In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem.
arXiv Detail & Related papers (2022-04-21T02:35:23Z)
- Multi-Scale Iterative Refinement Network for RGB-D Salient Object
Detection [7.062058947498447]
Salient visual cues appear in various scales and resolutions of RGB images due to semantic gaps at different feature levels.
Similar salient patterns are available in cross-modal depth images as well as multi-scale versions.
We devise an attention-based fusion module (ABF) to address cross-modal correlation.
arXiv Detail & Related papers (2022-01-24T10:33:00Z)
- M2RNet: Multi-modal and Multi-scale Refined Network for RGB-D Salient
Object Detection [1.002712867721496]
Methods based on RGB-D often suffer from the incompatibility of multi-modal feature fusion and the insufficiency of multi-scale feature aggregation.
We propose a novel multi-modal and multi-scale refined network (M2RNet).
Three essential components are presented in this network.
arXiv Detail & Related papers (2021-09-16T12:15:40Z)
- RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.
arXiv Detail & Related papers (2021-09-15T12:31:27Z)
- Specificity-preserving RGB-D Saliency Detection [103.3722116992476]
We propose a specificity-preserving network (SP-Net) for RGB-D saliency detection.
Two modality-specific networks and a shared learning network are adopted to generate individual and shared saliency maps.
Experiments on six benchmark datasets demonstrate that our SP-Net outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T14:14:22Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical
Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)
- Bifurcated backbone strategy for RGB-D salient object detection [168.19708737906618]
We leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel cascaded refinement network.
Our architecture, named Bifurcated Backbone Strategy Network (BBS-Net), is simple, efficient, and backbone-independent.
arXiv Detail & Related papers (2020-07-06T13:01:30Z)