Hierarchical Dynamic Filtering Network for RGB-D Salient Object
Detection
- URL: http://arxiv.org/abs/2007.06227v3
- Date: Thu, 16 Jul 2020 09:15:49 GMT
- Title: Hierarchical Dynamic Filtering Network for RGB-D Salient Object
Detection
- Authors: Youwei Pang, Lihe Zhang, Xiaoqi Zhao, Huchuan Lu
- Abstract summary: The central issue in RGB-D salient object detection (SOD) is how to better integrate and utilize cross-modal fusion information.
In this paper, we explore this issue from a new perspective.
We implement a more flexible and efficient form of multi-scale cross-modal feature processing.
- Score: 91.43066633305662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The central issue in RGB-D salient object detection (SOD) is how to better
integrate and utilize cross-modal fusion information. In this paper, we explore
this issue from a new perspective. We integrate the features of different
modalities through densely connected structures and use the resulting mixed
features to generate dynamic filters with receptive fields of different sizes.
In this way, we implement a more flexible and efficient multi-scale cross-modal
feature processing scheme, i.e., the dynamic dilated pyramid module. To give the
predictions sharper edges and more consistent saliency regions, we design a
hybrid enhanced loss function to further optimize the results. This loss
function is also validated to be effective on the single-modal RGB SOD task. In
terms of six metrics, the proposed method outperforms twelve existing methods
on eight challenging benchmark datasets. Extensive experiments verify the
effectiveness of the proposed module and loss function. Our code, model, and
results are available at https://github.com/lartpang/HDFNet.
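To make the dynamic-filter idea concrete, below is a minimal PyTorch sketch of generating filters from mixed cross-modal features and applying them with different receptive fields via dilation. The per-image depth-wise kernels, the dilation rates, and all names are illustrative assumptions, not the paper's exact dynamic dilated pyramid module; the authors' implementation lives in the linked repository.

```python
import torch
import torch.nn.functional as F
from torch import nn

class DynamicDilatedFilter(nn.Module):
    """Sketch: predict one depth-wise 3x3 kernel per (image, channel) from
    the fused RGB-D feature, then apply it at several dilation rates so the
    same dynamic kernel covers receptive fields of different sizes."""

    def __init__(self, channels: int, k: int = 3, dilations=(1, 2, 4)):
        super().__init__()
        self.k, self.dilations = k, dilations
        self.gen = nn.Sequential(           # kernel generator (assumed form)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels * k * k, 1),
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, fused, target):
        # fused: mixed cross-modal feature; target: feature to be filtered
        b, c, h, w = target.shape
        kernels = self.gen(fused).reshape(b * c, 1, self.k, self.k)
        x = target.reshape(1, b * c, h, w)  # fold batch into conv groups
        outs = []
        for d in self.dilations:
            y = F.conv2d(x, kernels, padding=d * (self.k // 2),
                         dilation=d, groups=b * c)
            outs.append(y.view(b, c, h, w))
        return self.fuse(torch.cat(outs, dim=1))
```

The hybrid enhanced loss is likewise only described at a high level here, so the sketch below combines BCE with a soft-IoU term and an edge-weighted term as a plausible stand-in for a loss that sharpens edges and keeps saliency regions consistent; it is not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def hybrid_loss_sketch(logits, target):
    """Illustrative hybrid loss: BCE + soft IoU (region consistency)
    + BCE re-weighted on a boundary band (edge sharpness)."""
    pred = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    inter = (pred * target).sum(dim=(2, 3))
    union = (pred + target - pred * target).sum(dim=(2, 3))
    iou = 1 - (inter + 1) / (union + 1)        # penalizes fragmented regions
    # boundary band of the ground truth via morphological dilation/erosion
    dilated = F.max_pool2d(target, 5, stride=1, padding=2)
    eroded = 1 - F.max_pool2d(1 - target, 5, stride=1, padding=2)
    edge = (dilated - eroded).clamp(0, 1)
    per_pixel = F.binary_cross_entropy_with_logits(logits, target,
                                                   reduction="none")
    edge_bce = (per_pixel * edge).sum() / (edge.sum() + 1)
    return bce + iou.mean() + edge_bce
```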
Related papers
- CasDyF-Net: Image Dehazing via Cascaded Dynamic Filters [0.0]
Image dehazing aims to restore image clarity and visual quality by reducing atmospheric scattering and absorption effects.
Inspired by dynamic filtering, we propose using cascaded dynamic filters to create a multi-branch network.
Experiments on RESIDE, Haze4K, and O-Haze datasets validate our method's effectiveness.
arXiv Detail & Related papers (2024-09-13T03:20:38Z)
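The summary above only names the mechanism, so here is one plausible PyTorch reading of "cascaded dynamic filters" forming a multi-branch network: each branch derives a per-image channel-mixing filter from the previous branch's output. Every design choice below is a guess for illustration, not CasDyF-Net's actual architecture.

```python
import torch
from torch import nn

class CascadedDynamicBranches(nn.Module):
    """Sketch: branch i predicts a per-image 1x1 (channel-mixing) filter
    from branch i-1's output, cascading the dynamic filtering; branch
    outputs are concatenated and fused."""

    def __init__(self, channels: int, branches: int = 3):
        super().__init__()
        self.gens = nn.ModuleList(
            nn.Linear(channels, channels * channels) for _ in range(branches))
        self.fuse = nn.Conv2d(channels * branches, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        outs, cur = [], x
        for gen in self.gens:
            ctx = cur.mean(dim=(2, 3))          # global context, (b, c)
            w_dyn = gen(ctx).view(b, c, c)      # dynamic channel mixer
            cur = torch.bmm(w_dyn, cur.flatten(2)).view(b, c, h, w).relu()
            outs.append(cur)
        return self.fuse(torch.cat(outs, dim=1))
```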
- Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion [46.04264366475848]
RGB-guided depth completion aims at predicting dense depth maps from sparse depth measurements and corresponding RGB images.
Guided dynamic filters generate spatially-variant depth-wise separable convolutional filters from RGB features to guide depth features.
We propose to decompose the guided dynamic filters into a spatially-shared component multiplied by content-adaptive adaptors at each spatial location.
arXiv Detail & Related papers (2023-09-05T08:37:58Z)
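The decomposition described above maps naturally to code: a spatially-shared depth-wise kernel is multiplied, at every location, by content-adaptive adaptors predicted from the guiding RGB features. The sketch below is an illustrative PyTorch reading; the layer shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F
from torch import nn

class DecomposedGuidedFilter(nn.Module):
    """Sketch: one shared depth-wise kernel, scaled per location by
    RGB-derived adaptors, instead of predicting a full dynamic kernel
    at every pixel."""

    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        self.shared = nn.Parameter(torch.randn(channels, k * k) * 0.1)
        self.adaptor = nn.Conv2d(channels, k * k, 3, padding=1)

    def forward(self, depth_feat, rgb_feat):
        b, c, h, w = depth_feat.shape
        a = self.adaptor(rgb_feat)                      # (b, k*k, h, w)
        patches = F.unfold(depth_feat, self.k, padding=self.k // 2)
        patches = patches.view(b, c, self.k * self.k, h, w)
        weight = self.shared.view(1, c, -1, 1, 1) * a.unsqueeze(1)
        return (patches * weight).sum(dim=2)            # (b, c, h, w)
```

Because only the small k*k adaptor map varies spatially, the dynamic part costs far less than predicting a full depth-wise kernel everywhere, which matches the stated efficiency motivation.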
- Interactive Context-Aware Network for RGB-T Salient Object Detection [7.544240329265388]
We propose a novel network called the Interactive Context-Aware Network (ICANet).
ICANet contains three modules that effectively perform cross-modal and cross-scale fusion.
Experiments show that our network performs favorably against state-of-the-art RGB-T SOD methods.
arXiv Detail & Related papers (2022-11-11T10:04:36Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks; the visible and thermal filters are then used to perform dynamic convolutions on their corresponding input feature maps.
To handle heavy occlusion, fast motion, and out-of-view targets, we propose a joint local and global search that exploits a new direction-aware target-driven attention mechanism.
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
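A minimal sketch of the modality-aware filter generation idea: two independent heads, each seeing the joint feature (to reflect the stated cross-modal message communication), predict per-image depth-wise kernels that are convolved with their own stream. All details beyond the summary's description are assumptions.

```python
import torch
import torch.nn.functional as F
from torch import nn

def dynamic_depthwise(x, kernels, k):
    # apply one depth-wise k x k kernel per (image, channel) pair
    b, c, h, w = x.shape
    out = F.conv2d(x.reshape(1, b * c, h, w),
                   kernels.reshape(b * c, 1, k, k),
                   padding=k // 2, groups=b * c)
    return out.view(b, c, h, w)

class ModalityAwareFilterGen(nn.Module):
    """Sketch: two independent generators predict per-image depth-wise
    kernels for the visible and thermal streams from the joint feature,
    then each stream is filtered by its own kernels (illustrative only)."""

    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        def make_head():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(2 * channels, channels * k * k, 1))
        self.gen_vis, self.gen_th = make_head(), make_head()

    def forward(self, vis, th):
        joint = torch.cat([vis, th], dim=1)   # cross-modal message passing
        out_v = dynamic_depthwise(vis, self.gen_vis(joint), self.k)
        out_t = dynamic_depthwise(th, self.gen_th(joint), self.k)
        return out_v + out_t                  # simple fusion for the sketch
```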
- Fine-Grained Dynamic Head for Object Detection [68.70628757217939]
We propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance.
Experiments demonstrate the effectiveness and efficiency of the proposed method on several state-of-the-art detection benchmarks.
arXiv Detail & Related papers (2020-12-07T08:16:32Z)
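One way to read "a pixel-level combination of FPN features from different scales" is a soft per-pixel gate over scales. The sketch below implements that reading; the gating layer and alignment strategy are assumptions, not the authors' conditional selection mechanism.

```python
import torch
import torch.nn.functional as F
from torch import nn

class PixelLevelFPNSelector(nn.Module):
    """Sketch: per-pixel softmax gates softly select among FPN levels."""

    def __init__(self, channels: int, num_levels: int = 4):
        super().__init__()
        self.gate = nn.Conv2d(channels * num_levels, num_levels, 1)

    def forward(self, feats):
        # feats: list of num_levels FPN maps (b, c, h_i, w_i)
        h, w = feats[0].shape[-2:]             # align to the finest scale
        aligned = [F.interpolate(f, size=(h, w), mode="bilinear",
                                 align_corners=False) for f in feats]
        gates = self.gate(torch.cat(aligned, 1)).softmax(dim=1)  # (b, L, h, w)
        stacked = torch.stack(aligned, dim=1)                    # (b, L, c, h, w)
        return (gates.unsqueeze(2) * stacked).sum(dim=1)         # (b, c, h, w)
```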
- Siamese Network for RGB-D Salient Object Detection and Beyond [113.30063105890041]
A novel framework, JL-DCF, is proposed to learn from both RGB and depth inputs through a shared network backbone.
Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector.
We also link JL-DCF to the RGB-D semantic segmentation field, showing that it can outperform several semantic segmentation models.
arXiv Detail & Related papers (2020-08-26T06:01:05Z)
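The shared-backbone idea sketches easily: the same (Siamese) encoder processes both the RGB image and the depth map tiled to three channels, and the two feature sets are fused for prediction. The toy backbone, fusion, and head below are placeholders, not JL-DCF's actual components.

```python
import torch
from torch import nn

class SiameseRGBD(nn.Module):
    """Sketch: one shared encoder for both modalities, then fuse."""

    def __init__(self, backbone: nn.Module, channels: int):
        super().__init__()
        self.backbone = backbone                  # shared weights for both inputs
        self.fuse = nn.Conv2d(channels * 2, channels, 1)
        self.head = nn.Conv2d(channels, 1, 1)     # saliency logits

    def forward(self, rgb, depth):
        f_rgb = self.backbone(rgb)
        f_d = self.backbone(depth.expand_as(rgb))  # tile depth to 3 channels
        return self.head(self.fuse(torch.cat([f_rgb, f_d], dim=1)))

# usage with a toy backbone
toy = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
net = SiameseRGBD(toy, 32)
out = net(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
```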
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose aggregate interaction modules to integrate features from adjacent levels.
To obtain more efficient multi-scale features, self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
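A simplified sketch of integrating features from adjacent levels: each level is merged with its neighbors after resizing them to its resolution. The resize and merge choices are assumptions, simpler than the paper's aggregate interaction module.

```python
import torch
import torch.nn.functional as F
from torch import nn

class AdjacentAggregation(nn.Module):
    """Sketch: fuse a level with its resized higher- and lower-resolution
    neighbors (a simplified reading of adjacent-level aggregation)."""

    def __init__(self, channels: int):
        super().__init__()
        self.merge = nn.Conv2d(channels * 3, channels, 3, padding=1)

    def forward(self, prev, cur, nxt):
        # prev: higher resolution, nxt: lower resolution than cur
        h, w = cur.shape[-2:]
        up = F.interpolate(nxt, size=(h, w), mode="bilinear",
                           align_corners=False)
        down = F.adaptive_avg_pool2d(prev, (h, w))
        return self.merge(torch.cat([down, cur, up], dim=1))
```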
- RGB-D Salient Object Detection with Cross-Modality Modulation and Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD).
The proposed network mainly addresses two challenging issues: 1) how to effectively integrate the complementary information from the RGB image and its corresponding depth map, and 2) how to adaptively select the more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)
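The two stated issues suggest a modulate-then-select pattern: depth-derived scale and shift modulate the RGB features, and a channel gate then adaptively keeps the more saliency-related ones. This is one illustrative reading, not the paper's actual modules.

```python
import torch
from torch import nn

class ModulateAndSelect(nn.Module):
    """Sketch: depth features modulate RGB features (scale and shift),
    then a channel gate re-weights the result."""

    def __init__(self, channels: int):
        super().__init__()
        self.scale = nn.Conv2d(channels, channels, 3, padding=1)
        self.shift = nn.Conv2d(channels, channels, 3, padding=1)
        self.select = nn.Sequential(              # channel-wise gating
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid())

    def forward(self, rgb_feat, depth_feat):
        # modulate RGB features with depth-derived scale and shift
        mod = rgb_feat * self.scale(depth_feat) + self.shift(depth_feat)
        # adaptively re-weight channels to keep saliency-related ones
        return mod * self.select(mod)
```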
- Multi-level Cross-modal Interaction Network for RGB-D Salient Object Detection [3.581367375462018]
We propose a novel Multi-level Cross-modal Interaction Network (MCINet) for RGB-D based salient object detection (SOD).
Our MCINet includes two key components: 1) a cross-modal feature learning network, which learns high-level features from the RGB images and depth cues so that the correlations between the two sources can be exploited; and 2) a multi-level interactive integration network, which integrates multi-level cross-modal features to boost SOD performance.
arXiv Detail & Related papers (2020-07-10T02:21:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.