RGB-D Saliency Detection via Cascaded Mutual Information Minimization
- URL: http://arxiv.org/abs/2109.07246v1
- Date: Wed, 15 Sep 2021 12:31:27 GMT
- Title: RGB-D Saliency Detection via Cascaded Mutual Information Minimization
- Authors: Jing Zhang and Deng-Ping Fan and Yuchao Dai and Xin Yu and Yiran Zhong
and Nick Barnes and Ling Shao
- Abstract summary: Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between the RGB image and depth data.
- Score: 122.8879596830581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing RGB-D saliency detection models do not explicitly encourage RGB and
depth to achieve effective multi-modal learning. In this paper, we introduce a
novel multi-stage cascaded learning framework via mutual information
minimization to "explicitly" model the multi-modal information between RGB
image and depth data. Specifically, we first map the features of each modality to a
lower-dimensional feature vector, and adopt mutual information minimization as
a regularizer to reduce the redundancy between appearance features from RGB and
geometric features from depth. We then perform multi-stage cascaded learning to
impose the mutual information minimization constraint at every stage of the
network. Extensive experiments on benchmark RGB-D saliency datasets illustrate
the effectiveness of our framework. Further, to foster the development of this
field, we contribute the largest RGB-D saliency dataset to date (7x larger than
NJU2K), which contains 15,625 image pairs with high-quality
polygon-/scribble-/object-/instance-/rank-level annotations. Based on these
rich labels, we additionally construct four new benchmarks with strong
baselines and observe some interesting phenomena, which can motivate future
model design. Source code and dataset are available at
"https://github.com/JingZhang617/cascaded_rgbd_sod".
Related papers
- PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration [6.030097207369754]
We propose a network implementing multi-scale bidirectional fusion between RGB images and point clouds generated from depth images.
Our method achieves new state-of-the-art performance.
arXiv Detail & Related papers (2023-08-09T08:13:46Z) - HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness [2.341385717236931]
We propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection.
Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies.
Our HiDAnet outperforms state-of-the-art methods by large margins.
arXiv Detail & Related papers (2023-01-18T10:00:59Z) - Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement effective cross-modality interaction.
Our network outperforms $15$ state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z) - Self-Supervised Representation Learning for RGB-D Salient Object
Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a small number of unlabeled RGB-D datasets for pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
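To make the cross-modal auto-encoder pretext task concrete, the toy sketch below reconstructs a depth map from the RGB input, so the encoder must learn modality-shared structure from unlabeled pairs. The tiny conv stack and the L1 objective are assumptions for brevity, not the paper's architecture.
```python
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAE(nn.Module):
    """Toy cross-modal auto-encoder pretext task: predict the depth map
    from the RGB image so the encoder learns modality-shared semantics."""

    def __init__(self):
        super().__init__()
        self.enc_rgb = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.dec_depth = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

    def forward(self, rgb):
        return self.dec_depth(self.enc_rgb(rgb))  # (B, 1, H, W) depth prediction

# one pre-training step on an unlabeled RGB-D pair:
# loss = F.l1_loss(model(rgb), depth)
```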
arXiv Detail & Related papers (2021-01-29T09:16:06Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
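A symmetric gated fusion can be sketched as each modality producing a gate for the other before the two streams are merged. The sigmoid-gate form and channel sizes below are illustrative assumptions, not ACMNet's exact module.
```python
import torch
import torch.nn as nn

class SymmetricGatedFusion(nn.Module):
    """Rough sketch of symmetric gated fusion: each modality gates the
    other before the two streams are merged."""

    def __init__(self, channels=64):
        super().__init__()
        self.gate_rgb = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.gate_dep = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, f_rgb, f_dep):
        both = torch.cat([f_rgb, f_dep], dim=1)
        f_rgb = f_rgb * self.gate_rgb(both)  # depth-aware gate on the RGB stream
        f_dep = f_dep * self.gate_dep(both)  # RGB-aware gate on the depth stream
        return self.merge(torch.cat([f_rgb, f_dep], dim=1))
```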
arXiv Detail & Related papers (2020-08-25T06:00:06Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately.
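A minimal stand-in for the recalibration step: the depth stream produces per-channel weights that rescale the RGB response. The squeeze-and-gate form below is an assumption for illustration, not the proposed encoder itself.
```python
import torch.nn as nn

class DepthGuidedRecalibration(nn.Module):
    """Minimal stand-in for cross-modality recalibration: the depth
    stream yields per-channel weights that rescale the RGB response."""

    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # squeeze spatial dims
            nn.Conv2d(channels, channels, 1),  # per-channel weights
            nn.Sigmoid())

    def forward(self, f_rgb, f_dep):
        return f_rgb * self.gate(f_dep)  # depth-derived channel gate on RGB
```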
arXiv Detail & Related papers (2020-07-17T18:35:24Z) - Bifurcated backbone strategy for RGB-D salient object detection [168.19708737906618]
We leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel cascaded refinement network.
Our architecture, named Bifurcated Backbone Strategy Network (BBS-Net), is simple, efficient, and backbone-independent.
arXiv Detail & Related papers (2020-07-06T13:01:30Z)