RGB-D Salient Object Detection with Ubiquitous Target Awareness
- URL: http://arxiv.org/abs/2109.03425v1
- Date: Wed, 8 Sep 2021 04:27:29 GMT
- Title: RGB-D Salient Object Detection with Ubiquitous Target Awareness
- Authors: Yifan Zhao, Jiawei Zhao, Jia Li, Xiaowu Chen
- Abstract summary: We make the first attempt to solve the RGB-D salient object detection problem with a novel depth-awareness framework.
We propose a Ubiquitous Target Awareness (UTA) network to solve three important challenges in the RGB-D SOD task.
Our proposed UTA network is depth-free for inference and runs in real-time with 43 FPS.
- Score: 37.6726410843724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional RGB-D salient object detection methods aim to leverage depth as
complementary information to find the salient regions in both modalities.
However, the salient object detection results heavily rely on the quality of
captured depth data, which is sometimes unavailable. In this work, we make the
first attempt to solve the RGB-D salient object detection problem with a novel
depth-awareness framework. This framework only relies on RGB data in the
testing phase, utilizing captured depth data as supervision for representation
learning. To construct our framework and achieve accurate salient
detection results, we propose a Ubiquitous Target Awareness (UTA) network to
solve three important challenges in the RGB-D SOD task: 1) a depth awareness module
to excavate depth information and to mine ambiguous regions via adaptive
depth-error weights, 2) a spatial-aware cross-modal interaction and a
channel-aware cross-level interaction, exploiting the low-level boundary cues
and amplifying high-level salient channels, and 3) a gated multi-scale
predictor module to perceive the object saliency in different contextual
scales. Besides its high performance, our proposed UTA network is depth-free
for inference and runs in real-time with 43 FPS. Experimental evidence
demonstrates that our proposed network not only surpasses the state-of-the-art
methods on five public RGB-D SOD benchmarks by a large margin, but also
verifies its extensibility on five public RGB SOD benchmarks.
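The abstract does not spell out how the adaptive depth-error weights are computed, so the following is only a minimal sketch of one plausible reading: pixels where a predicted depth map disagrees with the captured depth are treated as ambiguous and weighted more heavily in the saliency loss. The function names `depth_error_weights` and `weighted_bce`, the `alpha` parameter, and the specific weighting formula are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def depth_error_weights(pred_depth, gt_depth, alpha=1.0):
    """Hypothetical per-pixel weights: 1 + alpha * normalized absolute depth error.

    This formula is an assumption for illustration, not the paper's definition.
    """
    err = np.abs(pred_depth - gt_depth)
    max_err = err.max()
    if max_err > 0:
        err = err / max_err  # scale errors into [0, 1]
    return 1.0 + alpha * err


def weighted_bce(pred_sal, gt_sal, weights, eps=1e-7):
    """Binary cross-entropy on saliency maps, averaged with per-pixel weights."""
    p = np.clip(pred_sal, eps, 1.0 - eps)  # avoid log(0)
    bce = -(gt_sal * np.log(p) + (1.0 - gt_sal) * np.log(1.0 - p))
    return float((weights * bce).sum() / weights.sum())


if __name__ == "__main__":
    # Toy 2x2 maps; depth is only used to build training-time weights.
    pred_depth = np.array([[0.2, 0.5], [0.9, 0.1]])
    gt_depth = np.array([[0.2, 0.1], [0.9, 0.1]])
    pred_sal = np.array([[0.9, 0.4], [0.2, 0.1]])
    gt_sal = np.array([[1.0, 1.0], [0.0, 0.0]])

    w = depth_error_weights(pred_depth, gt_depth, alpha=2.0)
    print("weights:", w)
    print("weighted loss:", weighted_bce(pred_sal, gt_sal, w))
```

In this sketch the captured depth appears only in the loss, which mirrors the abstract's statement that depth serves as supervision while inference relies on RGB data alone.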
Related papers
- RGB-D Grasp Detection via Depth Guided Learning with Cross-modal Attention [14.790193023912973]
This paper proposes a novel learning-based approach to RGB-D grasp detection, namely the Depth Guided Cross-modal Attention Network (DGCAN).
To better leverage the geometry information recorded in the depth channel, a complete 6-dimensional rectangle representation is adopted with the grasp depth dedicatedly considered.
The prediction of the extra grasp depth substantially strengthens feature learning, thereby leading to more accurate results.
arXiv Detail & Related papers (2023-02-28T02:41:27Z)
- Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer [53.413305467674434]
We introduce open-source RGB data to support spike depth estimation, leveraging its annotations and spatial information.
We propose a cross-modality cross-domain (BiCross) framework to realize unsupervised spike depth estimation.
Our method achieves state-of-the-art (SOTA) performances, compared with RGB-oriented unsupervised depth estimation methods.
arXiv Detail & Related papers (2022-08-26T09:35:20Z)
- Robust RGB-D Fusion for Saliency Detection [13.705088021517568]
We propose a robust RGB-D fusion method that benefits from layer-wise and trident spatial attention mechanisms.
Our experiments on five benchmark datasets demonstrate that the proposed fusion method performs consistently better than the state-of-the-art fusion alternatives.
arXiv Detail & Related papers (2022-08-02T21:23:00Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both the RGB and depth modalities to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-D saliency detection shows impressive ability in some challenging scenarios.
We propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way.
arXiv Detail & Related papers (2020-07-23T04:33:36Z)
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt at realizing a unified depth-aware framework with only RGB information as input for inference.
It not only surpasses the state-of-the-art performance on five public RGB SOD benchmarks, but also surpasses the RGBD-based methods on five benchmarks by a large margin.
arXiv Detail & Related papers (2020-05-30T13:40:03Z)