Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction
- URL: http://arxiv.org/abs/2203.04895v1
- Date: Wed, 9 Mar 2022 17:20:18 GMT
- Title: Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction
- Authors: Xiaoqi Zhao, Youwei Pang, Lihe Zhang and Huchuan Lu
- Abstract summary: We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
- Score: 91.43066633305662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Owing to its color independence, illumination invariance and location
discrimination, the depth map can provide important supplementary information
for extracting salient objects in complex environments. However, high-quality
depth sensors are expensive and cannot be widely deployed, while consumer-grade
depth sensors produce noisy and sparse depth maps that introduce irreversible
interference into depth-based networks. In this paper, we propose a novel multi-task and multi-modal
interference. In this paper, we propose a novel multi-task and multi-modal
filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient
object detection and contour estimation. The multi-task mechanism encourages
the model to learn task-aware features from the auxiliary tasks. In this way,
the depth information can be completed and purified. Moreover, we introduce a
multi-modal filtered transformer (MFT) module, which is equipped with three
modality-specific filters to generate the transformer-enhanced feature for each
modality. The proposed model works in a depth-free style during the testing
phase. Experiments show that it not only significantly surpasses the
depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts
a high-quality depth map and salient contour at the same time. Moreover, the
resulting depth map can help existing RGB-D SOD methods obtain significant
performance gains.
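As a rough illustration of the multi-task design described above, the sketch below wires one shared RGB encoder to three prediction heads (depth, saliency, contour). The module names and layer sizes are assumptions for illustration only, not the authors' MMFT implementation.

import torch
import torch.nn as nn

# Illustrative multi-task layout: one shared RGB encoder feeding three
# task heads. Names and sizes are assumptions, not the authors' MMFT code.
class MultiTaskSOD(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
        )

        def head():  # one single-channel prediction head per task
            return nn.Sequential(
                nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(feat_dim, 1, 1),
            )

        self.depth_head, self.saliency_head, self.contour_head = head(), head(), head()

    def forward(self, rgb: torch.Tensor):
        f = self.encoder(rgb)
        # Only RGB is consumed, matching the depth-free testing phase;
        # depth and contour act as auxiliary training tasks.
        return self.depth_head(f), self.saliency_head(f), self.contour_head(f)

model = MultiTaskSOD()
depth, sal, contour = model(torch.randn(1, 3, 256, 256))
print(depth.shape, sal.shape, contour.shape)  # three aligned 1-channel maps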
Related papers
- OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection [102.0744303467713]
We propose a new multi-view 3D object detector named OPEN.
Our main idea is to effectively inject object-wise depth information into the network through our proposed object-wise position embedding.
OPEN achieves new state-of-the-art performance with 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.
arXiv Detail & Related papers (2024-07-15T14:29:15Z)
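A minimal sketch of the object-wise position embedding idea from this summary, assuming per-object depths are embedded by a small MLP and added to the object query features; the names and layout are illustrative, not the OPEN implementation.

import torch
import torch.nn as nn

# Illustrative object-wise position embedding: each object's depth is
# embedded by a small MLP and added to its query feature. Assumed layout,
# not the OPEN implementation.
class ObjectwiseDepthEmbedding(nn.Module):
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.depth_mlp = nn.Sequential(
            nn.Linear(1, embed_dim), nn.ReLU(inplace=True),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, queries: torch.Tensor, obj_depth: torch.Tensor):
        # queries: (num_objects, embed_dim); obj_depth: (num_objects, 1), meters
        return queries + self.depth_mlp(obj_depth)

pe = ObjectwiseDepthEmbedding()
out = pe(torch.randn(10, 256), torch.rand(10, 1) * 60.0)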
- Depth-discriminative Metric Learning for Monocular 3D Object Detection [14.554132525651868]
We introduce a novel metric learning scheme that encourages the model to extract depth-discriminative features regardless of the visual attributes.
Our method consistently improves the performance of various baselines by 23.51% and 5.78% on average.
arXiv Detail & Related papers (2024-01-02T07:34:09Z)
- RBF Weighted Hyper-Involution for RGB-D Object Detection [0.0]
We propose a real-time, two-stream RGB-D object detection model.
The proposed model consists of two new components: a depth guided hyper-involution that adapts dynamically based on the spatial interaction pattern in the raw depth map and an up-sampling based trainable fusion layer.
We show that the proposed model outperforms other RGB-D based object detection models on the NYU Depth v2 dataset and achieves comparable (second best) results on SUN RGB-D.
arXiv Detail & Related papers (2023-09-30T11:25:34Z)
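A hedged sketch of a depth-guided dynamic kernel in the spirit of the depth-guided hyper-involution: per-pixel kernels are generated from the raw depth map and applied to the features, so the spatial interaction pattern adapts to local depth. This is an illustrative approximation, not the paper's RBF-weighted operator.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative depth-guided dynamic kernel: per-pixel kernels come from the
# raw depth map and are applied to the feature map. An approximation only,
# not the paper's RBF-weighted hyper-involution.
class DepthGuidedInvolution(nn.Module):
    def __init__(self, k: int = 3):
        super().__init__()
        self.k = k
        self.kernel_gen = nn.Conv2d(1, k * k, 3, padding=1)  # kernels from depth

    def forward(self, feat: torch.Tensor, depth: torch.Tensor):
        b, c, h, w = feat.shape
        kernels = self.kernel_gen(depth).view(b, 1, self.k * self.k, h, w)
        patches = F.unfold(feat, self.k, padding=self.k // 2)  # (b, c*k*k, h*w)
        patches = patches.view(b, c, self.k * self.k, h, w)
        return (kernels.softmax(dim=2) * patches).sum(dim=2)  # (b, c, h, w)

op = DepthGuidedInvolution()
out = op(torch.randn(2, 32, 64, 64), torch.rand(2, 1, 64, 64))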
- Symmetric Uncertainty-Aware Feature Transmission for Depth Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR.
Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z)
- RGB-D Grasp Detection via Depth Guided Learning with Cross-modal Attention [14.790193023912973]
This paper proposes a novel learning-based approach to RGB-D grasp detection, namely the Depth Guided Cross-modal Attention Network (DGCAN).
To better leverage the geometry information recorded in the depth channel, a complete 6-dimensional rectangle representation is adopted with the grasp depth dedicatedly considered.
The prediction of the extra grasp depth substantially strengthens feature learning, thereby leading to more accurate results.
arXiv Detail & Related papers (2023-02-28T02:41:27Z)
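A small sketch of the 6-dimensional grasp rectangle this summary refers to, assuming the classic 5D planar grasp (center, size, rotation) extended with an explicit grasp depth; the field names are illustrative, not the DGCAN definition.

from dataclasses import dataclass

# Illustrative 6-dimensional grasp rectangle: the classic 5D planar grasp
# (center, size, rotation) plus an explicit grasp depth. Field names are
# assumptions, not the DGCAN definition.
@dataclass
class GraspRect6D:
    x: float      # grasp center, image x (pixels)
    y: float      # grasp center, image y (pixels)
    w: float      # gripper opening width (pixels)
    h: float      # gripper finger height (pixels)
    theta: float  # in-plane gripper rotation (radians)
    depth: float  # grasp depth along the camera axis (meters)

g = GraspRect6D(x=320.0, y=240.0, w=80.0, h=30.0, theta=0.4, depth=0.55)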
- Depth-Cooperated Trimodal Network for Video Salient Object Detection [13.727763221832532]
We propose a depth-cooperated trimodal network called DCTNet for video salient object detection (VSOD).
To this end, we first generate depth from RGB frames, and then propose an approach to treat the three modalities unequally.
We also introduce a refinement fusion module (RFM) to suppress noise in each modality and dynamically select useful information for further feature refinement.
arXiv Detail & Related papers (2022-02-12T13:04:16Z)
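A hedged sketch of a gated three-modality fusion in the spirit of the RFM, where a learned gate re-weights each modality so noisy ones can be suppressed; the structure is an assumption, not the DCTNet implementation.

import torch
import torch.nn as nn

# Illustrative gated three-modality fusion: a learned gate re-weights RGB,
# estimated depth and optical-flow features before summing, so a noisy
# modality can be down-weighted. Assumed structure, not the DCTNet RFM.
class GatedTrimodalFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(3 * channels, 3, 1),  # one weight map per modality
            nn.Softmax(dim=1),
        )

    def forward(self, rgb, depth, flow):
        g = self.gate(torch.cat([rgb, depth, flow], dim=1))  # (b, 3, h, w)
        return g[:, 0:1] * rgb + g[:, 1:2] * depth + g[:, 2:3] * flow

fuse = GatedTrimodalFusion()
out = fuse(*(torch.randn(1, 64, 32, 32) for _ in range(3)))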
- BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation [60.34562823470874]
We propose a joint learning network of depth map super-resolution (DSR) and monocular depth estimation (MDE) without introducing additional supervision labels; the two tasks are connected by two bridges.
One is the high-frequency attention bridge (HABdg) designed for the feature encoding process, which learns the high-frequency information of the MDE task to guide the DSR task.
The other is the content guidance bridge (CGBdg) designed for the depth map reconstruction process, which provides the content guidance learned from the DSR task for the MDE task.
arXiv Detail & Related papers (2021-07-27T01:28:23Z)
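A rough sketch of the two-bridge idea, assuming each bridge simply projects features from one task branch and injects them into the other; the 1x1-conv bridges below are illustrative placeholders, not BridgeNet's actual modules.

import torch
import torch.nn as nn

# Illustrative task bridges: features flow from the MDE branch to the DSR
# branch during encoding, and from DSR back to MDE during reconstruction.
# The 1x1-conv bridge is a placeholder, not BridgeNet's actual modules.
class Bridge(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, giver: torch.Tensor, receiver: torch.Tensor):
        return receiver + self.proj(giver)  # inject guidance from the other task

habdg, cgbdg = Bridge(), Bridge()
mde_feat = torch.randn(1, 64, 64, 64)
dsr_feat = torch.randn(1, 64, 64, 64)
dsr_feat = habdg(mde_feat, dsr_feat)  # high-frequency cue: MDE -> DSR
mde_feat = cgbdg(dsr_feat, mde_feat)  # content guidance: DSR -> MDE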
- M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
- Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion [15.033234579900657]
RGB-D salient object detection (SOD) is usually formulated as a problem of classification or regression over two modalities, i.e., RGB and depth.
We propose a depth-sensitive RGB feature modeling scheme using the depth-wise geometric prior of salient objects.
Experiments on seven standard benchmarks demonstrate the effectiveness of the proposed approach against the state-of-the-art.
arXiv Detail & Related papers (2021-03-22T13:28:45Z)
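A hedged sketch of depth-sensitive RGB feature modeling, reading the summary as a depth-derived attention map that gates the RGB features; this is one plausible interpretation, not the paper's exact scheme.

import torch
import torch.nn as nn

# Illustrative depth-sensitive attention: an attention map derived from the
# depth channel gates the RGB features, using depth as a geometric prior.
# One plausible reading of the summary, not the paper's exact scheme.
class DepthSensitiveAttention(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 1), nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth: torch.Tensor):
        return rgb_feat * self.attn(depth)  # depth prior gates RGB features

dsa = DepthSensitiveAttention()
out = dsa(torch.randn(1, 64, 48, 48), torch.rand(1, 1, 48, 48))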
- A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection [89.88222217065858]
We design a single stream network to use the depth map to guide early fusion and middle fusion between RGB and depth.
This model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a $384\times384$ image.
arXiv Detail & Related papers (2020-07-14T04:40:14Z)
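A minimal sketch of depth-guided fusion in a single-stream network, assuming the depth map is concatenated with RGB at the input (early fusion) and a depth-derived gate modulates an intermediate feature (middle fusion); the layer choices are illustrative, not the paper's architecture.

import torch
import torch.nn as nn

# Illustrative single-stream fusion: depth is concatenated with RGB at the
# input (early fusion) and a depth-derived gate modulates an intermediate
# feature (middle fusion). Layer choices are assumptions, not the paper's.
class SingleStreamFusion(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.stem = nn.Conv2d(4, channels, 3, padding=1)  # RGB + depth, one stream
        self.mid = nn.Conv2d(channels, channels, 3, padding=1)
        self.depth_gate = nn.Sequential(nn.Conv2d(1, channels, 1), nn.Sigmoid())

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        x = torch.relu(self.stem(torch.cat([rgb, depth], dim=1)))  # early fusion
        x = torch.relu(self.mid(x)) * self.depth_gate(depth)       # middle fusion
        return x

net = SingleStreamFusion()
out = net(torch.randn(1, 3, 128, 128), torch.rand(1, 1, 128, 128))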