Learning Scene Structure Guidance via Cross-Task Knowledge Transfer for
Single Depth Super-Resolution
- URL: http://arxiv.org/abs/2103.12955v1
- Date: Wed, 24 Mar 2021 03:08:25 GMT
- Title: Learning Scene Structure Guidance via Cross-Task Knowledge Transfer for
Single Depth Super-Resolution
- Authors: Baoli Sun, Xinchen Ye, Baopu Li, Haojie Li, Zhihui Wang, Rui Xu
- Abstract summary: Existing color-guided depth super-resolution (DSR) approaches require paired RGB-D data as training samples, where the RGB image is used as structural guidance to recover the degraded depth map owing to the geometric similarity between the two modalities.
We explore, for the first time, learning cross-modality knowledge at the training stage, where both RGB and depth modalities are available, but testing on the target dataset, where only the depth modality exists.
Specifically, we construct an auxiliary depth estimation (DE) task that takes an RGB image as input to estimate a depth map, and train both the DSR task and the DE task collaboratively to boost the performance of DSR.
- Score: 35.21324004883027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing color-guided depth super-resolution (DSR) approaches require paired
RGB-D data as training samples where the RGB image is used as structural
guidance to recover the degraded depth map, owing to the geometric similarity
between the two modalities. However, such paired data may be limited or
expensive to collect in the actual testing environment. Therefore, we explore,
for the first time, learning cross-modality knowledge at the training stage,
where both RGB and depth modalities are available, but testing on the target
dataset, where only the depth modality exists. Our key idea is to distill the
knowledge of scene structural guidance from the RGB modality into the single
DSR task without changing its network
architecture. Specifically, we construct an auxiliary depth estimation (DE)
task that takes an RGB image as input to estimate a depth map, and train both
DSR task and DE task collaboratively to boost the performance of DSR. Upon
this, a cross-task interaction module is proposed to realize bilateral
cross-task knowledge transfer. First, we design a cross-task distillation
scheme that encourages the DSR and DE networks to learn from each other in a
teacher-student role-exchanging fashion. Then, we introduce a structure
prediction (SP) task that
provides extra structure regularization to help both DSR and DE networks learn
more informative structure representations for depth recovery. Extensive
experiments demonstrate that our scheme achieves superior performance in
comparison with other DSR methods.
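To make the training scheme concrete, the step below sketches how the two task losses, the role-exchanging distillation, and the SP regularization could be combined. This is a minimal PyTorch-style illustration under stated assumptions: the network interfaces, the feature-level L1 distillation, the lower-task-loss teacher-selection rule, the Sobel edge target, and the loss weights are all hypothetical choices, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def sobel_edges(depth):
    # Sobel gradient magnitude as a simple structure target (illustrative choice).
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(depth, kx.to(depth.device), padding=1)
    gy = F.conv2d(depth, ky.to(depth.device), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def distill(student_feat, teacher_feat):
    # Feature-level distillation: the student mimics the detached teacher.
    return F.l1_loss(student_feat, teacher_feat.detach())

def train_step(dsr_net, de_net, sp_net, opt, lr_depth, rgb, hr_depth,
               lambda_kd=0.1, lambda_sp=0.05):
    # Assumed interface: each task network returns its prediction plus an
    # intermediate feature map.
    sr_depth, dsr_feat = dsr_net(lr_depth)   # DSR: upsample the LR depth
    est_depth, de_feat = de_net(rgb)         # DE: estimate depth from RGB

    # Supervised task losses against the HR ground-truth depth.
    loss_dsr = F.l1_loss(sr_depth, hr_depth)
    loss_de = F.l1_loss(est_depth, hr_depth)

    # Teacher-student role exchange: here the branch with the lower task loss
    # teaches the other one for this step (one plausible selection rule).
    if loss_dsr.item() < loss_de.item():
        loss_kd = distill(de_feat, dsr_feat)
    else:
        loss_kd = distill(dsr_feat, de_feat)

    # SP regularization: scene structure (an edge map here) should be
    # recoverable from the features of both branches.
    edge_gt = sobel_edges(hr_depth)
    loss_sp = F.l1_loss(sp_net(dsr_feat), edge_gt) + \
              F.l1_loss(sp_net(de_feat), edge_gt)

    loss = loss_dsr + loss_de + lambda_kd * loss_kd + lambda_sp * loss_sp
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Note that the DE and SP branches exist only during training; at test time only dsr_net is kept, so the RGB-derived guidance adds no inference cost and the DSR architecture is unchanged.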
Related papers
- DistillGrasp: Integrating Features Correlation with Knowledge Distillation for Depth Completion of Transparent Objects [4.939414800373192]
RGB-D cameras cannot accurately capture the depth of transparent objects.
Recent studies tend to explore new visual features and design complex networks to reconstruct the depth.
We propose an efficient depth completion network named DistillGrasp, which distills knowledge from the teacher branch to the student branch.
arXiv Detail & Related papers (2024-08-01T07:17:10Z)
- Depth-Guided Semi-Supervised Instance Segmentation [62.80063539262021]
Semi-Supervised Instance Segmentation (SSIS) aims to leverage unlabeled data during training.
Previous frameworks primarily utilized the RGB information of unlabeled images to generate pseudo-labels.
We introduce a Depth-Guided (DG) framework to overcome this limitation.
arXiv Detail & Related papers (2024-06-25T09:36:50Z)
- 360$^\circ$ High-Resolution Depth Estimation via Uncertainty-aware Structural Knowledge Transfer [8.988255747467333]
To predict a high-resolution (HR) omnidirectional depth map, existing methods typically take an HR omnidirectional image (ODI) as the input and rely on fully-supervised learning.
In this paper, we explore, for the first time, estimating the HR omnidirectional depth directly from a low-resolution (LR) ODI when no HR depth GT map is available.
Our key idea is to transfer the scene structural knowledge from the HR image modality and the corresponding LR depth maps to achieve the goal of HR depth estimation without any extra inference cost.
arXiv Detail & Related papers (2023-04-17T03:24:21Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both the RGB and depth modalities to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation [60.34562823470874]
We propose a joint learning network of depth map super-resolution (DSR) and monocular depth estimation (MDE) without introducing additional supervision labels.
One is the high-frequency attention bridge (HABdg) designed for the feature encoding process, which learns the high-frequency information of the MDE task to guide the DSR task.
The other is the content guidance bridge (CGBdg) designed for the depth map reconstruction process, which provides the content guidance learned from the DSR task for the MDE task (a minimal sketch of this bidirectional bridging appears after this list).
arXiv Detail & Related papers (2021-07-27T01:28:23Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which enables the network to capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- A Unified Structure for Efficient RGB and RGB-D Salient Object Detection [15.715143016999695]
We propose a unified structure with a cross-attention context extraction (CRACE) module to address both tasks of SOD efficiently.
The proposed CRACE module receives and appropriately fuses two (for RGB SOD) or three (for RGB-D SOD) inputs.
The simple unified feature pyramid network (FPN)-like structure with CRACE modules conveys and refines the results under multi-level supervision of saliency and boundaries.
arXiv Detail & Related papers (2020-12-01T12:12:03Z)
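As referenced in the BridgeNet entry above, the two bridges between the DSR and MDE branches can be pictured with a short PyTorch-style sketch. The module names, layer choices, and shapes below are assumptions made for illustration; the actual HABdg/CGBdg designs in the paper are more elaborate.

```python
import torch
import torch.nn as nn

class HighFreqAttentionBridge(nn.Module):
    """HABdg-style bridge (assumed form): turns an MDE encoder feature into
    an attention map that re-weights the DSR encoder feature."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, dsr_feat, mde_feat):
        # Residual re-weighting: high-frequency cues from MDE guide DSR.
        return dsr_feat * self.attn(mde_feat) + dsr_feat

class ContentGuidanceBridge(nn.Module):
    """CGBdg-style bridge (assumed form): injects DSR decoder content into
    the MDE decoder via a concatenate-and-fuse step."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, mde_feat, dsr_feat):
        return self.fuse(torch.cat([mde_feat, dsr_feat], dim=1))

# Usage: one bridge pair sits between the two branches at each matching scale.
hab = HighFreqAttentionBridge(64)
cgb = ContentGuidanceBridge(64)
dsr_f = torch.randn(1, 64, 32, 32)
mde_f = torch.randn(1, 64, 32, 32)
guided_dsr = hab(dsr_f, mde_f)   # MDE guides DSR feature encoding
guided_mde = cgb(mde_f, dsr_f)   # DSR guides MDE depth reconstruction
```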