Spherical Space Feature Decomposition for Guided Depth Map
Super-Resolution
- URL: http://arxiv.org/abs/2303.08942v2
- Date: Wed, 23 Aug 2023 04:12:26 GMT
- Title: Spherical Space Feature Decomposition for Guided Depth Map
Super-Resolution
- Authors: Zixiang Zhao, Jiangshe Zhang, Xiang Gu, Chengli Tan, Shuang Xu, Yulun
Zhang, Radu Timofte, Luc Van Gool
- Abstract summary: Guided depth map super-resolution (GDSR) aims to upsample low-resolution (LR) depth maps with additional information involved in high-resolution (HR) RGB images from the same scene.
In this paper, we propose the Spherical Space feature Decomposition Network (SSDNet) to solve the above issues.
Our method can achieve state-of-the-art results on four test datasets, as well as successfully generalize to real-world scenes.
- Score: 123.04455334124188
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Guided depth map super-resolution (GDSR), as a hot topic in multi-modal image
processing, aims to upsample low-resolution (LR) depth maps with additional
information involved in high-resolution (HR) RGB images from the same scene.
The critical step of this task is to effectively extract domain-shared and
domain-private RGB/depth features. In addition, three detailed issues, namely
blurry edges, noisy surfaces, and over-transferred RGB texture, need to be
addressed. In this paper, we propose the Spherical Space feature Decomposition
Network (SSDNet) to solve the above issues. To better model cross-modality
features, Restormer block-based RGB/depth encoders are employed for extracting
local-global features. Then, the extracted features are mapped to the spherical
space to complete the separation of private features and the alignment of
shared features. Shared features of RGB are fused with the depth features to
complete the GDSR task. Subsequently, a spherical contrast refinement (SCR)
module is proposed to further address the detail issues. Patches that are
classified according to imperfect categories are input into the SCR module,
where the patch features are pulled closer to the ground truth and pushed away
from the corresponding imperfect samples in the spherical feature space via
contrastive learning. Extensive experiments demonstrate that our method can
achieve state-of-the-art results on four test datasets, as well as successfully
generalize to real-world scenes. The code is available at
https://github.com/Zhaozixiang1228/GDSR-SSDNet.
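As a rough, non-authoritative sketch of the two spherical-space ideas described in the abstract (the authors' actual implementation is in the repository above), the snippet below projects RGB/depth features onto the unit sphere via L2 normalization, aligns the shared parts with a cosine objective, and applies an InfoNCE-style refinement loss that pulls a patch feature toward its ground truth and pushes it away from imperfect samples (blurry edges, noisy surfaces, over-transferred texture). The shared/private channel split, the temperature, and all function names are illustrative assumptions, not the paper's configuration.

```python
# Minimal PyTorch sketch of spherical-space feature decomposition and
# contrastive refinement, loosely following the SSDNet abstract.
# NOTE: the channel split, temperature, and loss form are illustrative
# assumptions, not the authors' actual design.
import torch
import torch.nn.functional as F


def to_sphere(x: torch.Tensor) -> torch.Tensor:
    """Map feature vectors onto the unit sphere (L2 normalization)."""
    return F.normalize(x, p=2, dim=-1)


def decompose(feat: torch.Tensor, shared_dim: int):
    """Split a feature vector into shared and private parts (assumed channel split)."""
    shared, private = feat[..., :shared_dim], feat[..., shared_dim:]
    return to_sphere(shared), to_sphere(private)


def alignment_loss(rgb_shared: torch.Tensor, depth_shared: torch.Tensor) -> torch.Tensor:
    """Pull shared RGB/depth features together on the sphere (cosine distance)."""
    return (1.0 - (rgb_shared * depth_shared).sum(dim=-1)).mean()


def spherical_contrast_loss(patch: torch.Tensor,
                            gt: torch.Tensor,
                            imperfect: torch.Tensor,
                            tau: float = 0.07) -> torch.Tensor:
    """InfoNCE-style refinement: pull patch features toward the ground truth
    and push them away from imperfect samples, all on the unit sphere."""
    patch, gt, imperfect = to_sphere(patch), to_sphere(gt), to_sphere(imperfect)
    pos = (patch * gt).sum(dim=-1, keepdim=True) / tau          # (B, 1)
    neg = torch.einsum("bd,bkd->bk", patch, imperfect) / tau    # (B, K)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(patch.size(0), dtype=torch.long, device=patch.device)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    B, D, K = 8, 256, 4
    rgb, depth = torch.randn(B, D), torch.randn(B, D)
    rgb_sh, rgb_pr = decompose(rgb, shared_dim=128)
    dep_sh, dep_pr = decompose(depth, shared_dim=128)
    loss = alignment_loss(rgb_sh, dep_sh) + spherical_contrast_loss(
        torch.randn(B, D), torch.randn(B, D), torch.randn(B, K, D))
    print(float(loss))
```

On the unit sphere, cosine similarity reduces to a plain dot product, which is why both the alignment term and the contrastive logits above are simple inner products.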
Related papers
- The Devil is in the Details: Boosting Guided Depth Super-Resolution via
Rethinking Cross-Modal Alignment and Aggregation [41.12790340577986]
Guided depth super-resolution (GDSR) involves restoring missing depth details using the high-resolution RGB image of the same scene.
Previous approaches have struggled with the heterogeneity and complementarity of the multi-modal inputs, and neglected the issues of modal misalignment, geometrical misalignment, and feature selection.
arXiv Detail & Related papers (2024-01-16T05:37:08Z)
- Symmetric Uncertainty-Aware Feature Transmission for Depth
Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR.
Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z)
- Pyramidal Attention for Saliency Detection [30.554118525502115]
This paper exploits only RGB images, estimates depth from RGB, and leverages the intermediate depth features.
We employ a pyramidal attention structure to extract multi-level convolutional-transformer features and process initial-stage representations.
We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets.
arXiv Detail & Related papers (2022-04-14T06:57:46Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Discrete Cosine Transform Network for Guided Depth Map Super-Resolution [19.86463937632802]
The goal is to use high-resolution (HR) RGB images to provide extra information on edges and object contours, so that low-resolution depth maps can be upsampled to HR ones.
We propose an advanced Discrete Cosine Transform Network (DCTNet), which is composed of four components.
We show that our method can generate accurate and HR depth maps, surpassing state-of-the-art methods.
arXiv Detail & Related papers (2021-04-14T17:01:03Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical
Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)
- Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-D saliency detection shows impressive ability on some challenging scenarios.
We propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way.
arXiv Detail & Related papers (2020-07-23T04:33:36Z)
- Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Fast Generation of High Fidelity RGB-D Images by Deep-Learning with
Adaptive Convolution [10.085742605397124]
We propose a deep-learning based approach to efficiently generate RGB-D images with completed information in high resolution.
Our end-to-end approach generates high-fidelity RGB-D images efficiently, at a rate of around 21 frames per second.
arXiv Detail & Related papers (2020-02-12T16:14:38Z)