Spherical Space Feature Decomposition for Guided Depth Map
Super-Resolution
- URL: http://arxiv.org/abs/2303.08942v2
- Date: Wed, 23 Aug 2023 04:12:26 GMT
- Title: Spherical Space Feature Decomposition for Guided Depth Map
Super-Resolution
- Authors: Zixiang Zhao, Jiangshe Zhang, Xiang Gu, Chengli Tan, Shuang Xu, Yulun
Zhang, Radu Timofte, Luc Van Gool
- Abstract summary: Guided depth map super-resolution (GDSR) aims to upsample low-resolution (LR) depth maps with additional information involved in high-resolution (HR) RGB images from the same scene.
In this paper, we propose the Spherical Space feature Decomposition Network (SSDNet) to solve the above issues.
Our method can achieve state-of-the-art results on four test datasets, as well as successfully generalize to real-world scenes.
- Score: 123.04455334124188
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Guided depth map super-resolution (GDSR), as a hot topic in multi-modal image
processing, aims to upsample low-resolution (LR) depth maps with additional
information involved in high-resolution (HR) RGB images from the same scene.
The critical step of this task is to effectively extract domain-shared and
domain-private RGB/depth features. In addition, three detailed issues, namely
blurry edges, noisy surfaces, and over-transferred RGB texture, need to be
addressed. In this paper, we propose the Spherical Space feature Decomposition
Network (SSDNet) to solve the above issues. To better model cross-modality
features, Restormer block-based RGB/depth encoders are employed for extracting
local-global features. Then, the extracted features are mapped to the spherical
space to complete the separation of private features and the alignment of
shared features. Shared features of RGB are fused with the depth features to
complete the GDSR task. Subsequently, a spherical contrast refinement (SCR)
module is proposed to further address the detail issues. Patches that are
classified according to imperfect categories are input into the SCR module,
where the patch features are pulled closer to the ground truth and pushed away
from the corresponding imperfect samples in the spherical feature space via
contrastive learning. Extensive experiments demonstrate that our method can
achieve state-of-the-art results on four test datasets, as well as successfully
generalize to real-world scenes. The code is available at
https://github.com/Zhaozixiang1228/GDSR-SSDNet.
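The two mechanisms outlined in the abstract, mapping features onto a spherical space and refining patches with contrastive learning on that sphere, can be illustrated with a short, hedged example. The snippet below is only a plausible sketch, not SSDNet's released code: it assumes the spherical mapping is an L2 normalization onto the unit hypersphere, assumes a channel-wise shared/private split, and uses an InfoNCE-style loss for the pull-toward-ground-truth / push-away-from-imperfect-samples behavior. The names `to_sphere` and `spherical_contrastive_loss` and all hyperparameters are hypothetical; see the repository linked above for the actual implementation.

```python
# Illustrative sketch only; names and design choices are assumptions, not SSDNet's code.
import torch
import torch.nn.functional as F

def to_sphere(feat: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Map per-pixel feature vectors onto the unit hypersphere (L2 normalization)."""
    return feat / (feat.norm(dim=1, keepdim=True) + eps)

def spherical_contrastive_loss(anchor, positive, negatives, tau: float = 0.07):
    """InfoNCE-style loss on the sphere: pull patch features toward the ground-truth
    feature (positive) and push them away from imperfect samples (negatives)."""
    anchor = F.normalize(anchor, dim=-1)          # (B, C)
    positive = F.normalize(positive, dim=-1)      # (B, C)
    negatives = F.normalize(negatives, dim=-1)    # (B, K, C)
    pos_sim = (anchor * positive).sum(-1, keepdim=True) / tau        # (B, 1)
    neg_sim = torch.einsum('bc,bkc->bk', anchor, negatives) / tau    # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1)                    # (B, 1+K)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)

# Toy shapes: RGB/depth encoder features (B, C, H, W) are split channel-wise into
# shared and private parts after being mapped to the sphere (split ratio assumed).
rgb_feat = torch.randn(2, 64, 32, 32)
dep_feat = torch.randn(2, 64, 32, 32)
rgb_shared, rgb_private = to_sphere(rgb_feat).chunk(2, dim=1)
dep_shared, dep_private = to_sphere(dep_feat).chunk(2, dim=1)

# Alignment of shared features: maximize cosine similarity across modalities.
align_loss = 1.0 - F.cosine_similarity(rgb_shared, dep_shared, dim=1).mean()
# Separation of private features: keep them (near-)orthogonal on the sphere.
sep_loss = F.cosine_similarity(rgb_private, dep_private, dim=1).abs().mean()
```

The cosine-alignment and orthogonality terms above are one simple way to realize "alignment of shared features" and "separation of private features" on a sphere; the paper's actual objectives may differ.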
Related papers
- IGAF: Incremental Guided Attention Fusion for Depth Super-Resolution [13.04760414998408]
We propose a novel sensor fusion methodology for guided depth super-resolution (GDSR).
GDSR combines LR depth maps with HR images to estimate detailed HR depth maps.
Our model achieves state-of-the-art results compared to all baseline models on the NYU v2 dataset.
arXiv Detail & Related papers (2025-01-03T09:27:51Z)
- Guided Real Image Dehazing using YCbCr Color Space [25.771316524011382]
We propose a novel Structure Guided Dehazing Network (SGDN) that leverages the superior structural properties of YCbCr features over RGB.
For effective supervised learning, we introduce a Real-World Well-Aligned Haze dataset.
Experimental results demonstrate that our method surpasses existing state-of-the-art methods across multiple real-world smoke/haze datasets.
arXiv Detail & Related papers (2024-12-23T11:53:06Z)
- Symmetric Uncertainty-Aware Feature Transmission for Depth Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR.
Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z)
- Pyramidal Attention for Saliency Detection [30.554118525502115]
This paper exploits only RGB images, estimates depth from RGB, and leverages the intermediate depth features.
We employ a pyramidal attention structure to extract multi-level convolutional-transformer features to process initial stage representations.
We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets, respectively.
arXiv Detail & Related papers (2022-04-14T06:57:46Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Discrete Cosine Transform Network for Guided Depth Map Super-Resolution [19.86463937632802]
The goal is to use high-resolution (HR) RGB images to provide extra information on edges and object contours, so that low-resolution depth maps can be upsampled to HR ones (a minimal input-setup sketch appears after this list).
We propose an advanced Discrete Cosine Transform Network (DCTNet), which is composed of four components.
We show that our method can generate accurate and HR depth maps, surpassing state-of-the-art methods.
arXiv Detail & Related papers (2021-04-14T17:01:03Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)
- Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-D saliency detection shows impressive ability in some challenging scenarios.
We propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way.
arXiv Detail & Related papers (2020-07-23T04:33:36Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
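Several entries above, including the DCTNet one that references this sketch, describe the same guided depth super-resolution setup: a low-resolution depth map is upsampled to the resolution of a high-resolution RGB image that guides the reconstruction. The snippet below is an assumed, minimal illustration of one common input preparation (bicubic upsampling plus channel concatenation) and is not taken from any of the listed methods.

```python
# Minimal GDSR input-setup sketch (assumed convention, not from any listed paper):
# upsample the LR depth map to the RGB resolution, then concatenate it with the
# HR RGB guidance as the input to a super-resolution network.
import torch
import torch.nn.functional as F

hr_rgb = torch.rand(1, 3, 256, 256)    # HR RGB guidance image
lr_depth = torch.rand(1, 1, 64, 64)    # LR depth map (4x smaller in this toy example)

up_depth = F.interpolate(lr_depth, size=hr_rgb.shape[-2:],
                         mode='bicubic', align_corners=False)
net_input = torch.cat([hr_rgb, up_depth], dim=1)   # shape (1, 4, 256, 256)
```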