Guided Depth Map Super-Resolution via Multi-Scale Fusion U-shaped Mamba Network
- URL: http://arxiv.org/abs/2508.00248v1
- Date: Fri, 01 Aug 2025 01:24:34 GMT
- Title: Guided Depth Map Super-Resolution via Multi-Scale Fusion U-shaped Mamba Network
- Authors: Chenggang Guo, Hao Xu, XianMing Wan
- Abstract summary: Traditional convolutional neural networks have limitations in dealing with long-range dependencies. We propose a multi-scale fusion U-shaped Mamba model, a novel guided depth map super-resolution framework. The proposed MSF-UM significantly reduces the number of model parameters while achieving better reconstruction accuracy.
- Score: 4.545298205355719
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth map super-resolution technology aims to improve the spatial resolution of low-resolution depth maps and effectively restore high-frequency detail information. Traditional convolutional neural networks have limitations in dealing with long-range dependencies and are unable to fully model the global contextual information in depth maps. Although transformers can model global dependencies, their computational complexity and memory consumption grow quadratically, which significantly limits their ability to process high-resolution depth maps. In this paper, we propose a multi-scale fusion U-shaped Mamba (MSF-UM) model, a novel guided depth map super-resolution framework. The core innovation of this model is to integrate Mamba's efficient state-space modeling capabilities into a multi-scale U-shaped fusion structure guided by a color image. A structure combining the residual dense channel attention block and the Mamba state-space module is designed, which couples the local feature extraction capability of convolutional layers with the state-space model's advantage in modeling long-distance dependencies. At the same time, the model adopts a multi-scale cross-modal fusion strategy to make full use of the high-frequency texture information from the color image to guide the super-resolution process of the depth map. Compared with existing mainstream methods, the proposed MSF-UM significantly reduces the number of model parameters while achieving better reconstruction accuracy. Extensive experiments on multiple publicly available datasets validate the effectiveness of the model, which shows especially strong generalization ability in the task of large-scale depth map super-resolution.
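The abstract's central claim is that a state-space recurrence scans a sequence in linear time, whereas self-attention costs quadratic time and memory in sequence length. The following is a minimal, hypothetical numpy sketch of that recurrence (h_t = A h_{t-1} + B u_t, y_t = C·h_t), not the actual MSF-UM architecture; the matrices and the 1-D input are toy placeholders standing in for a row of depth-map features.

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """Minimal linear state-space recurrence (illustrative only):
    h_t = A @ h_{t-1} + B * u_t;  y_t = C . h_t.
    Runs in O(L) for sequence length L, in contrast to the O(L^2)
    cost of self-attention over the same sequence."""
    d = A.shape[0]
    h = np.zeros(d)                 # hidden state starts at zero
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        h = A @ h + B * u_t         # state update
        y[t] = C @ h                # readout
    return y

rng = np.random.default_rng(0)
d = 4
A = 0.9 * np.eye(d)                 # stable toy state transition
B = rng.standard_normal(d)
C = rng.standard_normal(d)
u = rng.standard_normal(16)         # a toy 1-D feature sequence
y = ssm_scan(u, A, B, C)
print(y.shape)                      # (16,)
```

In Mamba proper, A, B, and C are input-dependent (the "selective" part) and the scan is parallelized; this sketch only shows why the per-step cost stays constant as the sequence grows.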
Related papers
- Towards Lightweight Hyperspectral Image Super-Resolution with Depthwise Separable Dilated Convolutional Network [6.5149222591754725]
We introduce a lightweight depthwise separable dilated convolutional network (DSDCN) to address the challenges of hyperspectral image super-resolution. We propose a custom loss function that combines mean squared error (MSE), an L2 norm regularization-based constraint, and a spectral angle-based loss. The proposed model achieves very competitive performance on two publicly available hyperspectral datasets.
arXiv Detail & Related papers (2025-05-01T07:57:23Z) - Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models [50.98559225639266]
Sub-images with higher semantic relevance to the entire image encapsulate richer visual information for preserving the model's visual understanding ability. The Global Semantic-guided Weight Allocator (GSWA) module allocates weights to sub-images based on their relative information density. SleighVL, a lightweight yet high-performing model, outperforms models with comparable parameters and remains competitive with larger models.
arXiv Detail & Related papers (2025-01-24T06:42:06Z) - Multi-dimensional Visual Prompt Enhanced Image Restoration via Mamba-Transformer Aggregation [4.227991281224256]
This paper proposes to fully utilize the complementary advantages of Mamba and Transformer without sacrificing computational efficiency. The selective scanning mechanism of Mamba is employed for spatial modeling, enabling the capture of long-range spatial dependencies. The self-attention mechanism of the Transformer is applied to channel modeling, avoiding the high computational burden that grows quadratically with the image's spatial dimensions.
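The trick described above is that attention over channels produces a C x C attention map, so its cost is independent of spatial size H*W (unlike spatial self-attention, whose map is (HW) x (HW)). A hypothetical numpy sketch, with learned projections omitted for brevity:

```python
import numpy as np

def channel_attention(x):
    """Self-attention applied across channels rather than spatial
    positions (illustrative sketch). x has shape (C, H*W); the
    attention map is (C, C), so cost does not grow quadratically
    with the number of pixels H*W."""
    C, HW = x.shape
    # Learned query/key/value projections are omitted in this toy version.
    logits = x @ x.T / np.sqrt(HW)          # (C, C) channel-affinity logits
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)       # row-wise softmax
    return attn @ x                          # (C, H*W) reweighted channels

x = np.random.default_rng(1).standard_normal((8, 64))  # 8 channels, 8x8 map
out = channel_attention(x)
print(out.shape)                             # (8, 64)
```

Doubling the spatial resolution quadruples H*W but leaves the (C, C) attention map unchanged, which is the efficiency argument the abstract makes.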
arXiv Detail & Related papers (2024-12-20T12:36:34Z) - Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged towards high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - DSR-Diff: Depth Map Super-Resolution with Diffusion Model [38.68563026759223]
We present a novel CDSR paradigm that utilizes a diffusion model within the latent space to generate guidance for depth map super-resolution.
Our proposed method has shown superior performance in extensive experiments when compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-11-16T14:18:10Z) - Multi-resolution Monocular Depth Map Fusion by Self-supervised
Gradient-based Composition [14.246972408737987]
We propose a novel depth map fusion module to combine the advantages of estimations with multi-resolution inputs.
Our lightweight depth fusion is one-shot and runs in real-time, making our method 80X faster than a state-of-the-art depth fusion method.
arXiv Detail & Related papers (2022-12-03T05:13:50Z) - DepthFormer: Exploiting Long-Range Correlation and Local Information for
Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z) - High-resolution Depth Maps Imaging via Attention-based Hierarchical
Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z) - Accurate and Lightweight Image Super-Resolution with Model-Guided Deep
Unfolding Network [63.69237156340457]
We present and advocate an explainable approach toward SISR named model-guided deep unfolding network (MoG-DUN).
MoG-DUN is accurate (producing fewer aliasing artifacts), computationally efficient (with reduced model parameters), and versatile (capable of handling multiple degradations).
The superiority of the proposed MoG-DUN method over existing state-of-the-art image super-resolution methods, including RCAN, SRDNF, and SRFBN, is substantiated by extensive experiments on several popular datasets and various degradation scenarios.
arXiv Detail & Related papers (2020-09-14T08:23:37Z) - Multi-Scale Boosted Dehazing Network with Dense Feature Fusion [92.92572594942071]
We propose a Multi-Scale Boosted Dehazing Network with Dense Feature Fusion based on the U-Net architecture.
We show that the proposed model performs favorably against the state-of-the-art approaches on the benchmark datasets as well as real-world hazy images.
arXiv Detail & Related papers (2020-04-28T09:34:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.