Related papers: Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning

URL: http://arxiv.org/abs/2406.16083v1
Date: Sun, 23 Jun 2024 11:28:08 GMT
Title: Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning
Authors: Ruisheng Gao, Zeyu Xiao, Zhiwei Xiong
Abstract summary: Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution. However, their quadratic complexity hinders the efficient processing of high resolution 4D inputs. We propose a Mamba-based Light Field Super-Resolution method, named MLFSR, by designing an efficient subspace scanning strategy.
Score: 48.99361249764921
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution by effectively modeling long-range spatial-angular correlations, but their quadratic complexity hinders the efficient processing of high resolution 4D inputs, resulting in slow inference speed and high memory cost. As a compromise, most prior work adopts a patch-based strategy, which fails to leverage the full information from the entire input LFs. The recently proposed selective state-space model, Mamba, has gained popularity for its efficient long-range sequence modeling. In this paper, we propose a Mamba-based Light Field Super-Resolution method, named MLFSR, by designing an efficient subspace scanning strategy. Specifically, we tokenize 4D LFs into subspace sequences and conduct bi-directional scanning on each subspace. Based on our scanning strategy, we then design the Mamba-based Global Interaction (MGI) module to capture global information and the local Spatial-Angular Modulator (SAM) to complement local details. Additionally, we introduce a Transformer-to-Mamba (T2M) loss to further enhance overall performance. Extensive experiments on public benchmarks demonstrate that MLFSR surpasses CNN-based models and rivals Transformer-based methods in performance while maintaining higher efficiency. With quicker inference speed and reduced memory demand, MLFSR facilitates full-image processing of high-resolution 4D LFs with enhanced performance.
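
The tokenization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which subspaces MLFSR uses, so the sketch assumes two: a spatial subspace (all pixels of one view) and an angular subspace (all views of one pixel). The tensor layout `(U, V, H, W, C)` and the helper names `subspace_sequences` and `bidirectional` are hypothetical, and the state-space model that would consume the sequences is omitted.

```python
import numpy as np

def subspace_sequences(lf):
    """Split a 4D light field into two sets of token sequences.

    lf has the assumed layout (U, V, H, W, C): angular dims U, V,
    spatial dims H, W, and C feature channels.
    """
    U, V, H, W, C = lf.shape
    # Spatial subspace (assumption): one sequence of H*W tokens per view.
    spatial = lf.reshape(U * V, H * W, C)
    # Angular subspace (assumption): one sequence of U*V tokens per pixel.
    angular = lf.transpose(2, 3, 0, 1, 4).reshape(H * W, U * V, C)
    return spatial, angular

def bidirectional(seq):
    """Return the forward and the reversed token order of each sequence."""
    return seq, seq[:, ::-1, :]

# Toy light field: 2x2 views, 3x4 pixels, 1 channel.
lf = np.arange(2 * 2 * 3 * 4, dtype=np.float32).reshape(2, 2, 3, 4, 1)
spatial, angular = subspace_sequences(lf)
fwd, bwd = bidirectional(spatial)
print(spatial.shape, angular.shape)
```

Under these assumptions each sequence has length H*W or U*V rather than U*V*H*W, so a sequence model sees far shorter inputs than it would if the whole 4D light field were flattened into one sequence.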

Related papers

High-resolution Photo Enhancement in Real-time: A Laplacian Pyramid Network [73.19214585791268]
This paper introduces a pyramid network called LLF-LUT++, which integrates global and local operators through closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we utilize an image-adaptive 3D LUT that capitalizes on the global tonal characteristics of downsampled images. LLF-LUT++ not only achieves a 2.64 dB improvement in PSNR on the HDR+ dataset, but also further reduces runtime, with 4K resolution images processed in just 13 ms on a single GPU.
arXiv Detail & Related papers (2025-10-13T16:52:32Z)
Exploring Non-Local Spatial-Angular Correlations with a Hybrid Mamba-Transformer Framework for Light Field Super-Resolution [68.54692184478462]
Mamba-based methods have shown great potential in optimizing both computational cost and performance of light field image super-resolution. We propose a Subspace Simple Scanning (Sub-SS) strategy, based on which we design the Subspace Simple Mamba Block (SSMB) to achieve more efficient and precise feature extraction. We also propose a dual-stage modeling strategy to address the limitation of state space in preserving spatial-angular and disparity information.
arXiv Detail & Related papers (2025-09-05T05:50:38Z)
MambaFusion: Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection [45.792346999032496]
We present the first work demonstrating that a pure Mamba block can achieve efficient Dense Global Fusion. Our motivation stems from the observation that existing fusion strategies are constrained by their inability to simultaneously achieve efficiency. We propose height-fidelity LiDAR encoding that preserves precise height information through voxel compression in continuous space.
arXiv Detail & Related papers (2025-07-06T12:29:45Z)
MambaOutRS: A Hybrid CNN-Fourier Architecture for Remote Sensing Image Classification [4.14360329494344]
We introduce MambaOutRS, a novel hybrid convolutional architecture for remote sensing image classification. MambaOutRS builds upon stacked Gated CNN blocks for local feature extraction and introduces a novel Fourier Filter Gate (FFG) module.
arXiv Detail & Related papers (2025-06-24T12:20:11Z)
$L^2$FMamba: Lightweight Light Field Image Super-Resolution with State Space Model [3.741194134589865]
Transformers bring significantly improved performance to the light field image super-resolution task due to their long-range dependency modeling capability. We introduce the LF-VSSM block, a novel module inspired by progressive feature extraction, to efficiently capture critical long-range spatial-angular dependencies in light field images. We propose a lightweight network, $L^2$FMamba, which integrates the LF-VSSM block to leverage light field features for super-resolution tasks while overcoming the computational challenges of Transformer-based approaches.
arXiv Detail & Related papers (2025-03-25T01:24:52Z)
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs. We propose the MobileMamba framework, which balances efficiency and performance. MobileMamba achieves up to 83.6% Top-1 accuracy, surpassing existing state-of-the-art methods.
arXiv Detail & Related papers (2024-11-24T18:01:05Z)
LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba [54.85262314960038]
Local Attentional Mamba blocks capture both global contexts and local details with linear complexity. Our model exhibits exceptional scalability and surpasses the performance of DiT across various model scales on ImageNet at 256x256 resolution. Compared to state-of-the-art diffusion models on ImageNet 256x256 and 512x512, our largest model presents notable advantages, such as a reduction of up to 62% GFLOPs.
arXiv Detail & Related papers (2024-08-05T16:39:39Z)
GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model [66.35608254724566]
State-space models (SSMs) have showcased effective performance in modeling long-range dependencies with subquadratic complexity. However, pure SSM-based models still face challenges related to stability and achieving optimal performance on computer vision tasks. Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes.
arXiv Detail & Related papers (2024-07-18T17:59:58Z)
LFMamba: Light Field Image Super-Resolution with State Space Model [28.426889157353028]
We introduce an SSM-based network for light field image super-resolution termed LFMamba. Experimental results on LF benchmarks demonstrate the superior performance of LFMamba. We expect that our LFMamba will shed light on effective representation learning of LFs with state space models.
arXiv Detail & Related papers (2024-06-18T10:13:19Z)
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis [56.849285913695184]
Diffusion Mamba (DiM) is a sequence model for efficient high-resolution image synthesis. DiM architecture achieves inference-time efficiency for high-resolution images. Experiments demonstrate the effectiveness and efficiency of our DiM.
arXiv Detail & Related papers (2024-05-23T06:53:18Z)
Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We make the first attempt to integrate the Vision State Space Model (Mamba) for remote sensing image (RSI) super-resolution. To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR. Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM).
arXiv Detail & Related papers (2024-05-08T11:09:24Z)
MambaUIE&SR: Unraveling the Ocean's Secrets with Only 2.8 GFLOPs [1.7648680700685022]
Underwater Image Enhancement (UIE) techniques aim to address the problem of underwater image degradation due to light absorption and scattering. In recent years, both Convolutional Neural Network (CNN)-based and Transformer-based methods have been widely explored. MambaUIE is able to efficiently synthesize global and local information and maintains a very small number of parameters with high accuracy.
arXiv Detail & Related papers (2024-04-22T05:12:11Z)
Memory-Efficient Optical Flow via Radius-Distribution Orthogonal Cost Volume [6.122542233250026]
We present MeFlow, a novel memory-efficient method for high-resolution optical flow estimation. Our method achieves competitive performance on both Sintel and KITTI benchmarks, while maintaining the highest memory efficiency on high-resolution inputs.
arXiv Detail & Related papers (2023-12-06T12:43:11Z)
Light Field Image Super-Resolution with Transformers [11.104338786168324]
CNN-based methods have achieved remarkable performance in LF image SR. We propose a simple but effective Transformer-based method for LF image SR. Our method achieves superior SR performance with a small model size and low computational cost.
arXiv Detail & Related papers (2021-08-17T12:58:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.