MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration
- URL: http://arxiv.org/abs/2412.20066v1
- Date: Sat, 28 Dec 2024 07:40:39 GMT
- Title: MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration
- Authors: Boyun Li, Haiyu Zhao, Wenxin Wang, Peng Hu, Yuanbiao Gou, Xi Peng
- Abstract summary: We propose a novel Mamba-based Image Restoration model (MaIR).
MaIR consists of a Nested S-shaped Scanning strategy (NSS) and a Sequence Shuffle Attention block (SSA).
Thanks to NSS and SSA, MaIR surpasses 40 baselines across 14 challenging datasets.
- Score: 24.66368406718623
- Abstract: Recent advancements in Mamba have shown promising results in image restoration. These methods typically flatten 2D images into multiple distinct 1D sequences along rows and columns, process each sequence independently using the selective scan operation, and recombine them to form the outputs. However, such a paradigm overlooks two vital aspects: i) the local relationships and spatial continuity inherent in natural images, and ii) the discrepancies among sequences unfolded in entirely different ways. To overcome these drawbacks, we explore two problems in Mamba-based restoration methods: i) how to design a scanning strategy that preserves both locality and continuity while facilitating restoration, and ii) how to aggregate distinct sequences unfolded in entirely different ways. To address these problems, we propose a novel Mamba-based Image Restoration model (MaIR), which consists of a Nested S-shaped Scanning strategy (NSS) and a Sequence Shuffle Attention block (SSA). Specifically, NSS preserves the locality and continuity of the input images through stripe-based scanning regions and S-shaped scanning paths, respectively. SSA aggregates sequences by calculating attention weights within the corresponding channels of different sequences. Thanks to NSS and SSA, MaIR surpasses 40 baselines across 14 challenging datasets, achieving state-of-the-art performance on image super-resolution, denoising, deblurring, and dehazing. Our code will be made available after acceptance.
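Since the code is not yet released, the following is a minimal NumPy sketch of the two ideas as the abstract describes them. The stripe height, the exact serpentine path, and the softmax form of the attention are illustrative assumptions, not the paper's exact NSS/SSA.

    import numpy as np

    def s_shaped_scan_indices(h, w, stripe_h=4):
        # Flatten an (h, w) grid stripe by stripe (locality) along a
        # serpentine path (continuity): consecutive sequence elements
        # are almost always 2D neighbours.
        order = []
        for s, top in enumerate(range(0, h, stripe_h)):
            rows = list(range(top, min(top + stripe_h, h)))
            # Alternate column direction between stripes to avoid jumps
            # when the path moves from one stripe into the next.
            cols = range(w) if s % 2 == 0 else range(w - 1, -1, -1)
            for j, c in enumerate(cols):
                rr = rows if j % 2 == 0 else rows[::-1]
                order += [(r, c) for r in rr]
        return order

    def sequence_shuffle_attention(seqs):
        # seqs: (K, C, L) outputs of K differently unfolded scans.
        # Weight each scan per channel with a softmax over pooled
        # responses -- an assumption about SSA's exact form.
        pooled = seqs.mean(-1)                    # (K, C)
        e = np.exp(pooled - pooled.max(0))
        w = e / e.sum(0)                          # softmax over the K scans
        return (w[..., None] * seqs).sum(0)       # fused (C, L) sequence

    img = np.arange(64).reshape(8, 8)
    seq = np.array([img[r, c] for r, c in s_shaped_scan_indices(8, 8)])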
Related papers
- UD-Mamba: A pixel-level uncertainty-driven Mamba model for medical image segmentation [28.423422060841137]
Uncertainty-Driven Mamba (UD-Mamba) redefines the pixel-order scanning process by incorporating channel uncertainty into the scanning mechanism.
UD-Mamba introduces two key scanning techniques: 1) sequential scanning, which prioritizes regions with high uncertainty by scanning in a row-by-row fashion, and 2) skip scanning, which processes columns vertically, moving from high-to-low or low-to-high uncertainty at fixed intervals.
Our method demonstrates robust segmentation performance, validated across three distinct medical imaging datasets involving pathology, dermatological lesions, and cardiac tasks.
arXiv Detail & Related papers (2025-02-04T05:20:33Z)
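A toy NumPy rendering of the two UD-Mamba scan orders described above; the strided-interval logic and the row/column pooling of uncertainty are guesses at the mechanism, not the paper's implementation.

    import numpy as np

    def sequential_scan(feat, unc):
        # Row-by-row scan that visits high-uncertainty rows first.
        # feat, unc: (H, W) feature and per-pixel uncertainty maps.
        row_order = np.argsort(-unc.mean(axis=1))   # high -> low
        return feat[row_order].reshape(-1)

    def skip_scan(feat, unc, stride=2):
        # Visit columns vertically at a fixed interval, sweeping the
        # uncertainty ranking in strided passes.
        col_order = np.argsort(-unc.mean(axis=0))   # high -> low
        strided = np.concatenate([col_order[o::stride] for o in range(stride)])
        return feat[:, strided].T.reshape(-1)

    feat, unc = np.random.rand(16, 16), np.random.rand(16, 16)
    seq_a, seq_b = sequential_scan(feat, unc), skip_scan(feat, unc)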
- Parallel Sequence Modeling via Generalized Spatial Propagation Network [80.66202109995726]
Generalized Spatial Propagation Network (GSPN) is a new attention mechanism, optimized for vision tasks, that inherently captures 2D spatial structures.
GSPN overcomes the limitations of flattened 1D scanning by directly operating on spatially coherent image data and forming dense pairwise connections through a line-scan approach.
GSPN achieves superior spatial fidelity and state-of-the-art performance in vision tasks, including ImageNet classification, class-guided image generation, and text-to-image generation.
arXiv Detail & Related papers (2025-01-21T18:56:19Z)
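A simplified sketch of how a single line scan can spread information across an image: each pixel's state mixes the three nearest states from the previous column, so a full sweep connects every pixel pair. The three-neighbour form and the stability constraint are assumptions in the spirit of spatial propagation networks, not GSPN's exact formulation.

    import numpy as np

    def line_scan_propagate(x, w):
        # x: (H, W) input; w: (H, 3) non-negative mixing weights whose
        # rows sum to at most 1 (keeps the recurrence stable).
        h = np.zeros_like(x, dtype=float)
        h[:, 0] = x[:, 0]
        for j in range(1, x.shape[1]):
            up = np.r_[h[0, j-1], h[:-1, j-1]]     # neighbour above (edge pad)
            mid = h[:, j-1]
            down = np.r_[h[1:, j-1], h[-1, j-1]]   # neighbour below (edge pad)
            lam = w.sum(axis=1)                    # total recurrence weight
            h[:, j] = (1 - lam) * x[:, j] + w[:, 0]*up + w[:, 1]*mid + w[:, 2]*down
        return h

    x = np.random.rand(8, 8)
    w = np.full((8, 3), 0.2)                       # rows sum to 0.6 < 1
    out = line_scan_propagate(x, w)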
- Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging [40.80197280147993]
We propose a Mamba-inspired Joint Unfolding Network (MiJUN) to overcome the inherent nonlinear and ill-posed characteristics of hyperspectral image (HSI) reconstruction.
We introduce an accelerated unfolding network scheme, which reduces the reliance on initial optimization stages.
We refine the scanning strategy with Mamba by integrating the tensor mode-$k$ unfolding into the Mamba network.
arXiv Detail & Related papers (2025-01-02T13:56:23Z)
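Mode-k unfolding is a standard tensor operation; a minimal NumPy version shows how scanning different modes of a hyperspectral cube yields different 1D sequence orders for the SSM (how MiJUN wires these into Mamba is not detailed in the summary).

    import numpy as np

    def mode_k_unfold(t, k):
        # Move axis k to the front and flatten the remaining axes,
        # giving an (I_k, prod of the other dims) matrix.
        return np.moveaxis(t, k, 0).reshape(t.shape[k], -1)

    cube = np.random.rand(4, 32, 32)           # toy (bands, H, W) HSI cube
    seq_spectral = mode_k_unfold(cube, 0)      # scan along the spectral mode
    seq_rows = mode_k_unfold(cube, 1)          # scan along image rows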
- Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion [46.82975707531064]
Selective state space models (SSMs) excel at capturing long-range dependencies in 1D sequential data.
We propose Spatial-Mamba, a novel approach that establishes neighborhood connectivity directly in the state space.
We show that Spatial-Mamba, even with a single scan, attains or surpasses the state-of-the-art SSM-based models in image classification, detection and segmentation.
arXiv Detail & Related papers (2024-10-19T12:56:58Z)
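One way to read "neighborhood connectivity directly in the state space" is a sketch like the following: fold the scanned states back onto the 2D grid and mix each state with its spatial neighbours. The fixed 3x3 averaging kernel is an illustrative stand-in for whatever learned fusion the paper uses.

    import numpy as np
    from scipy.ndimage import convolve

    def structure_aware_fusion(states):
        # states: (C, H, W) hidden states folded back onto the image grid.
        # Mixing each state with its 3x3 neighbourhood gives every position
        # 2D context without running additional scan directions.
        kernel = np.ones((3, 3)) / 9.0
        return np.stack([convolve(c, kernel, mode="nearest") for c in states])

    fused = structure_aware_fusion(np.random.rand(8, 16, 16))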
- V2M: Visual 2-Dimensional Mamba for Image Representation Learning [68.51380287151927]
Mamba has garnered widespread attention due to its flexible design and efficient hardware performance in processing 1D sequences.
Recent studies have attempted to apply Mamba to the visual domain by flattening 2D images into patches and then regarding them as a 1D sequence.
We propose a Visual 2-Dimensional Mamba model as a complete solution, which directly processes image tokens in the 2D space.
arXiv Detail & Related papers (2024-10-14T11:11:06Z)
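A toy scalar 2D recurrence illustrates the difference from flattened scanning: each position's state depends on the states above and to the left, so the model is natively two-dimensional. The scalar dynamics are for clarity only and are not V2M's actual parameterization.

    import numpy as np

    def ssm_2d(x, a_row=0.5, a_col=0.5, b=1.0):
        # x: (H, W). The state at (i, j) mixes the states of its top and
        # left neighbours instead of a single 1D predecessor.
        h = np.zeros_like(x, dtype=float)
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                top = h[i-1, j] if i > 0 else 0.0
                left = h[i, j-1] if j > 0 else 0.0
                h[i, j] = a_row * top + a_col * left + b * x[i, j]
        return h

    out = ssm_2d(np.random.rand(8, 8))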
- GlobalMamba: Global Image Serialization for Vision Mamba [73.50475621164037]
Vision Mambas have demonstrated strong performance with linear complexity in the number of vision tokens.
Most existing methods employ patch-based image tokenization and then flatten them into 1D sequences for causal processing.
We propose a global image serialization method to transform the image into a sequence of causal tokens.
arXiv Detail & Related papers (2024-10-14T09:19:05Z)
- MambaCSR: Dual-Interleaved Scanning for Compressed Image Super-Resolution With SSMs [14.42424591513825]
MambaCSR is a framework based on Mamba for the challenging compressed image super-resolution (CSR) task.
We propose an efficient dual-interleaved scanning paradigm (DIS) for CSR, which is composed of two scanning strategies.
Results on multiple benchmarks demonstrate the strong performance of our MambaCSR on the compressed image super-resolution task.
arXiv Detail & Related papers (2024-08-21T16:30:45Z)
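The summary does not spell out the two scanning strategies, so the following sketch simply alternates a global raster order with a window-local order across consecutive blocks; the choice of orders and the alternation granularity are assumptions.

    import numpy as np

    def raster_order(h, w):
        # Global scan: plain row-major traversal.
        return [(i, j) for i in range(h) for j in range(w)]

    def window_order(h, w, win=4):
        # Local scan: finish each win x win window before moving on,
        # which keeps compression-artifact context together.
        idx = []
        for bi in range(0, h, win):
            for bj in range(0, w, win):
                idx += [(i, j) for i in range(bi, min(bi + win, h))
                               for j in range(bj, min(bj + win, w))]
        return idx

    orders = [raster_order(8, 8), window_order(8, 8)]
    for block in range(4):
        scan = orders[block % 2]   # alternate the two scans block by block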
- Comprehensive Generative Replay for Task-Incremental Segmentation with Concurrent Appearance and Semantic Forgetting
Generalist segmentation models are increasingly favored for diverse tasks involving various objects from different image sources.
We propose a Comprehensive Generative Replay (CGR) framework that restores appearance and semantic knowledge by synthesizing image-mask pairs.
Experiments on incremental tasks (cardiac, fundus and prostate segmentation) show its clear advantage for alleviating concurrent appearance and semantic forgetting.
arXiv Detail & Related papers (2024-06-28T10:05:58Z)
- Matching in the Wild: Learning Anatomical Embeddings for Multi-Modality Images [28.221419419614183]
Radiotherapists require accurate registration of MR/CT images to effectively use information from both modalities.
Recent learning-based methods have shown promising results in the rigid/affine step.
We propose a new approach called Cross-SAM to enable cross-modality matching.
arXiv Detail & Related papers (2023-07-07T11:49:06Z)
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework whose first stage is a Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z)
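A small sketch of the variable-length idea: estimate each region's information density (local variance as a crude stand-in) and assign it a code budget accordingly. The three-tier split and the budget values are invented for illustration.

    import numpy as np

    def codes_per_region(img, grid=8, budgets=(1, 4, 16)):
        # Split the image into a grid x grid layout of regions, rank the
        # regions by local variance, and give detailed regions more codes.
        h, w = img.shape
        rh, rw = h // grid, w // grid
        var = np.array([[img[i*rh:(i+1)*rh, j*rw:(j+1)*rw].var()
                         for j in range(grid)] for i in range(grid)])
        cuts = np.quantile(var, [1/3, 2/3])   # three density tiers
        return np.array(budgets)[np.digitize(var, cuts)]

    budgets = codes_per_region(np.random.rand(64, 64))   # (8, 8) code counts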
This list is automatically generated from the titles and abstracts of the papers on this site.