Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion
- URL: http://arxiv.org/abs/2410.15091v1
- Date: Sat, 19 Oct 2024 12:56:58 GMT
- Title: Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion
- Authors: Chaodong Xiao, Minghan Li, Zhengqiang Zhang, Deyu Meng, Lei Zhang
- Abstract summary: Selective state space models (SSMs) excel at capturing long-range dependencies in 1D sequential data.
We propose Spatial-Mamba, a novel approach that establishes neighborhood connectivity directly in the state space.
We show that Spatial-Mamba, even with a single scan, matches or surpasses state-of-the-art SSM-based models in image classification, detection, and segmentation.
- Score: 46.82975707531064
- Abstract: Selective state space models (SSMs), such as Mamba, excel at capturing long-range dependencies in 1D sequential data, but their application to 2D vision tasks still faces challenges. Current visual SSMs often convert images into 1D sequences and employ various scanning patterns to incorporate local spatial dependencies. However, these methods struggle to capture complex image spatial structures effectively, and their lengthened scanning paths increase computational cost. To address these limitations, we propose Spatial-Mamba, a novel approach that establishes neighborhood connectivity directly in the state space. Instead of relying solely on sequential state transitions, we introduce a structure-aware state fusion equation, which leverages dilated convolutions to capture image spatial structural dependencies, significantly enhancing the flow of visual contextual information. Spatial-Mamba proceeds in three stages: initial state computation in a unidirectional scan, spatial context acquisition through structure-aware state fusion, and final state computation using the observation equation. Our theoretical analysis shows that Spatial-Mamba unifies the original Mamba and linear attention under the same matrix multiplication framework, providing a deeper understanding of our method. Experimental results demonstrate that Spatial-Mamba, even with a single scan, matches or surpasses state-of-the-art SSM-based models in image classification, detection, and segmentation. Source code and trained models can be found at https://github.com/EdwardChasel/Spatial-Mamba.
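As a rough illustration of the structure-aware state fusion step described above, below is a minimal PyTorch sketch that fuses each location's hidden state with its spatial neighbors via depthwise dilated convolutions, assuming the states from the unidirectional scan have been reshaped back onto the image grid. The module name, the dilation set, and the additive identity branch are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StructureAwareStateFusion(nn.Module):
    """Hedged sketch: fuse per-location SSM hidden states with their
    spatial neighborhoods via depthwise dilated convolutions."""

    def __init__(self, dim: int, dilations=(1, 2, 3)):
        super().__init__()
        # One depthwise 3x3 branch per dilation rate (dilation set assumed).
        self.branches = nn.ModuleList(
            nn.Conv2d(dim, dim, kernel_size=3, padding=d, dilation=d,
                      groups=dim, bias=False)
            for d in dilations
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (B, C, H, W) hidden states from the unidirectional scan,
        # reshaped from the 1D token sequence back onto the image grid.
        fused = h  # identity branch keeps the original sequential state
        for branch in self.branches:
            fused = fused + branch(h)  # aggregate dilated neighborhoods
        return fused

# Usage: batch of 2 feature maps, 96 state channels, a 14x14 token grid.
states = torch.randn(2, 96, 14, 14)
fused = StructureAwareStateFusion(96)(states)
print(fused.shape)  # torch.Size([2, 96, 14, 14])
```

Per the abstract's third stage, the observation equation would then read outputs from these fused states rather than the raw sequential ones, which is how neighborhood context enters without lengthening the scan.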
Related papers
- DAMamba: Vision State Space Model with Dynamic Adaptive Scan [51.81060691414399]
State space models (SSMs) have recently garnered significant attention in computer vision.
We propose Dynamic Adaptive Scan (DAS), a data-driven method that adaptively allocates scanning orders and regions.
Based on DAS, we propose the vision backbone DAMamba, which significantly outperforms current state-of-the-art vision Mamba models in vision tasks.
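One way to picture a data-driven scan order like DAS is a learned per-token score followed by an argsort-based reordering; the sketch below is a loose illustration under that assumption, not the paper's actual mechanism.

```python
import torch
import torch.nn as nn

class AdaptiveScanOrder(nn.Module):
    """Loose sketch: score each token, then visit tokens in score order."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # assumed lightweight scorer

    def forward(self, tokens: torch.Tensor):
        # tokens: (B, N, C) flattened image patches.
        order = self.score(tokens).squeeze(-1).argsort(dim=-1)  # (B, N)
        reordered = torch.gather(
            tokens, 1, order.unsqueeze(-1).expand_as(tokens)
        )
        return reordered, order  # keep `order` to undo the permutation later

tokens = torch.randn(2, 16, 32)
reordered, order = AdaptiveScanOrder(32)(tokens)
print(reordered.shape, order.shape)  # (2, 16, 32) and (2, 16)
```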
arXiv Detail & Related papers (2025-02-18T08:12:47Z)
- Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging [40.80197280147993]
We propose a Mamba-inspired Joint Unfolding Network (MiJUN) to overcome the inherent nonlinear and ill-posed characteristics of HSI reconstruction.
We introduce an accelerated unfolding network scheme, which reduces the reliance on initial optimization stages.
We refine the scanning strategy with Mamba by integrating the tensor mode-$k$ unfolding into the Mamba network.
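Tensor mode-$k$ unfolding itself is standard multilinear algebra: axis $k$ becomes the rows and all remaining axes are flattened into the columns. A minimal sketch follows; how MiJUN wires the unfolded sequence into Mamba is not shown here.

```python
import torch

def mode_k_unfold(x: torch.Tensor, k: int) -> torch.Tensor:
    # Move axis k to the front, then flatten the remaining axes,
    # yielding a (x.shape[k], x.numel() // x.shape[k]) matrix.
    return x.movedim(k, 0).reshape(x.shape[k], -1)

# A (bands, height, width) hyperspectral cube: mode-0 puts spectral
# bands on the rows so a 1D scan runs over spatial positions per band.
cube = torch.randn(31, 8, 8)
print(mode_k_unfold(cube, 0).shape)  # torch.Size([31, 64])
print(mode_k_unfold(cube, 2).shape)  # torch.Size([8, 248])
```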
arXiv Detail & Related papers (2025-01-02T13:56:23Z)
- STNMamba: Mamba-based Spatial-Temporal Normality Learning for Video Anomaly Detection [48.997518615379995]
Video anomaly detection (VAD) has been extensively researched due to its potential for intelligent video systems.
Most existing methods based on CNNs and transformers still suffer from substantial computational burdens.
We propose a lightweight and effective Mamba-based network named STNMamba to enhance the learning of spatial-temporal normality.
arXiv Detail & Related papers (2024-12-28T08:49:23Z)
- Mamba2D: A Natively Multi-Dimensional State-Space Model for Vision Tasks [47.49096400786856]
State-Space Models (SSMs) have recently emerged as a powerful and efficient alternative to the long-standing transformer architecture.
We re-derive modern selective state-space techniques, starting from a natively multidimensional formulation.
Mamba2D shows comparable performance to prior adaptations of SSMs for vision tasks, on standard image classification evaluations with the ImageNet-1K dataset.
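To make "natively multi-dimensional" concrete, here is a deliberately simplified scalar 2D recurrence in which each state depends on both its top and left neighbors; the fixed transition scalars are illustrative only, whereas Mamba2D's actual derivation is selective (input-dependent) and more general.

```python
import torch

def linear_2d_scan(x: torch.Tensor, a_row: float = 0.9, a_col: float = 0.9):
    # h[i, j] = a_row * h[i-1, j] + a_col * h[i, j-1] + x[i, j]
    # A 2D recurrence needs no flattening of the image into a 1D sequence.
    H, W = x.shape
    h = torch.zeros(H, W)
    for i in range(H):
        for j in range(W):
            top = h[i - 1, j] if i > 0 else 0.0
            left = h[i, j - 1] if j > 0 else 0.0
            h[i, j] = a_row * top + a_col * left + x[i, j]
    return h

print(linear_2d_scan(torch.ones(4, 4)))
```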
arXiv Detail & Related papers (2024-12-20T18:50:36Z)
- Image Forgery Localization with State Space Models [6.6222439382291]
We propose LoMa, a novel image forgery localization method that leverages selective SSMs.
LoMa employs atrous selective scan to traverse the spatial domain and convert the tampered image into ordered patch sequences.
This is the first image forgery localization model built on SSMs.
arXiv Detail & Related papers (2024-12-15T15:10:53Z)
- V2M: Visual 2-Dimensional Mamba for Image Representation Learning [68.51380287151927]
Mamba has garnered widespread attention due to its flexible design and efficient hardware performance to process 1D sequences.
Recent studies have attempted to apply Mamba to the visual domain by flattening 2D images into patches and then regarding them as a 1D sequence.
We propose a Visual 2-Dimensional Mamba model as a complete solution, which directly processes image tokens in the 2D space.
arXiv Detail & Related papers (2024-10-14T11:11:06Z)
- SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising [13.1240990099267]
We introduce a memory-efficient spatial-spectral Mamba (SSUMamba) for HSI denoising.
Mamba is known for its remarkable long-range dependency modeling capabilities.
SSUMamba achieves superior denoising results with lower memory consumption per batch compared to transformer-based methods.
arXiv Detail & Related papers (2024-05-02T20:44:26Z)
- S$^2$Mamba: A Spatial-spectral State Space Model for Hyperspectral Image Classification [44.99672241508994]
Land cover analysis using hyperspectral images (HSI) remains an open problem due to their low spatial resolution and complex spectral information.
We propose S$^2$Mamba, a spatial-spectral state space model for hyperspectral image classification, to extract spatial-spectral contextual features.
arXiv Detail & Related papers (2024-04-28T15:12:56Z)
- VMamba: Visual State Space Model [98.0517369083152]
We adapt Mamba, a state-space language model, into VMamba, a vision backbone with linear time complexity.
At the core of VMamba is a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module; a cross-scan sketch appears after this entry.
arXiv Detail & Related papers (2024-01-18T17:55:39Z)
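For contrast with Spatial-Mamba's single scan, here is a hedged sketch of the four-directional flattening that SS2D-style cross-scan modules are commonly described as performing (the helper below is an illustration, not VMamba's implementation): each feature map is traversed row-wise and column-wise, forwards and backwards, and a selective-scan core then processes the four resulting 1D sequences.

```python
import torch

def cross_scan(x: torch.Tensor) -> torch.Tensor:
    # x: (B, C, H, W) feature map -> (B, 4, C, H*W): four 1D traversals.
    row_major = x.flatten(2)                  # left-to-right, top-to-bottom
    col_major = x.transpose(2, 3).flatten(2)  # top-to-bottom, left-to-right
    return torch.stack(
        [row_major, col_major,
         row_major.flip(-1), col_major.flip(-1)],  # reversed directions
        dim=1,
    )

x = torch.randn(1, 8, 4, 4)
print(cross_scan(x).shape)  # torch.Size([1, 4, 8, 16])
```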