Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
- URL: http://arxiv.org/abs/2505.00630v1
- Date: Thu, 01 May 2025 16:07:51 GMT
- Title: Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
- Authors: Muyi Bao, Shuchang Lyu, Zhaoyang Xu, Huiyu Zhou, Jinchang Ren, Shiming Xiang, Xiangtai Li, Guangliang Cheng
- Abstract summary: State Space Models (SSMs) have emerged as a paradigm-shifting solution, combining linear computational scaling with global context modeling. This survey presents a comprehensive review of Mamba-based methodologies in remote sensing.
- Score: 46.65330450810048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has profoundly transformed remote sensing, yet prevailing architectures like Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) remain constrained by critical trade-offs: CNNs suffer from limited receptive fields, while ViTs grapple with quadratic computational complexity, hindering their scalability for high-resolution remote sensing data. State Space Models (SSMs), particularly the recently proposed Mamba architecture, have emerged as a paradigm-shifting solution, combining linear computational scaling with global context modeling. This survey presents a comprehensive review of Mamba-based methodologies in remote sensing, systematically analyzing about 120 studies to construct a holistic taxonomy of innovations and applications. Our contributions are structured across five dimensions: (i) foundational principles of vision Mamba architectures, (ii) micro-architectural advancements such as adaptive scan strategies and hybrid SSM formulations, (iii) macro-architectural integrations, including CNN-Transformer-Mamba hybrids and frequency-domain adaptations, (iv) rigorous benchmarking against state-of-the-art methods in multiple application tasks, such as object detection, semantic segmentation, and change detection, and (v) critical analysis of unresolved challenges with actionable future directions. By bridging the gap between SSM theory and remote sensing practice, this survey establishes Mamba as a transformative framework for remote sensing analysis. To our knowledge, this paper is the first systematic review of Mamba architectures in remote sensing. Our work provides a structured foundation for advancing research in remote sensing systems through SSM-based methods. We curate an open-source repository (https://github.com/BaoBao0926/Awesome-Mamba-in-Remote-Sensing) to foster community-driven advancements.
Related papers
- RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing [28.488986896516284]
RoMA is a framework that enables scalable self-supervised pretraining of RS foundation models using large-scale, diverse, unlabeled data. RoMA enhances scalability for high-resolution images through a tailored auto-regressive learning strategy. Experiments across scene classification, object detection, and semantic segmentation tasks demonstrate that RoMA-pretrained Mamba models consistently outperform ViT-based counterparts in both accuracy and computational efficiency.
arXiv Detail & Related papers (2025-03-13T14:09:18Z) - 2DMCG: 2DMamba with Change Flow Guidance for Change Detection in Remote Sensing [4.18306618346671]
This paper proposes an efficient framework based on a Vision Mamba variant that enhances its ability to capture 2D spatial information. The framework employs a 2DMamba encoder to effectively learn global contextual spatial information from multi-temporal images. Experiments on benchmark datasets demonstrate the superior performance of our framework compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-03-01T14:55:13Z) - TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba [88.31117598044725]
We explore cross-architecture training to transfer the knowledge already in existing Transformer models to the alternative Mamba architecture, an approach termed TransMamba. Our approach employs a two-stage strategy to expedite training new Mamba models, ensuring effectiveness across uni-modal and cross-modal tasks. For cross-modal learning, we propose a cross-Mamba module that integrates language awareness into Mamba's visual features, enhancing the cross-modal interaction capabilities of the Mamba architecture.
arXiv Detail & Related papers (2025-02-21T01:22:01Z) - A Survey on Mamba Architecture for Vision Applications [7.216568558372857]
The Mamba architecture addresses scalability challenges in visual tasks. Vision Mamba and VideoMamba introduce bidirectional scanning, selective mechanisms, and temporal processing to enhance image and video understanding. These advancements position Mamba as a promising architecture in computer vision research and applications.
arXiv Detail & Related papers (2025-02-11T00:59:30Z) - Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for speech enhancement (SE) tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z) - From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing [16.755590790629153]
This review examines the development and application of multi-modal language models (MLLMs) in remote sensing.
We focus on their ability to interpret and describe satellite imagery using natural language.
Key applications such as scene description, object detection, change detection, text-to-image retrieval, image-to-text generation, and visual question answering are discussed.
arXiv Detail & Related papers (2024-11-05T12:14:22Z) - Foundation Models for Remote Sensing and Earth Observation: A Survey [101.77425018347557]
This survey systematically reviews the emerging field of Remote Sensing Foundation Models (RSFMs).
It begins with an outline of their motivation and background, followed by an introduction of their foundational concepts.
We benchmark these models against publicly available datasets, discuss existing challenges, and propose future research directions.
arXiv Detail & Related papers (2024-10-22T01:08:21Z) - PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation [1.5136939451642137]
This paper proposes a novel network called Pyramid Pooling Mamba (PPMamba), which integrates CNN and Mamba for semantic segmentation tasks.
PPMamba achieves competitive performance compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-10T08:08:50Z) - MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking [51.28485682954006]
We propose a pure Mamba-based framework (MambaVT) to fully exploit intrinsic spatio-temporal contextual modeling for robust visible-thermal tracking.
Specifically, we devise the long-range cross-frame integration component to globally adapt to target appearance variations.
Experiments show the significant potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four mainstream benchmarks.
arXiv Detail & Related papers (2024-08-15T02:29:00Z) - Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We develop the first attempt to integrate the Vision State Space Model (Mamba) for remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM).
arXiv Detail & Related papers (2024-05-08T11:09:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.