Related papers: Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study

URL: http://arxiv.org/abs/2405.08493v1
Date: Tue, 14 May 2024 10:36:56 GMT
Title: Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study
Authors: Qinfeng Zhu, Yuan Fang, Yuanzhi Cai, Cheng Chen, Lei Fan
Abstract summary: We investigate the impact of mainstream scanning directions and their combinations on semantic segmentation of images. A simple, single scanning direction is deemed sufficient for semantic segmentation of high-resolution remotely sensed images.
Score: 7.334290421966221
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep learning methods, especially Convolutional Neural Networks (CNN) and Vision Transformer (ViT), are frequently employed to perform semantic segmentation of high-resolution remotely sensed images. However, CNNs are constrained by their restricted receptive fields, while ViTs face challenges due to their quadratic complexity. Recently, the Mamba model, featuring linear complexity and a global receptive field, has gained extensive attention for vision tasks. In such tasks, images need to be serialized to form sequences compatible with the Mamba model. Numerous research efforts have explored scanning strategies to serialize images, aiming to enhance the Mamba model's understanding of images. However, the effectiveness of these scanning strategies remains uncertain. In this research, we conduct a comprehensive experimental investigation on the impact of mainstream scanning directions and their combinations on semantic segmentation of remotely sensed images. Through extensive experiments on the LoveDA, ISPRS Potsdam, and ISPRS Vaihingen datasets, we demonstrate that no single scanning strategy outperforms others, regardless of their complexity or the number of scanning directions involved. A simple, single scanning direction is deemed sufficient for semantic segmentation of high-resolution remotely sensed images. Relevant directions for future research are also recommended.
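The abstract notes that images must be serialized into sequences before a Mamba model can process them. The following is a minimal sketch of that idea, not the paper's implementation: it flattens a grid of patch features in four illustrative scanning directions. The function names and the direction labels (`row`, `row_reversed`, `column`, `column_reversed`) are assumptions made for this example, and the paper's own set of scanning directions may differ.

```python
import numpy as np


def scan_indices(height, width, direction):
    """Return the order in which patches of a height x width grid are visited.

    The direction names are hypothetical labels chosen for this sketch.
    """
    grid = np.arange(height * width).reshape(height, width)
    if direction == "row":
        # Left to right, top to bottom.
        return grid.reshape(-1)
    if direction == "row_reversed":
        # The row scan traversed backwards.
        return grid.reshape(-1)[::-1]
    if direction == "column":
        # Top to bottom, left to right.
        return grid.T.reshape(-1)
    if direction == "column_reversed":
        # The column scan traversed backwards.
        return grid.T.reshape(-1)[::-1]
    raise ValueError(f"unknown scanning direction: {direction}")


def serialize(patches, direction):
    """Flatten an (H, W, C) patch grid into an (H*W, C) sequence."""
    height, width, channels = patches.shape
    order = scan_indices(height, width, direction)
    return patches.reshape(height * width, channels)[order]


def deserialize(sequence, height, width, direction):
    """Invert serialize: place each sequence element back at its grid position."""
    order = scan_indices(height, width, direction)
    flat = np.empty_like(sequence)
    flat[order] = sequence
    return flat.reshape(height, width, -1)


if __name__ == "__main__":
    # A 2 x 3 grid with one channel; each value equals its row-major index.
    patches = np.arange(6).reshape(2, 3, 1)
    for name in ("row", "row_reversed", "column", "column_reversed"):
        print(name, serialize(patches, name).ravel().tolist())
```

A multi-directional strategy would run the sequence model once per direction, map each output back with `deserialize`, and merge the results. The finding reported in the abstract is that a single direction is sufficient for this segmentation task.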

Related papers

Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques [91.26187560114381]
Feature matching is a cornerstone task in computer vision, essential for applications such as image retrieval, stereo matching, 3D reconstruction, and SLAM. This survey comprehensively reviews modality-based feature matching, exploring traditional handcrafted methods and contemporary deep learning approaches.
arXiv Detail & Related papers (2025-07-30T15:56:36Z)
EAMamba: Efficient All-Around Vision State Space Model for Image Restoration [11.190025966582041]
This study introduces Efficient All-Around Mamba (EAMamba), an enhanced framework that incorporates a Multi-Head Selective Scan Module (MHSSM) with an all-around scanning mechanism. EAMamba achieves a significant 31-89% reduction in FLOPs while maintaining favorable performance compared to existing low-level Vision Mamba methods.
arXiv Detail & Related papers (2025-06-27T14:12:58Z)
DefMamba: Deformable Visual State Space Model [65.50381013020248]
We propose a novel visual foundation model called DefMamba. By combining a deformable scanning (DS) strategy, this model significantly improves its ability to learn image structures and detect changes in object details. Numerous experiments have shown that DefMamba achieves state-of-the-art performance in various visual tasks.
arXiv Detail & Related papers (2025-04-08T08:22:54Z)
DAMamba: Vision State Space Model with Dynamic Adaptive Scan [51.81060691414399]
State space models (SSMs) have recently garnered significant attention in computer vision. We propose Dynamic Adaptive Scan (DAS), a data-driven method that adaptively allocates scanning orders and regions. Based on DAS, we propose the vision backbone DAMamba, which significantly outperforms current state-of-the-art vision Mamba models in vision tasks.
arXiv Detail & Related papers (2025-02-18T08:12:47Z)
Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks. MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization. MUCA utilizes a Cross-Teacher-Student attention mechanism to guide the student network to construct more discriminative feature representations.
arXiv Detail & Related papers (2025-01-18T11:57:20Z)
MambaCSR: Dual-Interleaved Scanning for Compressed Image Super-Resolution With SSMs [14.42424591513825]
MambaCSR is a framework based on Mamba for the challenging compressed image super-resolution (CSR) task. We propose an efficient dual-interleaved scanning paradigm (DIS) for CSR, which is composed of two scanning strategies. Results on multiple benchmarks have shown the great performance of our MambaCSR in the compressed image super-resolution task.
arXiv Detail & Related papers (2024-08-21T16:30:45Z)
Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
RS-Mamba for Large Remote Sensing Image Dense Prediction [58.12667617617306]
We propose the Remote Sensing Mamba (RSM) for dense prediction tasks in large VHR remote sensing images. RSM is specifically designed to capture the global context of remote sensing images with linear complexity. Our model achieves better efficiency and accuracy than transformer-based models on large remote sensing images.
arXiv Detail & Related papers (2024-04-03T12:06:01Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges for vision modality from a multimodal point of view. We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue. To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
An Empirical Study of Remote Sensing Pretraining [117.90699699469639]
We conduct an empirical study of remote sensing pretraining (RSP) on aerial images. RSP can help deliver distinctive performances in scene recognition tasks. RSP mitigates the data discrepancies of traditional ImageNet pretraining on RS images, but it may still suffer from task discrepancies.
arXiv Detail & Related papers (2022-04-06T13:38:11Z)
Homography augumented momentum constrastive learning for SAR image retrieval [3.9743795764085545]
We propose a deep learning-based image retrieval approach using homography transformation augmented contrastive learning. We also propose a training method for the DNNs induced by contrastive learning that does not require any labeling procedure.
arXiv Detail & Related papers (2021-09-21T17:27:07Z)
Advances in Deep Learning for Hyperspectral Image Analysis--Addressing Challenges Arising in Practical Imaging Scenarios [7.41157183358269]
We will review advances in the community that leverage deep learning for robust hyperspectral image analysis. Challenges include limited ground truth and the high-dimensional nature of the data. Specifically, we will review unsupervised, semi-supervised and active learning approaches to image analysis.
arXiv Detail & Related papers (2020-07-16T19:51:02Z)
Gradient-Induced Co-Saliency Detection [81.54194063218216]
Co-saliency detection (Co-SOD) aims to segment the common salient foreground in a group of relevant images. In this paper, inspired by human behavior, we propose a gradient-induced co-saliency detection method.
arXiv Detail & Related papers (2020-04-28T08:40:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.