Pan-Mamba: Effective pan-sharpening with State Space Model
- URL: http://arxiv.org/abs/2402.12192v2
- Date: Sat, 9 Mar 2024 03:16:25 GMT
- Title: Pan-Mamba: Effective pan-sharpening with State Space Model
- Authors: Xuanhua He, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou
- Abstract summary: Pan-Mamba represents a novel pan-sharpening network that leverages the efficiency of the Mamba model in global information modeling.
Our proposed approach surpasses state-of-the-art methods, showcasing superior fusion results in pan-sharpening.
This work is the first attempt at exploring the potential of the Mamba model for pan-sharpening and establishes a new frontier in pan-sharpening techniques.
- Score: 21.032910745931936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pan-sharpening involves integrating information from low-resolution
multi-spectral and high-resolution panchromatic images to generate
high-resolution multi-spectral counterparts. While recent advancements in the
state space model, particularly the efficient long-range dependency modeling
achieved by Mamba, have revolutionized the computer vision community, its untapped
potential in pan-sharpening motivates our exploration. Our contribution,
Pan-Mamba, represents a novel pan-sharpening network that leverages the
efficiency of the Mamba model in global information modeling. In Pan-Mamba, we
customize two core components: channel swapping Mamba and cross-modal Mamba,
strategically designed for efficient cross-modal information exchange and
fusion. The former initiates a lightweight cross-modal interaction through the
exchange of partial panchromatic and multi-spectral channels, while the latter
facilitates the information representation capability by exploiting inherent
cross-modal relationships. Through extensive experiments across diverse
datasets, our proposed approach surpasses state-of-the-art methods, showcasing
superior fusion results in pan-sharpening. To the best of our knowledge, this
work is the first attempt at exploring the potential of the Mamba model for pan-sharpening and establishes a new frontier in pan-sharpening techniques. The source code is available at https://github.com/alexhe101/Pan-Mamba.
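To make the channel-swapping idea concrete, here is a minimal PyTorch sketch. It is not the authors' implementation: the `SSMBlock` stand-in (a gated MLP used so the sketch runs without the `mamba_ssm` dependency), the half-and-half channel split, and the tensor shapes are all illustrative assumptions. The point it shows is the lightweight interaction: each stream donates part of its channels to the other before being refined independently.

```python
import torch
import torch.nn as nn

class SSMBlock(nn.Module):
    """Placeholder for a Mamba-style sequence block (e.g. mamba_ssm.Mamba).
    A simple gated MLP is used here so the sketch runs without extra deps."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.proj_in = nn.Linear(dim, 2 * dim)
        self.proj_out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, L, C)
        h, gate = self.proj_in(self.norm(x)).chunk(2, dim=-1)
        return x + self.proj_out(h * torch.sigmoid(gate))

class ChannelSwapFusion(nn.Module):
    """Hypothetical channel-swapping interaction: the PAN and MS streams
    exchange half of their channels, then each is processed on its own."""
    def __init__(self, dim: int):
        super().__init__()
        self.pan_block = SSMBlock(dim)
        self.ms_block = SSMBlock(dim)

    def forward(self, pan: torch.Tensor, ms: torch.Tensor):
        # pan, ms: (B, L, C) token sequences from flattened feature maps
        c = pan.shape[-1] // 2
        pan_swapped = torch.cat([pan[..., :c], ms[..., c:]], dim=-1)
        ms_swapped = torch.cat([ms[..., :c], pan[..., c:]], dim=-1)
        return self.pan_block(pan_swapped), self.ms_block(ms_swapped)

if __name__ == "__main__":
    pan = torch.randn(2, 64 * 64, 32)  # flattened panchromatic features
    ms = torch.randn(2, 64 * 64, 32)   # flattened multi-spectral features
    fused_pan, fused_ms = ChannelSwapFusion(32)(pan, ms)
    print(fused_pan.shape, fused_ms.shape)  # torch.Size([2, 4096, 32]) x2
```

The appeal of swapping channels rather than attending across modalities is cost: the exchange itself is a zero-parameter concatenation, so cross-modal mixing stays linear in sequence length.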
Related papers
- Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion [28.543822934210404]
Multi-modal image fusion integrates complementary information from different modalities to produce enhanced and informative images.
We propose a novel Bayesian-inspired scanning strategy called Random Shuffle to eliminate biases associated with fixed sequence scanning.
We develop a testing methodology based on Monte-Carlo averaging to ensure the model's output aligns more closely with expected results.
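A toy sketch of the two ideas above (purely illustrative; the function names, the LSTM stand-in for Mamba, and the sample count k are assumptions, not the paper's design): tokens are scanned in a random order, outputs are mapped back to the original positions, and at test time several shuffled passes are averaged Monte-Carlo style.

```python
import torch
import torch.nn as nn

def shuffled_scan(seq_model, x: torch.Tensor) -> torch.Tensor:
    """Scan tokens in a random order, then restore the original order.
    x: (B, L, C); seq_model: any order-sensitive sequence-to-sequence model."""
    perm = torch.randperm(x.shape[1], device=x.device)
    inv = torch.argsort(perm)       # inverse permutation
    y = seq_model(x[:, perm])       # process tokens in shuffled order
    return y[:, inv]                # map outputs back to original positions

@torch.no_grad()
def monte_carlo_predict(seq_model, x: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Test-time Monte-Carlo averaging over k random scan orders, so the
    output approximates the expectation over shuffles."""
    return torch.stack([shuffled_scan(seq_model, x) for _ in range(k)]).mean(dim=0)

# Usage with an order-sensitive stand-in model (an LSTM, purely illustrative):
lstm = nn.LSTM(input_size=16, hidden_size=16, batch_first=True)
out = monte_carlo_predict(lambda t: lstm(t)[0], torch.randn(2, 100, 16), k=4)
print(out.shape)  # torch.Size([2, 100, 16])
```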
arXiv Detail & Related papers (2024-09-03T09:12:18Z)
- StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation [63.31007867379312]
We propose StitchFusion, a framework that integrates large-scale pre-trained models directly as encoders and feature fusers.
We introduce a multi-directional adapter module (MultiAdapter) to enable cross-modal information transfer during encoding.
Our model achieves state-of-the-art performance on four multi-modal segmentation datasets with minimal additional parameters.
arXiv Detail & Related papers (2024-08-02T15:41:16Z)
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications.
Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features.
We conduct a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba.
arXiv Detail & Related papers (2024-07-10T23:02:45Z)
- Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba [77.21394300708172]
Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond.
The recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential.
This survey paper orchestrates a comprehensive discussion, diving into essential research dimensions, covering: (i) the functioning of the Mamba mechanism and its foundation on the principles of structured state space models; (ii) the proposed improvements and the integration of Mamba with various networks, exploring its potential as a substitute for Transformers; (iii) the combination of
arXiv Detail & Related papers (2024-06-24T15:27:21Z)
- Visual Mamba: A Survey and New Outlooks [33.90213491829634]
Mamba, a recent selective structured state space model, excels in long sequence modeling.
Since January 2024, Mamba has been actively applied to diverse computer vision tasks.
This paper reviews visual Mamba approaches, analyzing over 200 papers.
arXiv Detail & Related papers (2024-04-29T16:51:30Z)
- FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba [17.75933946414591]
Multi-modal image fusion aims to combine information from different modalities to create a single image with detailed textures.
Transformer-based models, while excelling in global feature modeling, confront computational challenges stemming from their quadratic complexity.
We propose FusionMamba, a novel dynamic feature enhancement method for multimodal image fusion with Mamba.
arXiv Detail & Related papers (2024-04-15T06:37:21Z)
- A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion [14.293042131263924]
In image fusion tasks, images from different sources possess distinct characteristics.
Mamba, as a state space model, has emerged in the field of natural language processing.
Motivated by these challenges, we customize and improve the vision Mamba network designed for the image fusion task.
arXiv Detail & Related papers (2024-04-14T16:09:33Z)
- Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Fusing information across modalities effectively improves object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods, improving mAP by 5.9% on M3FD and 4.9% on FLIR-Aligned.
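The phrase "map cross-modal features into a hidden state space for interaction" can be pictured with a toy sketch. Everything here (the class name, dimensions, and the cross-gating form) is a hypothetical stand-in, not the paper's FMB, which builds on Mamba's state space rather than plain linear projections.

```python
import torch
import torch.nn as nn

class SharedSpaceFusion(nn.Module):
    """Illustrative stand-in for a fusion block: project two modalities
    into one shared hidden space and let each modulate the other."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.to_hidden_a = nn.Linear(dim, hidden)
        self.to_hidden_b = nn.Linear(dim, hidden)
        self.out = nn.Linear(hidden, dim)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        ha, hb = self.to_hidden_a(a), self.to_hidden_b(b)
        # cross-gating in the shared space: each modality gates the other
        fused = ha * torch.sigmoid(hb) + hb * torch.sigmoid(ha)
        return self.out(fused)

rgb = torch.randn(2, 196, 64)  # e.g. flattened RGB detector features
ir = torch.randn(2, 196, 64)   # e.g. flattened infrared features
print(SharedSpaceFusion(64, 128)(rgb, ir).shape)  # torch.Size([2, 196, 64])
```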
arXiv Detail & Related papers (2024-04-14T05:28:46Z)
- FusionMamba: Efficient Image Fusion with State Space Model [35.57157248152558]
Image fusion aims to generate a high-resolution multi/hyper-spectral image from a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data.
Current deep learning (DL)-based methods for image fusion rely on CNNs or Transformers to extract features and merge different types of data.
We propose FusionMamba, an innovative method for efficient image fusion.
arXiv Detail & Related papers (2024-04-11T17:29:56Z)
- ReMamber: Referring Image Segmentation with Mamba Twister [51.291487576255435]
ReMamber is a novel RIS architecture that integrates the power of Mamba with a multi-modal Mamba Twister block.
The Mamba Twister explicitly models image-text interaction, and fuses textual and visual features through its unique channel and spatial twisting mechanism.
arXiv Detail & Related papers (2024-03-26T16:27:37Z)
- Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining [85.08169822181685]
This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks.
Swin-UMamba demonstrates superior performance with a large margin compared to CNNs, ViTs, and latest Mamba-based models.
arXiv Detail & Related papers (2024-02-05T18:58:11Z)