Related papers: FusionMamba: Efficient Image Fusion with State Space Model

FusionMamba: Efficient Image Fusion with State Space Model

URL: http://arxiv.org/abs/2404.07932v2
Date: Fri, 10 May 2024 20:32:15 GMT
Title: FusionMamba: Efficient Image Fusion with State Space Model
Authors: Siran Peng, Xiangyu Zhu, Haoyu Deng, Zhen Lei, Liang-Jian Deng,
Abstract summary: Image fusion aims to generate a high-resolution multi/hyper-spectral image with limited spectral information and a low-resolution image with abundant spectral data. Current deep learning (DL)-based methods for image fusion rely on CNNs or Transformers to extract features and merge different types of data. We propose FusionMamba, an innovative method for efficient image fusion.
Score: 35.57157248152558
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image fusion aims to generate a high-resolution multi/hyper-spectral image by combining a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data. Current deep learning (DL)-based methods for image fusion primarily rely on CNNs or Transformers to extract features and merge different types of data. While CNNs are efficient, their receptive fields are limited, restricting their capacity to capture global context. Conversely, Transformers excel at learning global information but are hindered by their quadratic complexity. Fortunately, recent advancements in the State Space Model (SSM), particularly Mamba, offer a promising solution to this issue by enabling global awareness with linear complexity. However, there have been few attempts to explore the potential of the SSM in information fusion, which is a crucial ability in domains like image fusion. Therefore, we propose FusionMamba, an innovative method for efficient image fusion. Our contributions mainly focus on two aspects. Firstly, recognizing that images from different sources possess distinct properties, we incorporate Mamba blocks into two U-shaped networks, presenting a novel architecture that extracts spatial and spectral features in an efficient, independent, and hierarchical manner. Secondly, to effectively combine spatial and spectral information, we extend the Mamba block to accommodate dual inputs. This expansion leads to the creation of a new module called the FusionMamba block, which outperforms existing fusion techniques such as concatenation and cross-attention. We conduct a series of experiments on five datasets related to three image fusion tasks. The quantitative and qualitative evaluation results demonstrate that our method achieves SOTA performance, underscoring the superiority of FusionMamba. The code is available at https://github.com/PSRben/FusionMamba.

Related papers

HTMNet: A Hybrid Network with Transformer-Mamba Bottleneck Multimodal Fusion for Transparent and Reflective Objects Depth Completion [9.235004977824026]
Transparent and reflective objects pose significant challenges for depth sensors.<n>We propose HTMNet, a novel hybrid model integrating Transformer, CNN, and Mamba architectures.<n>We introduce a novel multimodal fusion module grounded in self-attention mechanisms and state space models.
arXiv Detail & Related papers (2025-05-27T08:51:38Z)
Unity is Strength: Unifying Convolutional and Transformeral Features for Better Person Re-Identification [60.9670254833103]
Person Re-identification (ReID) aims to retrieve the specific person across non-overlapping cameras. We propose a novel fusion framework called FusionReID to unify the strengths of CNNs and Transformers for image-based person ReID.
arXiv Detail & Related papers (2024-12-23T03:19:19Z)
DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion [21.64382683858586]
Infrared and visible image fusion aims to combine complementary information from both modalities to provide a more comprehensive scene understanding. We propose a dual-branch feature decomposition fusion network (DAF-Net) with Maximum domain adaptive. By incorporating MK-MMD, the DAF-Net effectively aligns the latent feature spaces of visible and infrared images, thereby improving the quality of the fused images.
arXiv Detail & Related papers (2024-09-18T02:14:08Z)
Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion [15.79138560700532]
We propose a dual-branch image fusion network called Tmamba. It consists of linear Transformer and Mamba, which has global modeling capabilities while maintaining linear complexity. Experiments show that our Tmamba achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2024-09-05T03:42:11Z)
A Hybrid Transformer-Mamba Network for Single Image Deraining [70.64069487982916]
Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions. We introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies.
arXiv Detail & Related papers (2024-08-31T10:03:19Z)
Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement [49.15531684596958]
We propose a Dual-Domain Feature Fusion Network (DFFN) for low-light remote sensing image enhancement. The first phase learns amplitude information to restore image brightness, and the second phase learns phase information to refine details. We have constructed two dark light remote sensing datasets to address the current lack of datasets in dark light remote sensing image enhancement.
arXiv Detail & Related papers (2024-04-26T13:21:31Z)
FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba [17.75933946414591]
Multi-modal image fusion aims to combine information from different modes to create a single image with detailed textures. Transformer-based models, while excelling in global feature modeling, confront computational challenges stemming from their quadratic complexity. We propose FusionMamba, a novel dynamic feature enhancement method for multimodal image fusion with Mamba.
arXiv Detail & Related papers (2024-04-15T06:37:21Z)
A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion [14.293042131263924]
In image fusion tasks, images from different sources possess distinct characteristics. Mamba, as a state space model, has emerged in the field of natural language processing. Motivated by these challenges, we customize and improve the vision Mamba network designed for the image fusion task.
arXiv Detail & Related papers (2024-04-14T16:09:33Z)
Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Cross-modality fusing information from different modalities effectively improves object detection performance. We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction. Our proposed approach outperforms the state-of-the-art methods on $m$AP with 5.9% on $M3FD$ and 4.9% on FLIR-Aligned datasets.
arXiv Detail & Related papers (2024-04-14T05:28:46Z)
MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion [4.2474907126377115]
Multi-modality image fusion (MMIF) aims to integrate complementary information from different modalities into a single fused image. We propose a Mamba-based Dual-phase Fusion model (MambaDFuse) to extract modality-specific and modality-fused features. Our approach achieves promising fusion results in infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2024-04-12T11:33:26Z)
Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs. Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z)
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network. We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
Image Fusion Transformer [75.71025138448287]
In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information. In recent years, state-of-the-art methods have adopted Convolution Neural Networks (CNNs) to encode meaningful features for image fusion. We propose a novel Image Fusion Transformer (IFT) where we develop a transformer-based multi-scale fusion strategy.
arXiv Detail & Related papers (2021-07-19T16:42:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.