Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion
- URL: http://arxiv.org/abs/2409.03223v1
- Date: Thu, 5 Sep 2024 03:42:11 GMT
- Title: Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion
- Authors: Chenguang Zhu, Shan Gao, Huafeng Chen, Guangqian Guo, Chaowei Wang, Yaoxing Wang, Chen Shu Lei, Quanjiang Fan,
- Abstract summary: We propose a dual-branch image fusion network called Tmamba.
It consists of a linear Transformer and Mamba, which have global modeling capabilities while maintaining linear complexity.
Experiments show that our Tmamba achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
- Score: 15.79138560700532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modality image fusion aims to integrate the merits of images from different sources and render high-quality fused images. However, existing feature extraction and fusion methods are either constrained by the inherent local inductive bias and static parameters during inference (CNNs) or limited by quadratic computational complexity (Transformers), and therefore cannot effectively extract and fuse features. To solve this problem, we propose a dual-branch image fusion network called Tmamba. It consists of a linear Transformer and Mamba, which have global modeling capabilities while maintaining linear complexity. Because the Transformer and Mamba structures differ, the features extracted by the two branches carry channel and position information, respectively. A T-M interaction structure is designed between the two branches, using global learnable parameters and convolutional layers to transfer position and channel information, respectively. We further propose cross-modal interaction at the attention level to obtain cross-modal attention. Experiments show that Tmamba achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion. Code with checkpoints will be made available after the peer-review process.
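The abstract contrasts the quadratic cost of standard attention with the linear cost of a linear Transformer, but it does not say which linear-attention variant Tmamba uses. As a hedged illustration only, the sketch below assumes kernelized linear attention with the feature map elu(x) + 1 (the variant of Katharopoulos et al., 2020), which is not confirmed to be the paper's method. Computing φ(Q)(φ(K)ᵀV) instead of (φ(Q)φ(K)ᵀ)V avoids the N × M similarity matrix, and the check confirms both orderings agree. The "cross-modal" call, with queries from one modality and keys/values from the other, is a hypothetical illustration of attention-level interaction; the token arrays are random placeholders, not real image features.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: keeps every feature positive so the attention weights stay positive.
    # Assumed kernel; the Tmamba abstract does not specify one.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def attention_quadratic(q, k, v):
    # Reference ordering: materializes the (N, M) similarity matrix, O(N*M*d) cost.
    sim = feature_map(q) @ feature_map(k).T
    return (sim @ v) / sim.sum(axis=1, keepdims=True)


def attention_linear(q, k, v):
    # Reassociated ordering: phi(Q) @ (phi(K)^T @ V) never builds the (N, M) matrix,
    # so the cost grows linearly with the number of tokens.
    fq, fk = feature_map(q), feature_map(k)   # (N, d), (M, d)
    kv = fk.T @ v                             # (d, d_v) summary of keys and values
    z = fk.sum(axis=0)                        # (d,) summary used for normalization
    return (fq @ kv) / (fq @ z)[:, None]


rng = np.random.default_rng(0)
d = 8
ir_tokens = rng.standard_normal((64, d))    # hypothetical infrared features (64 tokens)
vis_tokens = rng.standard_normal((64, d))   # hypothetical visible features (64 tokens)

# Hypothetical cross-modal use: queries from infrared, keys/values from visible.
out_fast = attention_linear(ir_tokens, vis_tokens, vis_tokens)
out_ref = attention_quadratic(ir_tokens, vis_tokens, vis_tokens)
print(out_fast.shape, np.allclose(out_fast, out_ref))  # prints (64, 8) True
```

Forming `fk.T @ v` first costs O(M·d·d_v) rather than O(N·M·d), which is where the linear dependence on token count comes from.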
Related papers
- Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
Date: 2024-10-16T06:28:49Z
- A Hybrid Transformer-Mamba Network for Single Image Deraining [70.64069487982916]
Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions.
We introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies.
Date: 2024-08-31T10:03:19Z
- FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba [17.75933946414591]
Multi-modal image fusion aims to combine information from different modalities to create a single image with detailed textures.
Transformer-based models, while excelling in global feature modeling, confront computational challenges stemming from their quadratic complexity.
We propose FusionMamba, a novel dynamic feature enhancement method for multimodal image fusion with Mamba.
Date: 2024-04-15T06:37:21Z
- Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Fusing information from different modalities effectively improves object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods in mAP by 5.9% on the M3FD dataset and by 4.9% on the FLIR-Aligned dataset.
Date: 2024-04-14T05:28:46Z
- MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion [4.2474907126377115]
Multi-modality image fusion (MMIF) aims to integrate complementary information from different modalities into a single fused image.
We propose a Mamba-based Dual-phase Fusion model (MambaDFuse) to extract modality-specific and modality-fused features.
Our approach achieves promising fusion results in infrared-visible image fusion and medical image fusion.
Date: 2024-04-12T11:33:26Z
- FusionMamba: Efficient Image Fusion with State Space Model [35.57157248152558]
Image fusion aims to generate a high-resolution multi/hyper-spectral image by combining a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data.
Current deep learning (DL)-based methods for image fusion rely on CNNs or Transformers to extract features and merge different types of data.
We propose FusionMamba, an innovative method for efficient image fusion.
Date: 2024-04-11T17:29:56Z
- Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion (EMMA) paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
Date: 2023-05-19T05:50:24Z
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
Date: 2022-11-26T02:40:28Z
- Multimodal Image Fusion based on Hybrid CNN-Transformer and Non-local Cross-modal Attention [12.167049432063132]
We present a hybrid model consisting of a convolutional encoder and a Transformer-based decoder to fuse multimodal images.
A branch fusion module is designed to adaptively fuse the features of the two branches.
Date: 2022-10-18T13:30:52Z
- Image Fusion Transformer [75.71025138448287]
In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information.
In recent years, state-of-the-art methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful features for image fusion.
We propose a novel Image Fusion Transformer (IFT) where we develop a transformer-based multi-scale fusion strategy.
Date: 2021-07-19T16:42:49Z
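Tmamba and several related entries (FusionMamba, Fusion-Mamba, MambaDFuse, TransMamba) build on Mamba's selective state space model, whose linear complexity comes from a recurrent scan over the token sequence. The following is a simplified, single-layer selective scan in NumPy that rests on several assumptions. Image features are taken to be already flattened into a 1D sequence of L tokens with D channels. The step size Δ and the matrices B and C are computed from the input by hypothetical random projections (`W_delta`, `W_B`, `W_C`). A is discretized as exp(ΔA), while the input term uses the simplified ΔB·x. The sketch omits the gating, convolution, and multi-direction scanning of real Mamba blocks, and it is not the implementation of any paper listed here.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps step sizes positive.
    return np.logaddexp(0.0, x)


def init_params(D, N, rng):
    # Hypothetical random parameters for illustration only.
    A = -np.exp(rng.standard_normal((D, N)))     # negative, so exp(delta * A) lies in (0, 1)
    W_delta = 0.1 * rng.standard_normal((D, D))  # projection producing per-channel step sizes
    b_delta = np.zeros(D)
    W_B = 0.1 * rng.standard_normal((D, N))      # projection producing input-dependent B
    W_C = 0.1 * rng.standard_normal((D, N))      # projection producing input-dependent C
    D_skip = np.ones(D)                          # direct skip connection weight
    return A, W_delta, b_delta, W_B, W_C, D_skip


def selective_scan(x, params):
    """Simplified Mamba-style selective scan.

    x: (L, D) sequence of L tokens with D channels.
    One pass over the sequence with an N-dim state per channel: O(L * D * N).
    """
    A, W_delta, b_delta, W_B, W_C, D_skip = params
    L, D = x.shape
    N = A.shape[1]
    delta = softplus(x @ W_delta + b_delta)  # (L, D) input-dependent step sizes
    B = x @ W_B                              # (L, N) input-dependent input matrix
    C = x @ W_C                              # (L, N) input-dependent output matrix
    h = np.zeros((D, N))                     # hidden state, one row per channel
    y = np.empty_like(x)
    for t in range(L):
        dA = np.exp(delta[t][:, None] * A)                       # (D, N) discretized A
        dBx = delta[t][:, None] * B[t][None, :] * x[t][:, None]  # (D, N) input term
        h = dA * h + dBx                                         # recurrent state update
        y[t] = h @ C[t] + D_skip * x[t]                          # readout plus skip
    return y


rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 4))  # hypothetical: 16 flattened patches, 4 channels
params = init_params(D=4, N=8, rng=rng)
y = selective_scan(tokens, params)

# Causality check: perturbing the last token leaves all earlier outputs unchanged.
perturbed = tokens.copy()
perturbed[-1] += 1.0
y2 = selective_scan(perturbed, params)
print(y.shape, np.allclose(y[:-1], y2[:-1]))  # prints (16, 4) True
```

Each step touches only the current token and a fixed-size state, so the total cost grows linearly with sequence length; the check shows that each output depends only on tokens up to its own position.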