Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion
- URL: http://arxiv.org/abs/2602.04405v1
- Date: Wed, 04 Feb 2026 10:35:55 GMT
- Title: Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion
- Authors: Yixin Zhu, Long Lv, Pingping Zhang, Xuehu Liu, Tongdan Tang, Feng Tian, Weibing Sun, Huchuan Lu
- Abstract summary: Multi-Modal Image Fusion (MMIF) aims to combine images from different modalities to produce fused images. We propose a novel Interactive Spatial-Frequency Fusion Mamba framework for MMIF. Our ISFM achieves better performance than other state-of-the-art methods.
- Score: 69.13852939945433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Modal Image Fusion (MMIF) aims to combine images from different modalities to produce fused images, retaining texture details and preserving significant information. Recently, some MMIF methods incorporate frequency domain information to enhance spatial features. However, these methods typically rely on simple serial or parallel spatial-frequency fusion without interaction. In this paper, we propose a novel Interactive Spatial-Frequency Fusion Mamba (ISFM) framework for MMIF. Specifically, we begin with a Modality-Specific Extractor (MSE) to extract features from different modalities. It models long-range dependencies across the image with linear computational complexity. To effectively leverage frequency information, we then propose a Multi-scale Frequency Fusion (MFF). It adaptively integrates low-frequency and high-frequency components across multiple scales, enabling robust representations of frequency features. More importantly, we further propose an Interactive Spatial-Frequency Fusion (ISF). It incorporates frequency features to guide spatial features across modalities, enhancing complementary representations. Extensive experiments are conducted on six MMIF datasets. The experimental results demonstrate that our ISFM achieves better performance than other state-of-the-art methods. The source code is available at https://github.com/Namn23/ISFM.
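To make the described pipeline more concrete, below is a minimal PyTorch sketch of the two ideas the abstract names: splitting features into low- and high-frequency components (in the spirit of the MFF module) and letting the fused frequency representation guide each modality's spatial features (in the spirit of the ISF module). This is an illustrative sketch only, not the authors' implementation: the names `frequency_split` and `FrequencyGuidedFusion`, the radial FFT mask, and all hyperparameters are assumptions. The actual code is in the linked repository.

```python
# Illustrative sketch (NOT the authors' code) of frequency decomposition and
# frequency-guided spatial fusion, as described at a high level in the abstract.
# All module names and hyperparameters below are hypothetical.
import torch
import torch.nn as nn
import torch.fft


def frequency_split(x: torch.Tensor, radius: float = 0.25):
    """Split a feature map into low/high-frequency parts via a radial FFT mask."""
    _, _, h, w = x.shape
    freq = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
    # Centered circular low-pass mask over the spatial frequency plane.
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, h, device=x.device),
        torch.linspace(-1, 1, w, device=x.device),
        indexing="ij",
    )
    mask = ((xx ** 2 + yy ** 2).sqrt() <= radius).to(x.dtype)
    low = torch.fft.ifft2(
        torch.fft.ifftshift(freq * mask, dim=(-2, -1)), norm="ortho"
    ).real
    high = x - low  # residual carries the high-frequency detail
    return low, high


class FrequencyGuidedFusion(nn.Module):
    """Hypothetical ISF-style block: frequency features gate spatial features."""

    def __init__(self, channels: int):
        super().__init__()
        self.freq_proj = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid()
        )
        self.out = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, spat_a: torch.Tensor, spat_b: torch.Tensor) -> torch.Tensor:
        # Fuse low/high-frequency components of both modalities.
        low_a, high_a = frequency_split(spat_a)
        low_b, high_b = frequency_split(spat_b)
        freq = self.freq_proj(torch.cat([low_a + low_b, high_a + high_b], dim=1))
        g = self.gate(freq)  # frequency-derived gate in [0, 1]
        # Frequency guidance modulates each modality's spatial stream.
        guided_a = spat_a * g
        guided_b = spat_b * (1.0 - g)
        return self.out(torch.cat([guided_a, guided_b], dim=1))


if __name__ == "__main__":
    ir, vis = torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64)
    fused = FrequencyGuidedFusion(32)(ir, vis)
    print(fused.shape)  # torch.Size([1, 32, 64, 64])
```

The sigmoid gate is one simple, differentiable way to realize the "frequency guides spatial" interaction; the paper's ISF presumably relies on Mamba-based sequence modeling rather than the plain convolutions used here.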
Related papers
- Spatial-Frequency Enhanced Mamba for Multi-Modal Image Fusion [64.5037956060757]
Multi-Modal Image Fusion (MMIF) aims to integrate complementary image information from different modalities to produce informative images. We propose a novel framework named Spatial-Frequency Enhanced Mamba Fusion (SFMFusion) for MMIF. Our method achieves better results than most state-of-the-art methods on six MMIF datasets.
arXiv Detail & Related papers (2025-11-10T00:44:49Z)
- Task-Generalized Adaptive Cross-Domain Learning for Multimodal Image Fusion [15.666336202108862]
Multimodal Image Fusion (MMIF) aims to integrate complementary information from different imaging modalities to overcome the limitations of individual sensors. Current MMIF methods face challenges such as modality misalignment, high-frequency detail destruction, and task-specific limitations. We propose AdaSFFuse, a novel framework for task-generalized MMIF through adaptive cross-domain co-fusion learning.
arXiv Detail & Related papers (2025-08-21T12:31:14Z)
- WIFE-Fusion: Wavelet-aware Intra-inter Frequency Enhancement for Multi-model Image Fusion [8.098063209250684]
Multimodal image fusion effectively aggregates information from diverse modalities. Existing methods often neglect frequency-domain feature exploration and interactive relationships. We propose wavelet-aware Intra-inter Frequency Enhancement Fusion (WIFE-Fusion), a multimodal image fusion framework based on frequency-domain component interactions.
arXiv Detail & Related papers (2025-06-04T04:18:32Z)
- SFDFusion: An Efficient Spatial-Frequency Domain Fusion Network for Infrared and Visible Image Fusion [11.46957526079837]
Infrared and visible image fusion aims to generate fused images with prominent targets and rich texture details.
This paper proposes an efficient Spatial-Frequency Domain Fusion network for infrared and visible image fusion.
Our method produces fused images with significant advantages in various fusion metrics and visual effects.
arXiv Detail & Related papers (2024-10-30T09:17:23Z)
- A Dual Domain Multi-exposure Image Fusion Network based on the Spatial-Frequency Integration [57.14745782076976]
Multi-exposure image fusion aims to generate a single high-dynamic image by integrating images with different exposures.
We propose a novel perspective on multi-exposure image fusion via the Spatial-Frequency Integration Framework, named MEF-SFI.
Our method achieves visually appealing fusion results compared with state-of-the-art multi-exposure image fusion approaches.
arXiv Detail & Related papers (2023-12-17T04:45:15Z)
- AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention [6.910879180358217]
We propose AdaFuse, in which multimodal image information is fused adaptively through a frequency-guided attention mechanism.
The proposed method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics.
arXiv Detail & Related papers (2023-10-09T07:10:30Z)
- Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs.
Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z)
- Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)