Task-Generalized Adaptive Cross-Domain Learning for Multimodal Image Fusion
- URL: http://arxiv.org/abs/2508.15505v1
- Date: Thu, 21 Aug 2025 12:31:14 GMT
- Title: Task-Generalized Adaptive Cross-Domain Learning for Multimodal Image Fusion
- Authors: Mengyu Wang, Zhenyu Liu, Kun Li, Yu Wang, Yuwei Wang, Yanyan Wei, Fei Wang
- Abstract summary: Multimodal Image Fusion (MMIF) aims to integrate complementary information from different imaging modalities to overcome the limitations of individual sensors. Current MMIF methods face challenges such as modality misalignment, high-frequency detail destruction, and task-specific limitations. We propose AdaSFFuse, a novel framework for task-generalized MMIF through adaptive cross-domain co-fusion learning.
- Score: 15.666336202108862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Image Fusion (MMIF) aims to integrate complementary information from different imaging modalities to overcome the limitations of individual sensors. It enhances image quality and facilitates downstream applications such as remote sensing, medical diagnostics, and robotics. Despite significant advancements, current MMIF methods still face challenges such as modality misalignment, high-frequency detail destruction, and task-specific limitations. To address these challenges, we propose AdaSFFuse, a novel framework for task-generalized MMIF through adaptive cross-domain co-fusion learning. AdaSFFuse introduces two key innovations: the Adaptive Approximate Wavelet Transform (AdaWAT) for frequency decoupling, and Spatial-Frequency Mamba Blocks for efficient multimodal fusion. AdaWAT adaptively separates the high- and low-frequency components of multimodal images from different scenes, enabling fine-grained extraction and alignment of distinct frequency characteristics for each modality. The Spatial-Frequency Mamba Blocks then fuse the decoupled components across both the spatial and frequency domains, adjusting dynamically through learnable mappings to ensure robust fusion across diverse modalities. By combining these components, AdaSFFuse improves the alignment and integration of multimodal features, reduces frequency loss, and preserves critical details. Extensive experiments on four MMIF tasks -- Infrared-Visible Image Fusion (IVF), Multi-Focus Image Fusion (MFF), Multi-Exposure Image Fusion (MEF), and Medical Image Fusion (MIF) -- demonstrate AdaSFFuse's superior fusion performance at low computational cost with a compact network, offering a strong balance between performance and efficiency. The code will be publicly available at https://github.com/Zhen-yu-Liu/AdaSFFuse.
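The abstract's two components lend themselves to a compact illustration. The sketch below is a rough PyTorch analogue assembled from the description alone: a learnable low-pass/high-pass split standing in for AdaWAT, and a simple spatial-convolution-plus-FFT mixing layer standing in for the Spatial-Frequency Mamba Blocks. All class names, layer choices, and the residual reconstruction trick are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AdaptiveFreqSplit(nn.Module):
    """Conceptual stand-in for AdaWAT: a learnable low-pass filter whose
    residual acts as the high-pass branch, so low + high reconstructs the
    input by construction (the paper's actual transform may differ)."""
    def __init__(self, channels: int, kernel_size: int = 5):
        super().__init__()
        # Depthwise conv initialized as an averaging (low-pass) filter;
        # training lets it adapt per scene, loosely mirroring the paper's
        # "adaptive approximate wavelet" idea.
        self.low_pass = nn.Conv2d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels, bias=False)
        nn.init.constant_(self.low_pass.weight, 1.0 / kernel_size ** 2)

    def forward(self, x):
        low = self.low_pass(x)
        high = x - low  # the residual carries high-frequency detail
        return low, high

class SpatialFrequencyFusion(nn.Module):
    """Toy cross-domain fusion: concatenate the two modalities' bands,
    mix them with a spatial conv, then refine globally in the FFT domain.
    This illustrates spatial + frequency co-fusion, not the paper's
    Mamba blocks."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial_mix = nn.Conv2d(2 * channels, channels, 3, padding=1)
        # Learnable complex per-channel gain applied to the spectrum.
        self.freq_gain = nn.Parameter(torch.ones(channels, 1, 1, dtype=torch.cfloat))

    def forward(self, band_a, band_b):
        fused = self.spatial_mix(torch.cat([band_a, band_b], dim=1))
        spec = torch.fft.rfft2(fused, norm="ortho")
        spec = spec * self.freq_gain
        return torch.fft.irfft2(spec, s=fused.shape[-2:], norm="ortho")

if __name__ == "__main__":
    # Fuse an infrared/visible feature pair, band by band, then recombine.
    ir, vis = torch.randn(1, 16, 64, 64), torch.randn(1, 16, 64, 64)
    split = AdaptiveFreqSplit(16)
    fuse = SpatialFrequencyFusion(16)
    (ir_lo, ir_hi), (vis_lo, vis_hi) = split(ir), split(vis)
    out = fuse(ir_lo, vis_lo) + fuse(ir_hi, vis_hi)
    print(out.shape)  # torch.Size([1, 16, 64, 64])
```

Splitting before fusing lets high-frequency detail be weighted independently of low-frequency structure, which is the property the abstract credits for reduced frequency loss and better detail preservation.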
Related papers
- Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion [69.13852939945433]
Multi-Modal Image Fusion (MMIF) aims to combine images from different modalities to produce fused images. We propose a novel Interactive Spatial-Frequency Fusion Mamba framework for MMIF. Our ISFM achieves better performance than other state-of-the-art methods.
arXiv Detail & Related papers (2026-02-04T10:35:55Z) - MMMamba: A Versatile Cross-Modal In Context Fusion Framework for Pan-Sharpening and Zero-Shot Image Enhancement [29.94979992704961]
Pan-sharpening aims to generate high-resolution multispectral (HRMS) images by integrating a high-resolution panchromatic (PAN) image with its corresponding low-resolution multispectral (MS) image. Traditional CNN-based methods rely on channel-wise concatenation with fixed convolutional operators. We propose MMMamba, a cross-modal in-context fusion framework for pan-sharpening.
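For context, the "channel-wise concatenation with fixed convolutional operators" baseline that MMMamba is contrasted against can be sketched in a few lines of PyTorch; the layer widths and the residual connection here are illustrative assumptions, not taken from any specific method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConcatConvPanSharpen(nn.Module):
    """Illustrative traditional baseline: upsample the low-res MS image,
    concatenate it with the PAN image along channels, and map the stack
    to HRMS with fixed (non-adaptive) convolutions."""
    def __init__(self, ms_bands: int = 4, width: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ms_bands + 1, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, ms_bands, 3, padding=1))

    def forward(self, pan, ms):
        # Bring MS to PAN resolution, then fuse by channel-wise concatenation.
        ms_up = F.interpolate(ms, size=pan.shape[-2:], mode="bicubic",
                              align_corners=False)
        return self.body(torch.cat([pan, ms_up], dim=1)) + ms_up  # residual to HRMS

pan = torch.randn(1, 1, 256, 256)  # high-res panchromatic
ms = torch.randn(1, 4, 64, 64)     # low-res multispectral
print(ConcatConvPanSharpen()(pan, ms).shape)  # torch.Size([1, 4, 256, 256])
```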
arXiv Detail & Related papers (2025-12-17T10:07:09Z) - Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach [99.80480649258557]
DiTFuse is an instruction-driven framework that performs semantics-aware fusion within a single model. Experiments on public IVIF, MFF, and MEF benchmarks confirm superior quantitative and qualitative performance, sharper textures, and better semantic retention.
arXiv Detail & Related papers (2025-12-08T05:04:54Z) - Spatial-Frequency Enhanced Mamba for Multi-Modal Image Fusion [64.5037956060757]
Multi-Modal Image Fusion (MMIF) aims to integrate complementary image information from different modalities to produce informative images. We propose a novel framework named Spatial-Frequency Enhanced Mamba Fusion (SFMFusion) for MMIF. Our method achieves better results than most state-of-the-art methods on six MMIF datasets.
arXiv Detail & Related papers (2025-11-10T00:44:49Z) - WIFE-Fusion: Wavelet-aware Intra-inter Frequency Enhancement for Multi-model Image Fusion [8.098063209250684]
Multimodal image fusion effectively aggregates information from diverse modalities. Existing methods often neglect frequency-domain feature exploration and interactive relationships. We propose Wavelet-aware Intra-inter Frequency Enhancement Fusion (WIFE-Fusion), a multimodal image fusion framework based on frequency-domain component interactions.
arXiv Detail & Related papers (2025-06-04T04:18:32Z) - Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z) - A Dual Domain Multi-exposure Image Fusion Network based on the Spatial-Frequency Integration [57.14745782076976]
Multi-exposure image fusion aims to generate a single high-dynamic-range image by integrating images with different exposures; a minimal classical baseline for this task is sketched after this entry.
We propose a novel perspective on multi-exposure image fusion via the Spatial-Frequency Integration Framework, named MEF-SFI.
Our method achieves visually appealing fusion results against state-of-the-art multi-exposure image fusion approaches.
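As flagged above, the following is a minimal classical exposure-fusion baseline, a single-scale, Mertens-style well-exposedness weighting in PyTorch, included only to make the task concrete. It is a toy spatial-domain reference point, not MEF-SFI's spatial-frequency method, and the sigma value is an assumption.

```python
import torch

def naive_exposure_fusion(stack: torch.Tensor, sigma: float = 0.2) -> torch.Tensor:
    """Classical pixel-wise exposure fusion without the multiscale pyramid:
    weight each exposure by how close its intensities are to mid-gray,
    normalize the weights, and blend. `stack` has shape (N, C, H, W)
    for N exposures of the same scene, with values in [0, 1]."""
    # Well-exposedness: Gaussian bump centered at 0.5 intensity.
    weights = torch.exp(-((stack - 0.5) ** 2) / (2 * sigma ** 2))
    weights = weights / (weights.sum(dim=0, keepdim=True) + 1e-8)
    return (weights * stack).sum(dim=0)

exposures = torch.rand(3, 3, 128, 128)  # under-, mid-, over-exposed frames
fused = naive_exposure_fusion(exposures)
print(fused.shape)  # torch.Size([3, 128, 128])
```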
arXiv Detail & Related papers (2023-12-17T04:45:15Z) - AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention [6.910879180358217]
We propose AdaFuse, in which multimodal image information is fused adaptively through a frequency-guided attention mechanism.
The proposed method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics.
arXiv Detail & Related papers (2023-10-09T07:10:30Z) - Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z) - CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)