Related papers: FusionFM: All-in-One Multi-Modal Image Fusion with Flow Matching

URL: http://arxiv.org/abs/2511.13794v1
Date: Mon, 17 Nov 2025 02:56:48 GMT
Title: FusionFM: All-in-One Multi-Modal Image Fusion with Flow Matching
Authors: Huayi Zhu, Xiu Shu, Youqiang Xiong, Qiao Liu, Rui Chen, Di Yuan, Xiaojun Chang, Zhenyu He
Abstract summary: We formulate image fusion as a direct probabilistic transport from source modalities to the fused image distribution. We employ a task-aware selection function to select the most reliable pseudo-labels for each task. For multi-task scenarios, we integrate elastic weight consolidation and experience replay mechanisms to preserve cross-task performance.
Score: 42.22268167379098
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current multi-modal image fusion methods typically rely on task-specific models, leading to high training costs and limited scalability. While generative methods provide a unified modeling perspective, they often suffer from slow inference due to the complex sampling trajectories from noise to image. To address this, we formulate image fusion as a direct probabilistic transport from source modalities to the fused image distribution, leveraging the flow matching paradigm to improve sampling efficiency and structural consistency. To mitigate the lack of high-quality fused images for supervision, we collect fusion results from multiple state-of-the-art models as priors, and employ a task-aware selection function to select the most reliable pseudo-labels for each task. We further introduce a Fusion Refiner module that employs a divide-and-conquer strategy to systematically identify, decompose, and enhance degraded components in selected pseudo-labels. For multi-task scenarios, we integrate elastic weight consolidation and experience replay mechanisms to preserve cross-task performance and enhance continual learning ability from both parameter stability and memory retention perspectives. Our approach achieves competitive performance across diverse fusion tasks, while significantly improving sampling efficiency and maintaining a lightweight model design. The code will be available at: https://github.com/Ist-Zhy/FusionFM.
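The abstract names two standard building blocks, flow matching and elastic weight consolidation (EWC), without giving formulas. The sketch below shows the textbook forms of both in plain Python, as an illustration only: it assumes a straight-line path between a source vector `x0` and a target vector `x1`, and every function name here is hypothetical. It is not taken from the FusionFM code, and the paper's actual path, loss weighting, and network may differ.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path, d/dt x_t = x1 - x0 (constant in t)."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t)
    target = target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)


def ewc_penalty(params, old_params, fisher, lam):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, fisher)
    )


def euler_sample(velocity_fn, x0, steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy usage: x0 stands in for (flattened) source pixels, x1 for a pseudo-label.
x0 = [0.2, 0.4]
x1 = [0.6, 1.0]

# An "oracle" velocity field that already knows the straight-path velocity.
oracle = lambda x, t: target_velocity(x0, x1)

loss = flow_matching_loss(oracle, x0, x1, t=0.3)
fused = euler_sample(oracle, x0, steps=4)
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=3.0)
```

With the oracle field the loss is zero and a few Euler steps carry `x0` onto `x1`, which is the intuition behind starting the transport at the source images rather than at noise. In a multi-task setting, a penalty of the `ewc_penalty` form would typically be added to the training loss of each new task so that parameters with high Fisher weight stay close to their earlier values.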

Related papers

CtrlFuse: Mask-Prompt Guided Controllable Infrared and Visible Image Fusion [51.060328159429154]
Infrared and visible image fusion generates all-weather perception-capable images by combining complementary modalities. We propose CtrlFuse, a controllable image fusion framework that enables interactive dynamic fusion guided by mask prompts. Experiments demonstrate state-of-the-art results in both fusion controllability and segmentation accuracy, with the adapted task branch even outperforming the original segmentation model.
arXiv Detail & Related papers (2026-01-12T13:36:48Z)
Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach [99.80480649258557]
DiTFuse is an instruction-driven framework that performs semantics-aware fusion within a single model. Experiments on public IVIF, MFF, and MEF benchmarks confirm superior quantitative and qualitative performance, sharper textures, and better semantic retention.
arXiv Detail & Related papers (2025-12-08T05:04:54Z)
Efficient Rectified Flow for Image Fusion [48.330480065862474]
We propose RFfusion, an efficient one-step diffusion model for image fusion based on Rectified Flow. We also propose a task-specific variational autoencoder architecture tailored for image fusion. Our method outperforms other state-of-the-art methods in terms of both inference speed and fusion quality.
arXiv Detail & Related papers (2025-09-20T06:21:00Z)
Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images. DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z)
Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion [28.543822934210404]
Multi-modal image fusion integrates complementary information from different modalities to produce enhanced and informative images. We propose a novel Bayesian-inspired scanning strategy called Random Shuffle to eliminate biases associated with fixed sequence scanning. We develop a testing methodology based on Monte-Carlo averaging to ensure the model's output aligns more closely with expected results.
arXiv Detail & Related papers (2024-09-03T09:12:18Z)
A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion [69.10255211811007]
We present a Task-guided, Implicit-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario. Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion. Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z)
Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning. Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations. Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z)
Progressive Fusion for Multimodal Integration [12.94175198001421]
We present an iterative representation refinement approach, called Progressive Fusion, which mitigates the issues with late fusion representations. We show that our approach consistently improves performance, for instance attaining a 5% reduction in MSE and 40% improvement in robustness on multimodal time series prediction.
arXiv Detail & Related papers (2022-09-01T09:08:33Z)
