DM-FNet: Unified multimodal medical image fusion via diffusion process-trained encoder-decoder
- URL: http://arxiv.org/abs/2506.15218v1
- Date: Wed, 18 Jun 2025 07:55:06 GMT
- Title: DM-FNet: Unified multimodal medical image fusion via diffusion process-trained encoder-decoder
- Authors: Dan He, Weisheng Li, Guofen Wang, Yuping Huang, Shiqiang Liu
- Abstract summary: Multimodal medical image fusion (MMIF) extracts the most meaningful information from multiple source images. Existing MMIF methods have limited capacity to capture detailed features during conventional training. This study proposes a two-stage diffusion model-based fusion network (DM-FNet) to achieve unified MMIF.
- Score: 13.87371547830489
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal medical image fusion (MMIF) extracts the most meaningful information from multiple source images, enabling a more comprehensive and accurate diagnosis. Achieving high-quality fusion results requires a careful balance of brightness, color, contrast, and detail; this ensures that the fused images effectively display relevant anatomical structures and reflect the functional status of the tissues. However, existing MMIF methods have limited capacity to capture detailed features during conventional training and suffer from insufficient cross-modal feature interaction, leading to suboptimal fused image quality. To address these issues, this study proposes a two-stage diffusion model-based fusion network (DM-FNet) to achieve unified MMIF. In Stage I, a diffusion process trains a UNet for image reconstruction. The UNet captures detailed information through progressive denoising and represents multilevel data, providing a rich set of feature representations for the subsequent fusion network. In Stage II, noisy images at various steps are input into the fusion network to enhance the model's feature recognition capability. Three key fusion modules are also integrated to process medical images from different modalities adaptively. Ultimately, the robust network structure and a hybrid loss function are integrated to harmonize the fused image's brightness, color, contrast, and detail, enhancing its quality and information density. The experimental results across various medical image types demonstrate that the proposed method performs exceptionally well on objective evaluation metrics. The fused image preserves appropriate brightness, a comprehensive distribution of radioactive tracers, rich textures, and clear edges. The code is available at https://github.com/HeDan-11/DM-FNet.
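To make the two-stage recipe above concrete, the following is a minimal PyTorch sketch of the idea: Stage I trains a small encoder-decoder as a DDPM-style denoiser, and Stage II feeds noisy versions of both modality images at sampled timesteps through the frozen denoiser and trains a fusion head on its features with a simple hybrid loss. All module names (TinyUNet, FusionHead), the noise schedule, and the loss terms are illustrative assumptions, not the authors' released implementation (see the GitHub link above for that).

```python
# Minimal sketch of the two-stage training idea; names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000                                    # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 2e-2, T)       # linear noise schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0, t):
    """Forward diffusion q(x_t | x_0): mix the clean image with Gaussian noise."""
    a = alphas_bar[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * eps, eps

class TinyUNet(nn.Module):
    """Stand-in encoder-decoder; returns a noise estimate and an intermediate feature map.
    (Timestep embedding and skip connections are omitted for brevity.)"""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.dec = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, x_t, t):
        feat = self.enc(x_t)                # the real model exposes multilevel features
        return self.dec(feat), feat

class FusionHead(nn.Module):
    """Maps concatenated features from the two modalities to a fused image."""
    def __init__(self, ch=32):
        super().__init__()
        self.fuse = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, f_a, f_b):
        return torch.sigmoid(self.fuse(torch.cat([f_a, f_b], dim=1)))

unet, head = TinyUNet(), FusionHead()

def stage1_step(x0, opt):
    """Stage I: train the UNet as a denoiser so it learns rich reconstruction features."""
    t = torch.randint(0, T, (x0.size(0),))
    x_t, eps = add_noise(x0, t)
    eps_hat, _ = unet(x_t, t)
    loss = F.mse_loss(eps_hat, eps)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def stage2_step(x_a, x_b, opt):
    """Stage II: noisy inputs at sampled steps, frozen UNet features, hybrid loss."""
    t = torch.randint(0, T, (x_a.size(0),))
    xa_t, _ = add_noise(x_a, t)
    xb_t, _ = add_noise(x_b, t)
    with torch.no_grad():                   # keep the diffusion-trained encoder fixed
        _, f_a = unet(xa_t, t)
        _, f_b = unet(xb_t, t)
    fused = head(f_a, f_b)
    # Intensity term plus a horizontal-gradient term: a stand-in for the paper's
    # full hybrid loss, which also balances brightness, color, and contrast.
    intensity = F.l1_loss(fused, torch.max(x_a, x_b))
    grad_f = (fused[..., 1:] - fused[..., :-1]).abs()
    grad_t = torch.max((x_a[..., 1:] - x_a[..., :-1]).abs(),
                       (x_b[..., 1:] - x_b[..., :-1]).abs())
    loss = intensity + F.l1_loss(grad_f, grad_t)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage with dummy data (stand-ins for registered MRI / PET slices):
opt1 = torch.optim.Adam(unet.parameters(), lr=1e-4)
opt2 = torch.optim.Adam(head.parameters(), lr=1e-4)
x_a, x_b = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
stage1_step(x_a, opt1); stage1_step(x_b, opt1)   # Stage I on single-modality images
stage2_step(x_a, x_b, opt2)                      # Stage II on paired modalities
```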
Related papers
- DFVO: Learning Darkness-free Visible and Infrared Image Disentanglement and Fusion All at Once [57.15043822199561]
A Darkness-Free network is proposed to handle Visible and infrared image disentanglement and fusion all at Once (DFVO). DFVO employs a cascaded multi-task approach to replace the traditional two-stage cascaded training (enhancement and fusion). Our proposed approach outperforms state-of-the-art alternatives in terms of qualitative and quantitative evaluations.
arXiv Detail & Related papers (2025-05-07T15:59:45Z) - Simultaneous Tri-Modal Medical Image Fusion and Super-Resolution using Conditional Diffusion Model [2.507050016527729]
Tri-modal medical image fusion can provide a more comprehensive view of the disease's shape, location, and biological activity.
Due to the limitations of imaging equipment and considerations for patient safety, the quality of medical images is usually limited.
There is an urgent need for a technology that can both enhance image resolution and integrate multi-modal information.
arXiv Detail & Related papers (2024-04-26T12:13:41Z) - Multi-modal Medical Neurological Image Fusion using Wavelet Pooled Edge Preserving Autoencoder [3.3828292731430545]
This paper presents an end-to-end unsupervised fusion model for multimodal medical images based on an edge-preserving dense autoencoder network.
In the proposed model, feature extraction is improved by using wavelet decomposition-based attention pooling of feature maps.
The proposed model is trained on a variety of medical image pairs which helps in capturing the intensity distributions of the source images.
arXiv Detail & Related papers (2023-10-18T11:59:35Z) - Three-Dimensional Medical Image Fusion with Deformable Cross-Attention [10.26573411162757]
Multimodal medical image fusion plays an instrumental role in several areas of medical image processing.
Traditional fusion methods tend to process each modality independently before combining the features and reconstructing the fusion image.
In this study, we introduce an innovative unsupervised feature mutual learning fusion network designed to rectify these limitations.
arXiv Detail & Related papers (2023-10-10T04:10:56Z) - AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention [6.910879180358217]
We propose AdaFuse, in which multimodal image information is fused adaptively through frequency-guided attention mechanism.
The proposed method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics.
arXiv Detail & Related papers (2023-10-09T07:10:30Z) - Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z) - DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion [144.9653045465908]
We propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM).
Our approach yields promising fusion results in infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2023-03-13T04:06:42Z) - CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z) - TFormer: A throughout fusion transformer for multi-modal skin lesion diagnosis [6.899641625551976]
We introduce a pure transformer-based method, which we refer to as the Throughout Fusion Transformer (TFormer), for sufficient information integration in MSLD.
We then carefully design a stack of dual-branch hierarchical multi-modal transformer (HMT) blocks to fuse information across different image modalities in a stage-by-stage way.
Our TFormer outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2022-11-21T12:07:05Z) - CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion [68.78897015832113]
We propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion. Our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation.
arXiv Detail & Related papers (2022-11-20T12:02:07Z) - Hi-Net: Hybrid-fusion Network for Multi-modal MR Image Synthesis [143.55901940771568]
We propose a novel Hybrid-fusion Network (Hi-Net) for multi-modal MR image synthesis.
In our Hi-Net, a modality-specific network is utilized to learn representations for each individual modality.
A multi-modal synthesis network is designed to densely combine the latent representation with hierarchical features from each modality.
arXiv Detail & Related papers (2020-02-11T08:26:42Z)