SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion
- URL: http://arxiv.org/abs/2508.05264v1
- Date: Thu, 07 Aug 2025 10:58:52 GMT
- Title: SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion
- Authors: Xiaoyang Zhang, Zhen Hua, Yakun Ju, Wei Zhou, Jun Liu, Alex C. Kot
- Abstract summary: This paper proposes a conditional diffusion model guided by the Segment Anything Model (SAM) to achieve high-fidelity and semantically-aware image fusion. The framework operates in a two-stage process: it first performs a preliminary fusion of multi-modal features, and then utilizes the semantic masks as a condition to drive the diffusion model's coarse-to-fine denoising generation. Extensive experiments demonstrate that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations.
- Score: 38.09521879556221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Infrared and visible image fusion (IVIF) aims to combine the thermal radiation information from infrared images with the rich texture details from visible images to enhance perceptual capabilities for downstream visual tasks. However, existing methods often fail to preserve key targets due to a lack of deep semantic understanding of the scene, while the fusion process itself can also introduce artifacts and detail loss, severely compromising both image quality and task performance. To address these issues, this paper proposes SGDFuse, a conditional diffusion model guided by the Segment Anything Model (SAM), to achieve high-fidelity and semantically-aware image fusion. The core of our method is to utilize high-quality semantic masks generated by SAM as explicit priors to guide the optimization of the fusion process via a conditional diffusion model. Specifically, the framework operates in a two-stage process: it first performs a preliminary fusion of multi-modal features, and then utilizes the semantic masks from SAM jointly with the preliminary fused image as a condition to drive the diffusion model's coarse-to-fine denoising generation. This ensures the fusion process not only has explicit semantic directionality but also guarantees the high fidelity of the final result. Extensive experiments demonstrate that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations, as well as in its adaptability to downstream tasks, providing a powerful solution to the core challenges in image fusion. The code of SGDFuse is available at https://github.com/boshizhang123/SGDFuse.
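The two-stage pipeline described in the abstract can be sketched as a toy. All names here (`preliminary_fusion`, `sgd_fuse`) are hypothetical, and the learned conditional denoiser is replaced by a drastically simplified stand-in; this is a sketch of the idea, not the authors' implementation:

```python
import numpy as np

def preliminary_fusion(ir, vis):
    # Stage 1 (toy stand-in): the paper learns a multi-modal fusion
    # network; here we simply average the two source images.
    return 0.5 * ir + 0.5 * vis

def denoise_step(x_t, prelim, sam_mask, alpha=0.9):
    # Stage 2 (toy stand-in): the real model predicts noise with a network
    # conditioned on [preliminary fusion, SAM mask]. This sketch just nudges
    # x_t toward the preliminary fusion inside SAM-masked regions, mimicking
    # the "explicit semantic directionality" of the guidance.
    target = np.where(sam_mask > 0, prelim, x_t)
    return alpha * x_t + (1.0 - alpha) * target

def sgd_fuse(ir, vis, sam_mask, steps=50, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    prelim = preliminary_fusion(ir, vis)   # stage 1: preliminary fusion
    x = rng.standard_normal(ir.shape)      # start from Gaussian noise
    for _ in range(steps):                 # coarse-to-fine denoising
        x = denoise_step(x, prelim, sam_mask)
    return x
```

The design point the toy preserves is that the semantic mask acts as an explicit spatial prior: guidance is applied where SAM marks key targets, rather than uniformly over the image.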
Related papers
- DFVO: Learning Darkness-free Visible and Infrared Image Disentanglement and Fusion All at Once [57.15043822199561]
A Darkness-Free network, DFVO, is proposed to handle visible and infrared image disentanglement and fusion all at once. DFVO employs a cascaded multi-task approach to replace the traditional two-stage cascaded training (enhancement and fusion). The proposed approach outperforms state-of-the-art alternatives in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2025-05-07T15:59:45Z)
- OCCO: LVM-guided Infrared and Visible Image Fusion Framework based on Object-aware and Contextual COntrastive Learning [19.22887628187884]
A novel LVM-guided fusion framework with Object-aware and Contextual COntrastive learning is proposed. A novel feature interaction fusion network is also designed to resolve information conflicts in fused images caused by modality differences. The effectiveness of the proposed method is validated, and exceptional performance is also demonstrated on downstream visual tasks.
arXiv Detail & Related papers (2025-03-24T12:57:23Z)
- Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond [52.486290612938895]
We propose a novel method that leverages the semantic knowledge from the Segment Anything Model (SAM) to improve the quality of fusion results and enable downstream task adaptability. Specifically, we design a Semantic Persistent Attention (SPA) module that efficiently maintains source information via a persistent repository while extracting high-level semantic priors from SAM. Our method achieves a balance between high-quality visual results and downstream task adaptability while maintaining practical deployment efficiency.
arXiv Detail & Related papers (2025-03-03T06:16:31Z)
- FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution [48.88184541515326]
We propose a simple and effective method, named FaithDiff, to fully harness the power of latent diffusion models (LDMs) for faithful image SR. In contrast to existing diffusion-based SR methods that freeze the diffusion model pre-trained on high-quality images, we propose to unleash the diffusion prior to identify useful information and recover faithful structures.
arXiv Detail & Related papers (2024-11-27T23:58:03Z)
- Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z)
- SSPFusion: A Semantic Structure-Preserving Approach for Infrared and Visible Image Fusion [30.55433673796615]
Most existing learning-based infrared and visible image fusion (IVIF) methods exhibit massive redundant information in their fused images.
We propose a semantic structure-preserving approach for IVIF, namely SSPFusion.
Our method is able to generate high-quality fusion images from pairs of infrared and visible images, which can boost the performance of downstream computer-vision tasks.
arXiv Detail & Related papers (2023-09-26T08:13:32Z)
- DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion [144.9653045465908]
We propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM).
Our approach yields promising fusion results in infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2023-03-13T04:06:42Z)
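For context on the diffusion machinery shared by DDFM and SGDFuse, a minimal DDPM ancestral sampler (in the Ho et al., 2020 parameterization) steered toward a fusion target can be sketched as follows. The `make_oracle` noise predictor is a purely illustrative assumption that treats the pixel-wise mean of the sources as the ideal fused image; DDFM derives its conditioning from a likelihood term, which this toy does not implement:

```python
import numpy as np

def ddpm_fuse(ir, vis, eps_model, betas, rng):
    # Standard DDPM ancestral sampling: start from Gaussian noise and
    # repeatedly apply the learned reverse transition, conditioned here on
    # the infrared/visible sources through the noise predictor.
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(ir.shape)
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t, ir, vis)  # source-conditioned noise estimate
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

def make_oracle(betas):
    # Hypothetical "oracle" predictor: assumes the target fused image is the
    # pixel-wise mean of the sources and inverts the forward noising formula
    # x_t = sqrt(alpha_bar)*x0 + sqrt(1 - alpha_bar)*eps for that target.
    alpha_bars = np.cumprod(1.0 - betas)
    def eps_model(x, t, ir, vis):
        target = 0.5 * (ir + vis)
        return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])
    return eps_model
```

With this oracle, sampling collapses exactly onto the assumed target at the final (noise-free) step, which is why diffusion-based fusion methods can trade sampling stochasticity for fidelity by shaping the conditional noise estimate.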
This list is automatically generated from the titles and abstracts of the papers on this site.