SSPFusion: A Semantic Structure-Preserving Approach for Infrared and
Visible Image Fusion
- URL: http://arxiv.org/abs/2309.14745v2
- Date: Tue, 26 Dec 2023 05:36:09 GMT
- Title: SSPFusion: A Semantic Structure-Preserving Approach for Infrared and
Visible Image Fusion
- Authors: Qiao Yang, Yu Zhang, Jian Zhang, Zijing Zhao, Shunli Zhang, Jinqiao
Wang, Junzhe Chen
- Abstract summary: Most existing learning-based infrared and visible image fusion (IVIF) methods exhibit massive redundant information in the fusion images.
We propose a semantic structure-preserving approach for IVIF, namely SSPFusion.
Our method is able to generate high-quality fusion images from pairs of infrared and visible images, which can boost the performance of downstream computer-vision tasks.
- Score: 30.55433673796615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing learning-based infrared and visible image fusion (IVIF) methods
exhibit massive redundant information in the fusion images, i.e., they yield
blurred edges or objects that detectors cannot recognize. To alleviate these
issues, we propose a semantic structure-preserving approach for IVIF, namely
SSPFusion. At first, we design a Structural Feature Extractor (SFE) to extract
the structural features of infrared and visible images. Then, we introduce a
multi-scale Structure-Preserving Fusion (SPF) module to fuse the structural
features of infrared and visible images, while maintaining the consistency of
semantic structures between the fusion and source images. Owing to these two
effective modules, our method is able to generate high-quality fusion images
from pairs of infrared and visible images, which can boost the performance of
downstream computer-vision tasks. Experimental results on three benchmarks
demonstrate that our method outperforms eight state-of-the-art image fusion
methods in terms of both qualitative and quantitative evaluations. The code for
our method, along with additional comparison results, will be made available
at: https://github.com/QiaoYang-CV/SSPFUSION.
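The abstract names two components, a Structural Feature Extractor (SFE) and a multi-scale Structure-Preserving Fusion (SPF) module, but this listing carries no implementation details. The following PyTorch sketch is a minimal, hypothetical reading of that pipeline: the module names follow the abstract, while every concrete choice (layer widths, number of scales, the gated fusion rule, the decoder) is an assumption made for illustration, not the authors' code.

```python
# Hypothetical sketch of the pipeline the abstract describes. The names
# SFE and SPF come from the paper; every concrete choice below is an
# assumption, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SFE(nn.Module):
    """Structural Feature Extractor: multi-scale features for one modality."""
    def __init__(self, in_ch=1, width=32, scales=3):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for _ in range(scales):
            self.stages.append(nn.Sequential(
                nn.Conv2d(ch, width, 3, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            ch = width

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # fine-to-coarse structural features

class SPF(nn.Module):
    """Structure-Preserving Fusion at one scale (assumed: gated blending)."""
    def __init__(self, width=32):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * width, width, 3, padding=1), nn.Sigmoid())

    def forward(self, f_ir, f_vis):
        g = self.gate(torch.cat([f_ir, f_vis], dim=1))
        return g * f_ir + (1 - g) * f_vis  # per-pixel convex combination

class SSPFusionSketch(nn.Module):
    def __init__(self, width=32, scales=3):
        super().__init__()
        self.sfe = SFE(width=width, scales=scales)  # shared across modalities
        self.spf = nn.ModuleList(SPF(width) for _ in range(scales))
        self.decoder = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 1, 3, padding=1))

    def forward(self, ir, vis):
        fused = [spf(a, b) for spf, a, b in
                 zip(self.spf, self.sfe(ir), self.sfe(vis))]
        x = fused[-1]  # start from the coarsest fused map
        for skip in reversed(fused[:-1]):
            x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear",
                              align_corners=False) + skip
        x = F.interpolate(x, size=ir.shape[-2:], mode="bilinear",
                          align_corners=False)
        return self.decoder(x)
```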
Related papers
- CtrlFuse: Mask-Prompt Guided Controllable Infrared and Visible Image Fusion [51.060328159429154]
Infrared and visible image fusion generates all-weather perception-capable images by combining complementary modalities.
We propose CtrlFuse, a controllable image fusion framework that enables interactive dynamic fusion guided by mask prompts.
Experiments demonstrate state-of-the-art results in both fusion controllability and segmentation accuracy, with the adapted task branch even outperforming the original segmentation model.
arXiv Detail & Related papers (2026-01-12T13:36:48Z)
- Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach [99.80480649258557]
DiTFuse is an instruction-driven framework that performs semantics-aware fusion within a single model.
Experiments on public IVIF, MFF, and MEF benchmarks confirm superior quantitative and qualitative performance, sharper textures, and better semantic retention.
arXiv Detail & Related papers (2025-12-08T05:04:54Z)
- MAFS: Masked Autoencoder for Infrared-Visible Image Fusion and Semantic Segmentation [43.62940654606311]
We propose a unified network for image fusion and semantic segmentation.
We devise a heterogeneous feature fusion strategy to enhance semantic-aware capabilities for image fusion.
Within the framework, we design a novel multi-stage Transformer decoder to aggregate fine-grained multi-scale fused features efficiently.
arXiv Detail & Related papers (2025-09-15T11:55:55Z)
- SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion [38.09521879556221]
This paper proposes a conditional diffusion model guided by the Segment Anything Model (SAM) to achieve high-fidelity and semantically-aware image fusion.
The framework operates in a two-stage process: it first performs a preliminary fusion of multi-modal features, and then utilizes the semantic masks as a condition to drive the diffusion model's coarse-to-fine denoising generation.
Extensive experiments demonstrate that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations.
arXiv Detail & Related papers (2025-08-07T10:58:52Z)
- Infrared and Visible Image Fusion Based on Implicit Neural Representations [3.8530055385287403]
Infrared and visible light image fusion aims to combine the strengths of both modalities to generate images that are rich in information.
This paper proposes an image fusion method based on Implicit Neural Representations (INR), referred to as INRFuse.
Experimental results indicate that INRFuse outperforms existing methods in both subjective visual quality and objective evaluation metrics.
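The summary gives only the high-level idea of fitting an Implicit Neural Representation to a source pair. Below is a minimal, self-contained sketch of that general recipe, not INRFuse itself: a coordinate MLP is optimized per image pair against an assumed per-pixel-maximum fusion target; the architecture, the loss, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of the general INR-fusion recipe (not INRFuse itself):
# a coordinate MLP is optimized per image pair. The per-pixel-maximum
# target and all hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordMLP(nn.Module):
    def __init__(self, hidden=128, layers=4):
        super().__init__()
        mods, ch = [], 2  # input: normalized (x, y) coordinates
        for _ in range(layers):
            mods += [nn.Linear(ch, hidden), nn.ReLU(inplace=True)]
            ch = hidden
        mods.append(nn.Linear(ch, 1))  # output: fused intensity
        self.net = nn.Sequential(*mods)

    def forward(self, xy):
        return self.net(xy)

def fuse_with_inr(ir, vis, steps=500, lr=1e-3):
    """ir, vis: (H, W) tensors in [0, 1]; returns the fused (H, W) image."""
    h, w = ir.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    xy = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    # Crude proxy target: keep the strongest response from either modality.
    target = torch.maximum(ir, vis).reshape(-1, 1)
    model = CoordMLP()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(model(xy), target)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return model(xy).reshape(h, w).clamp(0, 1)
```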
arXiv Detail & Related papers (2025-06-20T06:34:19Z)
- DFVO: Learning Darkness-free Visible and Infrared Image Disentanglement and Fusion All at Once [57.15043822199561]
A darkness-free network, DFVO, is proposed to handle visible and infrared image disentanglement and fusion all at once.
DFVO employs a cascaded multi-task approach to replace the traditional two-stage cascaded training (enhancement and fusion).
The proposed approach outperforms state-of-the-art alternatives in terms of qualitative and quantitative evaluations.
arXiv Detail & Related papers (2025-05-07T15:59:45Z)
- Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z)
- DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion [21.64382683858586]
Infrared and visible image fusion aims to combine complementary information from both modalities to provide a more comprehensive scene understanding.
We propose a dual-branch feature decomposition fusion network (DAF-Net) with a multi-kernel maximum mean discrepancy (MK-MMD) domain-adaptive loss.
By incorporating MK-MMD, the DAF-Net effectively aligns the latent feature spaces of visible and infrared images, thereby improving the quality of the fused images.
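MK-MMD itself is a standard quantity, the multi-kernel maximum mean discrepancy; a compact biased estimator over a bank of Gaussian kernels looks like the sketch below. The bandwidths and the way features are pooled before comparison are assumptions, not DAF-Net's settings.

```python
# Biased multi-kernel MMD between two feature batches; the Gaussian
# bandwidths below are illustrative, not DAF-Net's settings.
import torch

def mk_mmd(x, y, gammas=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """x: (n, d) visible features, y: (m, d) infrared features."""
    def k(a, b):  # sum of Gaussian kernels over several bandwidths
        d2 = torch.cdist(a, b).pow(2)
        return sum(torch.exp(-g * d2) for g in gammas)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Usage: loss = fusion_loss + lam * mk_mmd(f_vis.flatten(1), f_ir.flatten(1))
```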
arXiv Detail & Related papers (2024-09-18T02:14:08Z)
- Guided Image Restoration via Simultaneous Feature and Image Guided Fusion [67.30078778732998]
We propose a Simultaneous Feature and Image Guided Fusion (SFIGF) network.
It considers feature and image-level guided fusion following the guided filter (GF) mechanism.
Since guided fusion is implemented in both the feature and image domains, the proposed SFIGF is expected to faithfully reconstruct both contextual and textural information.
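The guided filter (GF) mechanism the summary refers to is the classic edge-preserving filter of He et al.; for reference, a compact PyTorch version is sketched below. This is the textbook filter itself, not the SFIGF network that builds on it.

```python
# The classic guided filter of He et al., for reference; this is the
# textbook mechanism, not the SFIGF network built on top of it.
import torch.nn.functional as F

def box(x, r):
    """Border-corrected mean over a (2r+1)x(2r+1) window; x: (B, 1, H, W)."""
    return F.avg_pool2d(x, 2 * r + 1, stride=1, padding=r,
                        count_include_pad=False)

def guided_filter(I, p, r=8, eps=1e-4):
    """Edge-preserving smoothing of p under guidance I (q ~ a*I + b)."""
    mean_I, mean_p = box(I, r), box(p, r)
    cov_Ip = box(I * p, r) - mean_I * mean_p
    var_I = box(I * I, r) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)      # local linear coefficients
    b = mean_p - a * mean_I
    return box(a, r) * I + box(b, r)
```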
arXiv Detail & Related papers (2023-12-14T12:15:45Z)
- IAIFNet: An Illumination-Aware Infrared and Visible Image Fusion Network [13.11361803763253]
We propose an Illumination-Aware Infrared and Visible Image Fusion Network, named IAIFNet.
In our framework, an illumination enhancement network first estimates the incident illumination maps of input images.
With the help of the proposed adaptive differential fusion module (ADFM) and salient target aware module (STAM), an image fusion network effectively integrates the salient features of the illumination-enhanced infrared and visible images into a fusion image of high visual quality.
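As a point of reference for the illumination-estimation step the summary describes, here is a generic Retinex-style sketch: a small network predicts an illumination map and the input is normalized by it. The layer choices and the max-with-input prior are assumptions, not IAIFNet's actual modules.

```python
# Generic Retinex-style illumination step (layer choices are assumptions,
# not IAIFNet's modules): predict an illumination map, then normalize.
import torch
import torch.nn as nn

class IlluminationNet(nn.Module):
    def __init__(self, width=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        # Retinex prior: illumination is no darker than the observation.
        return torch.clamp(self.net(x), min=1e-3).maximum(x)

def enhance(img, illum_net):
    return img / illum_net(img)  # reflectance-like, illumination-normalized
```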
arXiv Detail & Related papers (2023-09-26T15:12:29Z)
- Fusion of Infrared and Visible Images based on Spatial-Channel Attentional Mechanism [3.388001684915793]
We present AMFusionNet, an innovative approach to infrared and visible image fusion (IVIF).
By assimilating thermal details from infrared images with texture features from visible sources, our method produces images enriched with comprehensive information.
Our method outperforms state-of-the-art algorithms in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2023-08-25T21:05:11Z)
- PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation [50.556961575275345]
We propose a perception-aware fusion framework to promote segmentation robustness in adversarial scenes.
We show that our scheme substantially enhances robustness, with gains of 15.3% mIoU compared with advanced competitors.
arXiv Detail & Related papers (2023-08-08T01:55:44Z)
- Searching a Compact Architecture for Robust Multi-Exposure Image Fusion [55.37210629454589]
Two major stumbling blocks hinder development: pixel misalignment and inefficient inference.
This study introduces an architecture search-based paradigm incorporating self-alignment and detail repletion modules for robust multi-exposure image fusion.
The proposed method outperforms various competitive schemes, achieving a noteworthy 3.19% improvement in PSNR for general scenarios and an impressive 23.5% enhancement in misaligned scenarios.
arXiv Detail & Related papers (2023-05-20T17:01:52Z)
- Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
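The equivariance prior can be phrased directly as a consistency loss: fusing transformed inputs should agree with transforming the fused output. The sketch below uses random 90-degree rotations as the transform; EMMA's actual transformation set and loss may differ.

```python
# The equivariance prior as a consistency loss: fusing transformed inputs
# should match transforming the fused output. Random 90-degree rotations
# stand in for EMMA's actual transformation set.
import torch
import torch.nn.functional as F

def equivariance_loss(fuse, ir, vis):
    """fuse: callable (ir, vis) -> fused image; all tensors (B, C, H, W)."""
    k = int(torch.randint(1, 4, (1,)))
    t = lambda x: torch.rot90(x, k, dims=(-2, -1))
    return F.mse_loss(fuse(t(ir), t(vis)), t(fuse(ir, vis)))
```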
arXiv Detail & Related papers (2023-05-19T05:50:24Z)
- An Interactively Reinforced Paradigm for Joint Infrared-Visible Image Fusion and Saliency Object Detection [59.02821429555375]
This research focuses on the discovery and localization of hidden objects in the wild and serves unmanned systems.
Through empirical analysis, infrared and visible image fusion (IVIF) makes hard-to-find objects apparent.
Multimodal salient object detection (SOD) accurately delineates the precise spatial location of objects within the picture.
arXiv Detail & Related papers (2023-05-17T06:48:35Z)
- CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion [72.8898811120795]
We propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion.
Our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation.
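As a rough illustration of what a contrastive objective for fusion can look like (not CoCoNet's actual coupled formulation), the sketch below pulls each fused feature toward its two source features and pushes it away from the other fused samples in the batch.

```python
# A rough contrastive objective for fusion (not CoCoNet's coupled
# formulation): pull each fused feature toward its two source features,
# push it away from other fused samples in the batch (batch size > 1).
import torch
import torch.nn.functional as F

def contrastive_fusion_loss(f_fused, f_ir, f_vis, tau=0.1):
    """All inputs: (B, D) pooled feature vectors."""
    z = F.normalize(f_fused, dim=1)
    pos = (z * F.normalize(f_ir, dim=1)).sum(1) + \
          (z * F.normalize(f_vis, dim=1)).sum(1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))  # exclude self-similarity
    return (torch.logsumexp(sim, dim=1) - pos / tau).mean()
```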
arXiv Detail & Related papers (2022-11-20T12:02:07Z)
- Multi-feature Co-learning for Image Inpainting [2.4571440831539824]
In this paper, we design a deep multi-feature co-learning network for image inpainting.
To be specific, we first use two branches to learn structure features and texture features separately.
The proposed SDFF module integrates structure features into texture features and, in turn, uses texture features as an auxiliary signal for generating structure features.
arXiv Detail & Related papers (2022-05-21T12:15:26Z)
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in the common space either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
- Unsupervised Image Fusion Method based on Feature Mutual Mapping [16.64607158983448]
We propose an unsupervised adaptive image fusion method to address the above issues.
We construct a global map to measure the connections of pixels between the input source images.
Our method achieves superior performance in both visual perception and objective evaluation.
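One straightforward way to realize a "global map" of pixel connections between the two sources is a cosine-similarity affinity matrix over flattened features, as sketched below; this is an illustrative reading of the summary, not the paper's construction.

```python
# One illustrative reading of a "global map" of pixel connections: a
# cosine-similarity affinity matrix between the two sources' features.
import torch
import torch.nn.functional as F

def global_affinity(f_a, f_b):
    """f_a, f_b: (B, C, H, W) feature maps; returns (B, H*W, H*W)."""
    a = F.normalize(f_a.flatten(2), dim=1)  # (B, C, HW), unit-norm channels
    b = F.normalize(f_b.flatten(2), dim=1)
    return torch.bmm(a.transpose(1, 2), b)  # pairwise cosine similarities
```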
arXiv Detail & Related papers (2022-01-25T07:50:14Z)
- Image Inpainting via Conditional Texture and Structure Dual Generation [26.97159780261334]
We propose a novel two-stream network for image inpainting, which models the structure-constrained texture synthesis and texture-guided structure reconstruction.
To enhance the global consistency, a Bi-directional Gated Feature Fusion (Bi-GFF) module is designed to exchange and combine the structure and texture information.
Experiments on the CelebA, Paris StreetView and Places2 datasets demonstrate the superiority of the proposed method.
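Bi-directional gated feature fusion of the kind Bi-GFF's name suggests can be sketched as two learned gates that let each stream admit information from the other; the concrete gating below is an assumption, not the paper's module.

```python
# Bi-directional gated exchange between structure and texture features,
# in the spirit of the Bi-GFF name; the gating design is an assumption.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.gate_s = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.Sigmoid())
        self.gate_t = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.Sigmoid())

    def forward(self, f_struct, f_tex):
        cat = torch.cat([f_struct, f_tex], dim=1)
        # Each stream admits the other's information through a learned gate.
        return (f_struct + self.gate_s(cat) * f_tex,
                f_tex + self.gate_t(cat) * f_struct)
```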
arXiv Detail & Related papers (2021-08-22T15:44:37Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
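A residual-based fusion module in the sense of the summary can be as simple as concatenating the two streams and adding a learned correction back to one of them; the sketch below is illustrative, with assumed layer sizes, not PMF's implementation.

```python
# A residual-based fusion module in the sense of the summary: concatenate
# the two streams and add a learned correction back to the camera branch.
# Layer sizes are assumptions, not PMF's implementation.
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, f_cam, f_lidar):
        return f_cam + self.body(torch.cat([f_cam, f_lidar], dim=1))
```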
arXiv Detail & Related papers (2021-06-21T10:47:26Z)