Dual-Domain Perspective on Degradation-Aware Fusion: A VLM-Guided Robust Infrared and Visible Image Fusion Framework
- URL: http://arxiv.org/abs/2509.05000v1
- Date: Fri, 05 Sep 2025 10:48:46 GMT
- Title: Dual-Domain Perspective on Degradation-Aware Fusion: A VLM-Guided Robust Infrared and Visible Image Fusion Framework
- Authors: Tianpei Zhang, Jufeng Zhao, Yiming Zhu, Guangmang Cui
- Abstract summary: GD^2Fusion is a novel framework that integrates vision-language models for degradation perception with dual-domain (frequency/spatial) joint optimization. It achieves superior fusion performance compared with existing algorithms and strategies in dual-source degraded scenarios.
- Score: 9.915632806109555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing infrared-visible image fusion (IVIF) methods assume high-quality inputs, and therefore struggle to handle dual-source degraded scenarios, typically requiring manual selection and sequential application of multiple pre-enhancement steps. This decoupled pre-enhancement-to-fusion pipeline inevitably leads to error accumulation and performance degradation. To overcome these limitations, we propose Guided Dual-Domain Fusion (GD^2Fusion), a novel framework that synergistically integrates vision-language models (VLMs) for degradation perception with dual-domain (frequency/spatial) joint optimization. Concretely, the designed Guided Frequency Modality-Specific Extraction (GFMSE) module performs frequency-domain degradation perception and suppression and discriminatively extracts fusion-relevant sub-band features. Meanwhile, the Guided Spatial Modality-Aggregated Fusion (GSMAF) module carries out cross-modal degradation filtering and adaptive multi-source feature aggregation in the spatial domain to enhance modality complementarity and structural consistency. Extensive qualitative and quantitative experiments demonstrate that GD^2Fusion achieves superior fusion performance compared with existing algorithms and strategies in dual-source degraded scenarios. The code will be publicly released after acceptance of this paper.
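The abstract names the GFMSE and GSMAF modules and what each is responsible for, but not their internals. The following is a minimal PyTorch sketch of the dual-domain idea only: the per-channel spectral gate, the per-pixel fusion weights, and all layer shapes are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the dual-domain idea: frequency-domain degradation
# suppression per modality (GFMSE-like), then spatial fusion (GSMAF-like).
# All internals below are assumptions for illustration.
import torch
import torch.nn as nn


class FrequencyGate(nn.Module):
    """GFMSE-like step: gate rFFT spectra with a degradation embedding
    (e.g., text features from a VLM describing the degradation)."""

    def __init__(self, channels: int, embed_dim: int = 512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(embed_dim, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor, degrade_emb: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft2(x, norm="ortho")        # (B,C,H,W//2+1), complex
        g = self.gate(degrade_emb)[:, :, None, None]   # per-channel gate in [0,1]
        return torch.fft.irfft2(spec * g, s=x.shape[-2:], norm="ortho")


class SpatialFusion(nn.Module):
    """GSMAF-like step: predict per-pixel weights to aggregate the two
    frequency-cleaned modalities."""

    def __init__(self, channels: int):
        super().__init__()
        self.weight = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid()
        )

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        w = self.weight(torch.cat([ir, vis], dim=1))   # adaptive fusion mask
        return w * ir + (1.0 - w) * vis
```

In this reading, the frequency stage suppresses degradation per modality before the spatial stage aggregates them, which is consistent with the pipeline the abstract describes.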
Related papers
- MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics [8.783211177601045]
Infrared and visible image fusion aims to integrate complementary multi-modal information into a single fused result. We propose a one-stop degradation-aware image fusion framework for multi-degradation scenarios driven by a large language model (MdaIF). To adaptively extract diverse weather-aware degradation knowledge and scene feature representations, we employ a pre-trained vision-language model (VLM) in our framework.
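The summary does not say which VLM or prompts MdaIF uses. As a hedged sketch of the general recipe, a pre-trained CLIP (here via Hugging Face transformers) can score weather-degradation hypotheses zero-shot; the checkpoint, prompt list, and input path below are assumptions for illustration.

```python
# Hedged sketch: zero-shot degradation perception with a pre-trained CLIP.
# Checkpoint, prompts, and input path are illustrative assumptions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a photo taken in dense fog",
    "a photo taken in heavy rain",
    "a photo taken at night in low light",
    "a clean, well-lit photo",
]
image = Image.open("visible.png")  # hypothetical visible-band input
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # shape (1, 4)
print(dict(zip(prompts, probs[0].tolist())))  # degradation-type scores
```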
arXiv Detail & Related papers (2025-11-16T09:43:12Z)
- Coupled Degradation Modeling and Fusion: A VLM-Guided Degradation-Coupled Network for Degradation-Aware Infrared and Visible Image Fusion [9.915632806109555]
We propose a novel VLM-Guided Degradation-Coupled Fusion network (VGDCFusion). Our VGDCFusion significantly outperforms existing state-of-the-art fusion approaches under various degraded image scenarios.
arXiv Detail & Related papers (2025-10-13T14:26:33Z)
- Efficient Dual-domain Image Dehazing with Haze Prior Perception [17.18810808188725]
Transformer-based models exhibit strong global modeling capabilities in single-image dehazing, but their high computational cost limits real-time applicability. We propose the Dark Channel Guided Frequency-aware Dehazing Network (DGFDNet), a novel dual-domain framework that performs physically guided degradation alignment. Experiments on four benchmark haze datasets demonstrate that DGFDNet achieves state-of-the-art performance with superior robustness and real-time efficiency.
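For reference, the dark channel prior that gives DGFDNet its guidance signal (He et al.) is straightforward to compute. A minimal NumPy/SciPy sketch follows; the conventional 15-pixel patch size is assumed, and the paper's actual use of the prior inside the network is not shown.

```python
# Hedged sketch: the dark channel prior used as haze guidance.
import numpy as np
from scipy.ndimage import minimum_filter


def dark_channel(img: np.ndarray, patch: int = 15) -> np.ndarray:
    """img: HxWx3 RGB in [0, 1]. Min over color channels, then a local
    minimum filter; hazy regions yield large dark-channel values."""
    per_pixel_min = img.min(axis=2)                  # H x W
    return minimum_filter(per_pixel_min, size=patch)
```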
arXiv Detail & Related papers (2025-07-15T06:56:56Z)
- Transformer-Based Dual-Optical Attention Fusion Crowd Head Point Counting and Localization Network [9.214772627896156]
The model designs a dual-optical attention fusion module (DAFP) by introducing complementary information from infrared images. The proposed method outperforms existing techniques in terms of performance, especially in challenging dense low-light scenes.
arXiv Detail & Related papers (2025-05-11T10:55:14Z)
- A Fusion-Guided Inception Network for Hyperspectral Image Super-Resolution [4.487807378174191]
We propose a single-image super-resolution model called the Fusion-Guided Inception Network (FGIN). Specifically, we first employ a spectral-spatial fusion module to effectively integrate spectral and spatial information. An Inception-like hierarchical feature extraction strategy is used to capture multiscale spatial dependencies. To further enhance reconstruction quality, we incorporate an optimized upsampling module that combines bilinear interpolation with depthwise separable convolutions. See the sketch below for one reading of this block.
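The upsampling module is described only at this level of detail; a plausible reading, sketched below in PyTorch, is bilinear interpolation followed by a depthwise-separable convolution. The kernel size and channel handling are assumptions.

```python
# Hedged sketch of an upsampling block in the spirit of FGIN's description:
# bilinear interpolation + depthwise-separable convolution (details assumed).
import torch.nn as nn


class BilinearDSepUp(nn.Module):
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.up = nn.Upsample(scale_factor=scale, mode="bilinear",
                              align_corners=False)
        # depthwise-separable conv = per-channel 3x3 + 1x1 channel mixing
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1,
                                   groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(self.up(x)))
```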
arXiv Detail & Related papers (2025-05-06T11:15:59Z)
- PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification [49.37555541088792]
Phase-Amplitude Decoupling (PAD) is a frequency-aware framework that separates phase (modality-shared) and amplitude (modality-complementary) components. This work establishes a new paradigm for physics-aware multi-modal fusion in remote sensing.
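The decoupling operation itself is standard and easy to make concrete; a minimal PyTorch sketch follows. How PAD treats the two components downstream is the paper's contribution and is not reproduced here.

```python
# Hedged sketch: FFT-based phase-amplitude decoupling, the operation PAD's
# name refers to. Downstream treatment of the components is not shown.
import torch


def split_phase_amplitude(x: torch.Tensor):
    spec = torch.fft.fft2(x, norm="ortho")
    return spec.angle(), spec.abs()  # phase (shared), amplitude (complementary)


def recombine(phase: torch.Tensor, amp: torch.Tensor) -> torch.Tensor:
    # torch.polar builds a complex tensor from magnitude and angle
    return torch.fft.ifft2(torch.polar(amp, phase), norm="ortho").real
```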
arXiv Detail & Related papers (2025-04-27T07:21:42Z)
- DDFusion: Degradation-Decoupled Fusion Framework for Robust Infrared and Visible Images Fusion [9.242363983469346]
We propose a Degradation-Decoupled Fusion (DDFusion) framework. DDFusion achieves superior fusion performance under both clean and degraded conditions.
arXiv Detail & Related papers (2025-04-15T05:02:49Z)
- DSPFusion: Image Fusion via Degradation and Semantic Dual-Prior Guidance [48.84182709640984]
Existing fusion methods are tailored for high-quality images but struggle with degraded images captured under harsh circumstances. This work presents a Degradation and Semantic Prior dual-guided framework for degraded image Fusion (DSPFusion).
arXiv Detail & Related papers (2025-03-30T08:18:50Z)
- FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose Self-supervised Transfer (PST) and a Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
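A common way to realize such a split is a low-pass filter plus residual; the sketch below uses average pooling as a stand-in low-pass filter, which is an assumption rather than FreDF's actual design.

```python
# Hedged sketch: low/high-frequency decoupling via low-pass + residual.
# Average pooling as the low-pass filter is an assumption, not FreDF's design.
import torch
import torch.nn.functional as F


def freq_decouple(x: torch.Tensor, k: int = 5):
    low = F.avg_pool2d(x, k, stride=1, padding=k // 2)  # low-freq structure
    return low, x - low                                 # residual keeps edges
```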
arXiv Detail & Related papers (2025-03-25T15:04:53Z)
- Let Synthetic Data Shine: Domain Reassembly and Soft-Fusion for Single Domain Generalization [68.41367635546183]
Single Domain Generalization aims to train models with consistent performance across diverse scenarios using data from a single source. We propose Discriminative Domain Reassembly and Soft-Fusion (DRSF), a training framework leveraging synthetic data to improve model generalization.
arXiv Detail & Related papers (2025-03-17T18:08:03Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
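The sub-band decomposition step is concrete enough to sketch with PyWavelets; the wavelet family ("haar") and single decomposition level below are assumptions, and the attention encoder itself is not reproduced.

```python
# Hedged sketch: 2-D DWT sub-band decomposition (PyWavelets). Wavelet family
# and level are assumptions; UFAFormer's encoder then attends within and
# across these sub-bands.
import numpy as np
import pywt

img = np.random.rand(256, 256).astype(np.float32)  # stand-in grayscale image
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")          # approx + H/V/D detail bands
print(cA.shape, cH.shape, cV.shape, cD.shape)      # each band is (128, 128)
```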
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.