Dual-Domain Perspective on Degradation-Aware Fusion: A VLM-Guided Robust Infrared and Visible Image Fusion Framework
- URL: http://arxiv.org/abs/2509.05000v1
- Date: Fri, 05 Sep 2025 10:48:46 GMT
- Title: Dual-Domain Perspective on Degradation-Aware Fusion: A VLM-Guided Robust Infrared and Visible Image Fusion Framework
- Authors: Tianpei Zhang, Jufeng Zhao, Yiming Zhu, Guangmang Cui
- Abstract summary: GD^2Fusion is a novel framework that integrates vision-language models for degradation perception with dual-domain (frequency/spatial) joint optimization. It achieves superior fusion performance compared with existing algorithms and strategies in dual-source degraded scenarios.
- Score: 9.915632806109555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing infrared-visible image fusion (IVIF) methods assume high-quality inputs, and therefore struggle to handle dual-source degraded scenarios, typically requiring manual selection and sequential application of multiple pre-enhancement steps. This decoupled pre-enhancement-to-fusion pipeline inevitably leads to error accumulation and performance degradation. To overcome these limitations, we propose Guided Dual-Domain Fusion (GD^2Fusion), a novel framework that synergistically integrates vision-language models (VLMs) for degradation perception with dual-domain (frequency/spatial) joint optimization. Concretely, the designed Guided Frequency Modality-Specific Extraction (GFMSE) module performs frequency-domain degradation perception and suppression and discriminatively extracts fusion-relevant sub-band features. Meanwhile, the Guided Spatial Modality-Aggregated Fusion (GSMAF) module carries out cross-modal degradation filtering and adaptive multi-source feature aggregation in the spatial domain to enhance modality complementarity and structural consistency. Extensive qualitative and quantitative experiments demonstrate that GD^2Fusion achieves superior fusion performance compared with existing algorithms and strategies in dual-source degraded scenarios. The code will be publicly released after acceptance of this paper.
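The abstract names the GFMSE and GSMAF modules and what each is responsible for, but not their internals. The following is a minimal PyTorch sketch of the dual-domain idea only: the per-channel spectral gate, the per-pixel fusion weights, and all layer shapes are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the dual-domain idea: frequency-domain degradation
# suppression per modality (GFMSE-like), then spatial fusion (GSMAF-like).
# All internals below are assumptions for illustration.
import torch
import torch.nn as nn


class FrequencyGate(nn.Module):
    """GFMSE-like step: gate rFFT spectra with a degradation embedding
    (e.g., text features from a VLM describing the degradation)."""

    def __init__(self, channels: int, embed_dim: int = 512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(embed_dim, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor, degrade_emb: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft2(x, norm="ortho")        # (B,C,H,W//2+1), complex
        g = self.gate(degrade_emb)[:, :, None, None]   # per-channel gate in [0,1]
        return torch.fft.irfft2(spec * g, s=x.shape[-2:], norm="ortho")


class SpatialFusion(nn.Module):
    """GSMAF-like step: predict per-pixel weights to aggregate the two
    frequency-cleaned modalities."""

    def __init__(self, channels: int):
        super().__init__()
        self.weight = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid()
        )

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        w = self.weight(torch.cat([ir, vis], dim=1))   # adaptive fusion mask
        return w * ir + (1.0 - w) * vis
```

In this reading, the frequency stage suppresses degradation per modality before the spatial stage aggregates them, which is consistent with the pipeline the abstract describes.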
Related papers
- MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics [8.783211177601045]
Infrared and visible image fusion aims to integrate complementary multi-modal information into a single fused result. We propose a one-stop degradation-aware image fusion framework for multi-degradation scenarios driven by a large language model (MdaIF). To adaptively extract diverse weather-aware degradation knowledge and scene feature representations, we employ a pre-trained vision-language model (VLM) in our framework.
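The summary does not say which VLM or prompts MdaIF uses. As a hedged sketch of the general recipe, a pre-trained CLIP (here via Hugging Face transformers) can score weather-degradation hypotheses zero-shot; the checkpoint, prompt list, and input path below are assumptions for illustration.

```python
# Hedged sketch: zero-shot degradation perception with a pre-trained CLIP.
# Checkpoint, prompts, and input path are illustrative assumptions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a photo taken in dense fog",
    "a photo taken in heavy rain",
    "a photo taken at night in low light",
    "a clean, well-lit photo",
]
image = Image.open("visible.png")  # hypothetical visible-band input
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # shape (1, 4)
print(dict(zip(prompts, probs[0].tolist())))  # degradation-type scores
```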
arXiv Detail & Related papers (2025-11-16T09:43:12Z)
- Coupled Degradation Modeling and Fusion: A VLM-Guided Degradation-Coupled Network for Degradation-Aware Infrared and Visible Image Fusion [9.915632806109555]
We propose a novel VLM-Guided Degradation-Coupled Fusion network (VGDCFusion). Our VGDCFusion significantly outperforms existing state-of-the-art fusion approaches under various degraded image scenarios.
arXiv Detail & Related papers (2025-10-13T14:26:33Z)
- Efficient Dual-domain Image Dehazing with Haze Prior Perception [17.18810808188725]
Transformer-based models exhibit strong global modeling capabilities in single-image dehazing, but their high computational cost limits real-time applicability. We propose the Dark Channel Guided Frequency-aware Dehazing Network (DGFDNet), a novel dual-domain framework that performs physically guided degradation alignment. Experiments on four benchmark haze datasets demonstrate that DGFDNet achieves state-of-the-art performance with superior robustness and real-time efficiency.
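For reference, the dark channel prior that gives DGFDNet its guidance signal (He et al.) is straightforward to compute. A minimal NumPy/SciPy sketch follows; the conventional 15-pixel patch size is assumed, and the paper's actual use of the prior inside the network is not shown.

```python
# Hedged sketch: the dark channel prior used as haze guidance.
import numpy as np
from scipy.ndimage import minimum_filter


def dark_channel(img: np.ndarray, patch: int = 15) -> np.ndarray:
    """img: HxWx3 RGB in [0, 1]. Min over color channels, then a local
    minimum filter; hazy regions yield large dark-channel values."""
    per_pixel_min = img.min(axis=2)                  # H x W
    return minimum_filter(per_pixel_min, size=patch)
```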
arXiv Detail & Related papers (2025-07-15T06:56:56Z)
- Transformer-Based Dual-Optical Attention Fusion Crowd Head Point Counting and Localization Network [9.214772627896156]
The model designs a dual-optical attention fusion module (DAFP) by introducing complementary information from infrared images. The proposed method outperforms existing techniques in terms of performance, especially in challenging dense low-light scenes.
arXiv Detail & Related papers (2025-05-11T10:55:14Z)
- A Fusion-Guided Inception Network for Hyperspectral Image Super-Resolution [4.487807378174191]
We propose a single-image super-resolution model called the Fusion-Guided Inception Network (FGIN). Specifically, we first employ a spectral-spatial fusion module to effectively integrate spectral and spatial information. An Inception-like hierarchical feature extraction strategy is used to capture multiscale spatial dependencies. To further enhance reconstruction quality, we incorporate an optimized upsampling module that combines bilinear interpolation with depthwise separable convolutions. See the sketch below for one reading of this block.
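The upsampling module is described only at this level of detail; a plausible reading, sketched below in PyTorch, is bilinear interpolation followed by a depthwise-separable convolution. The kernel size and channel handling are assumptions.

```python
# Hedged sketch of an upsampling block in the spirit of FGIN's description:
# bilinear interpolation + depthwise-separable convolution (details assumed).
import torch.nn as nn


class BilinearDSepUp(nn.Module):
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.up = nn.Upsample(scale_factor=scale, mode="bilinear",
                              align_corners=False)
        # depthwise-separable conv = per-channel 3x3 + 1x1 channel mixing
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1,
                                   groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(self.up(x)))
```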
arXiv Detail & Related papers (2025-05-06T11:15:59Z)
- PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification [49.37555541088792]
Phase-Amplitude Decoupling (PAD) is a frequency-aware framework that separates phase (modality-shared) and amplitude (modality-complementary) components. This work establishes a new paradigm for physics-aware multi-modal fusion in remote sensing.
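The decoupling operation itself is standard and easy to make concrete; a minimal PyTorch sketch follows. How PAD treats the two components downstream is the paper's contribution and is not reproduced here.

```python
# Hedged sketch: FFT-based phase-amplitude decoupling, the operation PAD's
# name refers to. Downstream treatment of the components is not shown.
import torch


def split_phase_amplitude(x: torch.Tensor):
    spec = torch.fft.fft2(x, norm="ortho")
    return spec.angle(), spec.abs()  # phase (shared), amplitude (complementary)


def recombine(phase: torch.Tensor, amp: torch.Tensor) -> torch.Tensor:
    # torch.polar builds a complex tensor from magnitude and angle
    return torch.fft.ifft2(torch.polar(amp, phase), norm="ortho").real
```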
arXiv Detail & Related papers (2025-04-27T07:21:42Z)
- DDFusion: Degradation-Decoupled Fusion Framework for Robust Infrared and Visible Images Fusion [9.242363983469346]
We propose a Degradation-Decoupled Fusion (DDFusion) framework. DDFusion achieves superior fusion performance under both clean and degraded conditions.
arXiv Detail & Related papers (2025-04-15T05:02:49Z)
- DSPFusion: Image Fusion via Degradation and Semantic Dual-Prior Guidance [48.84182709640984]
Existing fusion methods are tailored for high-quality images but struggle with degraded images captured under harsh circumstances. This work presents a Degradation and Semantic Prior dual-guided framework for degraded image Fusion (DSPFusion).
arXiv Detail & Related papers (2025-03-30T08:18:50Z)
- FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose Self-supervised Transfer (PST) and a Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
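A common way to realize such a split is a low-pass filter plus residual; the sketch below uses average pooling as a stand-in low-pass filter, which is an assumption rather than FreDF's actual design.

```python
# Hedged sketch: low/high-frequency decoupling via low-pass + residual.
# Average pooling as the low-pass filter is an assumption, not FreDF's design.
import torch
import torch.nn.functional as F


def freq_decouple(x: torch.Tensor, k: int = 5):
    low = F.avg_pool2d(x, k, stride=1, padding=k // 2)  # low-freq structure
    return low, x - low                                 # residual keeps edges
```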
arXiv Detail & Related papers (2025-03-25T15:04:53Z)
- Let Synthetic Data Shine: Domain Reassembly and Soft-Fusion for Single Domain Generalization [68.41367635546183]
Single Domain Generalization aims to train models with consistent performance across diverse scenarios using data from a single source. We propose Discriminative Domain Reassembly and Soft-Fusion (DRSF), a training framework leveraging synthetic data to improve model generalization.
arXiv Detail & Related papers (2025-03-17T18:08:03Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
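The sub-band decomposition step is concrete enough to sketch with PyWavelets; the wavelet family ("haar") and single decomposition level below are assumptions, and the attention encoder itself is not reproduced.

```python
# Hedged sketch: 2-D DWT sub-band decomposition (PyWavelets). Wavelet family
# and level are assumptions; UFAFormer's encoder then attends within and
# across these sub-bands.
import numpy as np
import pywt

img = np.random.rand(256, 256).astype(np.float32)  # stand-in grayscale image
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")          # approx + H/V/D detail bands
print(cA.shape, cH.shape, cV.shape, cD.shape)      # each band is (128, 128)
```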
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.