Related papers: WIFE-Fusion:Wavelet-aware Intra-inter Frequency Enhancement for Multi-model Image Fusion

URL: http://arxiv.org/abs/2506.03555v1
Date: Wed, 04 Jun 2025 04:18:32 GMT
Title: WIFE-Fusion:Wavelet-aware Intra-inter Frequency Enhancement for Multi-model Image Fusion
Authors: Tianpei Zhang, Jufeng Zhao, Yiming Zhu, Guangmang Cui
Abstract summary: Multimodal image fusion effectively aggregates information from diverse modalities. Existing methods often neglect frequency-domain feature exploration and interactive relationships. We propose wavelet-aware Intra-inter Frequency Enhancement Fusion (WIFE-Fusion), a multimodal image fusion framework based on frequency-domain components interactions.
Score: 8.098063209250684
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal image fusion effectively aggregates information from diverse modalities, with fused images playing a crucial role in vision systems. However, existing methods often neglect frequency-domain feature exploration and interactive relationships. In this paper, we propose wavelet-aware Intra-inter Frequency Enhancement Fusion (WIFE-Fusion), a multimodal image fusion framework based on frequency-domain components interactions. Its core innovations include: Intra-Frequency Self-Attention (IFSA) that leverages inherent cross-modal correlations and complementarity through interactive self-attention mechanisms to extract enriched frequency-domain features, and Inter-Frequency Interaction (IFI) that enhances enriched features and filters latent features via combinatorial interactions between heterogeneous frequency-domain components across modalities. These processes achieve precise source feature extraction and unified modeling of feature extraction-aggregation. Extensive experiments on five datasets across three multimodal fusion tasks demonstrate WIFE-Fusion's superiority over current specialized and unified fusion methods. Our code is available at https://github.com/Lmmh058/WIFE-Fusion.

Related papers

Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion [69.13852939945433]
Multi-Modal Image Fusion (MMIF) aims to combine images from different modalities to produce fused images. We propose a novel Interactive Spatial-Frequency Fusion Mamba framework for MMIF. Our ISFM can achieve better performance than other state-of-the-art methods.
arXiv Detail & Related papers (2026-02-04T10:35:55Z)
FreDFT: Frequency Domain Fusion Transformer for Visible-Infrared Object Detection [32.27664742588076]
We propose a frequency domain fusion transformer, called FreDFT, for visible-infrared object detection. The proposed approach employs a novel multimodal frequency attention (MFDA) to mine complementary information between modalities and a frequency feed-forward layer. Our proposed FreDFT achieves excellent performance on multiple public datasets compared with other state-of-the-art methods.
arXiv Detail & Related papers (2025-11-13T07:46:18Z)
IRDFusion: Iterative Relation-Map Difference guided Feature Fusion for Multispectral Object Detection [23.256601188227865]
We propose an innovative feature fusion framework based on a cross-modal feature contrastive and screening strategy. The proposed method adaptively enhances salient structures by fusing object-aware complementary cross-modal features. IRDFusion consistently outperforms existing methods across diverse challenging scenarios.
arXiv Detail & Related papers (2025-09-11T01:22:35Z)
Task-Generalized Adaptive Cross-Domain Learning for Multimodal Image Fusion [15.666336202108862]
Multimodal Image Fusion (MMIF) aims to integrate complementary information from different imaging modalities to overcome the limitations of individual sensors. Current MMIF methods face challenges such as modality misalignment, high-frequency detail destruction, and task-specific limitations. We propose AdaSFFuse, a novel framework for task-generalized MMIF through adaptive cross-domain co-fusion learning.
arXiv Detail & Related papers (2025-08-21T12:31:14Z)
PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification [49.37555541088792]
Phase-Amplitude Decoupling (PAD) is a frequency-aware framework that separates phase (modality-shared) and amplitude (modality-complementary) components. This work establishes a new paradigm for physics-aware multi-modal fusion in remote sensing.
arXiv Detail & Related papers (2025-04-27T07:21:42Z)
Multimodal-Aware Fusion Network for Referring Remote Sensing Image Segmentation [7.992331117310217]
Referring remote sensing image segmentation (RRSIS) is a novel visual task in remote sensing image segmentation. We design a multimodal-aware fusion network (MAFN) to achieve fine-grained alignment and fusion between the two modalities.
arXiv Detail & Related papers (2025-03-14T08:31:21Z)
A Dual Domain Multi-exposure Image Fusion Network based on the Spatial-Frequency Integration [57.14745782076976]
Multi-exposure image fusion aims to generate a single high-dynamic image by integrating images with different exposures. We propose a novel perspective on multi-exposure image fusion via the Spatial-Frequency Integration Framework, named MEF-SFI. Our method achieves visually appealing fusion results against state-of-the-art multi-exposure image fusion approaches.
arXiv Detail & Related papers (2023-12-17T04:45:15Z)
AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention [6.910879180358217]
We propose AdaFuse, in which multimodal image information is fused adaptively through a frequency-guided attention mechanism. The proposed method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics.
arXiv Detail & Related papers (2023-10-09T07:10:30Z)
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem. By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts. Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs. Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z)
An Interactively Reinforced Paradigm for Joint Infrared-Visible Image Fusion and Saliency Object Detection [59.02821429555375]
This research focuses on the discovery and localization of hidden objects in the wild and serves unmanned systems. Through empirical analysis, infrared and visible image fusion (IVIF) makes hard-to-find objects apparent. Multimodal salient object detection (SOD) accurately delineates the precise spatial location of objects within the picture.
arXiv Detail & Related papers (2023-05-17T06:48:35Z)
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network. We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition [46.443866373546726]
We focus on dimensional emotion recognition based on the fusion of facial and vocal modalities extracted from videos. We propose a joint cross-attention model that relies on the complementary relationships to extract the salient features. Our proposed A-V fusion model provides a cost-effective solution that can outperform state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-28T14:09:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.