IRDFusion: Iterative Relation-Map Difference guided Feature Fusion for Multispectral Object Detection
- URL: http://arxiv.org/abs/2509.09085v2
- Date: Mon, 15 Sep 2025 04:36:51 GMT
- Title: IRDFusion: Iterative Relation-Map Difference guided Feature Fusion for Multispectral Object Detection
- Authors: Jifeng Shen, Haibo Zhan, Xin Zuo, Heng Fan, Xiaohui Yuan, Jun Li, Wankou Yang
- Abstract summary: We propose an innovative feature fusion framework based on a cross-modal feature contrast and screening strategy. The proposed method adaptively enhances salient structures by fusing object-aware complementary cross-modal features. IRDFusion consistently outperforms existing methods across diverse challenging scenarios.
- Score: 23.256601188227865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current multispectral object detection methods often retain extraneous background or noise during feature fusion, limiting perceptual performance. To address this, we propose an innovative feature fusion framework based on a cross-modal feature contrast and screening strategy, diverging from conventional approaches. The proposed method adaptively enhances salient structures by fusing object-aware complementary cross-modal features while suppressing shared background interference. Our solution centers on two novel, specially designed modules: the Mutual Feature Refinement Module (MFRM) and the Differential Feature Feedback Module (DFFM). The MFRM enhances intra- and inter-modal feature representations by modeling their relationships, thereby improving cross-modal alignment and discriminative power. Inspired by feedback differential amplifiers, the DFFM dynamically computes inter-modal differential features as guidance signals and feeds them back to the MFRM, enabling adaptive fusion of complementary information while suppressing common-mode noise across modalities. To enable robust feature learning, the MFRM and DFFM are integrated into a unified framework, formally formulated as an Iterative Relation-Map Difference-guided Feature Fusion mechanism, termed IRDFusion. IRDFusion enables high-quality cross-modal fusion by progressively amplifying salient relational signals through iterative feedback while suppressing feature noise, leading to significant performance gains. In extensive experiments on the FLIR, LLVIP, and M$^3$FD datasets, IRDFusion achieves state-of-the-art performance and consistently outperforms existing methods across diverse challenging scenarios, demonstrating its robustness and effectiveness. Code will be available at https://github.com/61s61min/IRDFusion.git.
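As a reading aid, the following is a minimal PyTorch sketch of the iterative fusion loop the abstract describes. Only the module names (MFRM, DFFM) and the overall relation-map/differential-feedback flow come from the abstract; every concrete design choice below (cross-attention as the relation map, signed additive feedback injection, a sigmoid-gated difference, three iterations, all layer shapes) is an assumption for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MFRM(nn.Module):
    """Mutual Feature Refinement Module (sketch): cross-modal attention whose
    affinity matrix stands in for the paper's 'relation map'."""
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c, 1)
        self.k = nn.Conv2d(c, c, 1)
        self.v = nn.Conv2d(c, c, 1)
        self.proj = nn.Conv2d(c, c, 1)

    def refine(self, x, ref):
        # Refine x by attending to the other modality `ref`.
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)       # (b, hw, c)
        k = self.k(ref).flatten(2)                     # (b, c, hw)
        v = self.v(ref).flatten(2).transpose(1, 2)     # (b, hw, c)
        rel = torch.softmax(q @ k / c ** 0.5, dim=-1)  # relation map, (b, hw, hw)
        out = (rel @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.proj(out)                      # residual refinement

    def forward(self, rgb, ir, guide=None):
        if guide is not None:
            # Re-inject the fed-back differential signal with opposite signs
            # (assumed): shared content cancels, complementary content grows.
            rgb, ir = rgb + guide, ir - guide
        return self.refine(rgb, ir), self.refine(ir, rgb)

class DFFM(nn.Module):
    """Differential Feature Feedback Module (sketch): gate the inter-modal
    difference and return it as next-iteration guidance, loosely mirroring
    the differential branch of a feedback amplifier."""
    def __init__(self, c):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.Sigmoid())

    def forward(self, rgb, ir):
        diff = rgb - ir                # differential (complementary) component
        return self.gate(diff) * diff  # suppress common-mode noise (assumed gating)

class IRDFusionSketch(nn.Module):
    def __init__(self, c, iters=3):
        super().__init__()
        self.mfrm = MFRM(c)
        self.dffm = DFFM(c)
        self.iters = iters
        self.out = nn.Conv2d(2 * c, c, 1)

    def forward(self, rgb, ir):
        guide = None
        for _ in range(self.iters):    # iterative feedback loop
            rgb, ir = self.mfrm(rgb, ir, guide)
            guide = self.dffm(rgb, ir)
        return self.out(torch.cat([rgb, ir], dim=1))

# Shape check with random tensors standing in for backbone feature maps:
fused = IRDFusionSketch(c=64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```

The loop mirrors the feedback-amplifier analogy in the abstract: the gated difference acts as the "differential" signal that is amplified across iterations, while content common to both modalities is progressively suppressed.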
Related papers
- Residual Cross-Modal Fusion Networks for Audio-Visual Navigation [17.19858148800535]
We propose a Cross-Modal Residual Fusion Network, which introduces residual interactions between audio and visual streams to achieve complementary modeling and fine-grained alignment. Experiments on the Replica and Matterport3D datasets demonstrate that CRFN significantly outperforms state-of-the-art fusion baselines and achieves stronger cross-domain generalization.
arXiv Detail & Related papers (2026-01-11T12:11:36Z)
- DIFF-MF: A Difference-Driven Channel-Spatial State Space Model for Multi-Modal Image Fusion [51.07069814578009]
Multi-modal image fusion aims to integrate complementary information from multiple source images to produce high-quality fused images with enriched content. We propose DIFF-MF, a novel difference-driven channel-spatial state space model for multi-modal image fusion. Our method outperforms existing approaches in both visual quality and quantitative evaluation.
arXiv Detail & Related papers (2026-01-09T05:26:54Z)
- FreDFT: Frequency Domain Fusion Transformer for Visible-Infrared Object Detection [32.27664742588076]
We propose a frequency domain fusion transformer, FreDFT, for visible-infrared object detection. The proposed approach employs a novel multimodal frequency attention (MFDA) to mine complementary information between modalities, together with a frequency feed-forward layer. Our proposed FreDFT achieves excellent performance on multiple public datasets compared with other state-of-the-art methods.
arXiv Detail & Related papers (2025-11-13T07:46:18Z)
- Residual Prior-driven Frequency-aware Network for Image Fusion [6.90874640835234]
Image fusion aims to integrate complementary information across modalities to generate high-quality fused images. We propose a Residual Prior-driven Frequency-aware Network, termed RPFNet.
arXiv Detail & Related papers (2025-07-09T10:48:00Z)
- Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models [0.0]
Modality-Aware Adaptive Fusion Scheduling (MA-AFS) learns to dynamically modulate the contribution of each modality on a per-instance basis. Our work highlights the importance of adaptive fusion and opens a promising direction toward reliable and uncertainty-aware multimodal learning.
arXiv Detail & Related papers (2025-06-15T05:57:45Z)
- WIFE-Fusion: Wavelet-aware Intra-inter Frequency Enhancement for Multi-modal Image Fusion [8.098063209250684]
Multimodal image fusion effectively aggregates information from diverse modalities. Existing methods often neglect frequency-domain feature exploration and interactive relationships. We propose Wavelet-aware Intra-inter Frequency Enhancement Fusion (WIFE-Fusion), a multimodal image fusion framework based on frequency-domain component interactions.
arXiv Detail & Related papers (2025-06-04T04:18:32Z)
- PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification [49.37555541088792]
Phase-Amplitude Decoupling (PAD) is a frequency-aware framework that separates phase (modality-shared) and amplitude (modality-complementary) components. This work establishes a new paradigm for physics-aware multi-modal fusion in remote sensing.
arXiv Detail & Related papers (2025-04-27T07:21:42Z)
- Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z)
- Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD), which aims to detect salient objects from arbitrary modalities, e.g., RGB, RGB-D, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) is proposed to address two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z)
- Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Fusing information from different modalities effectively improves object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods in mAP by 5.9% on M$^3$FD and 4.9% on the FLIR-Aligned dataset.
arXiv Detail & Related papers (2024-04-14T05:28:46Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition [46.443866373546726]
We focus on dimensional emotion recognition based on the fusion of facial and vocal modalities extracted from videos.
We propose a joint cross-attention model that relies on complementary cross-modal relationships to extract salient features (see the sketch after this list for the general pattern).
Our proposed A-V fusion model provides a cost-effective solution that can outperform state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-28T14:09:43Z)
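Joint cross-attention, as named in the last entry above, is a widely used fusion pattern. Below is a generic, hedged PyTorch sketch, not that paper's exact formulation: each modality attends to the concatenated joint audio-visual sequence so the attention weights reflect complementary cross-modal relationships. All dimensions, the mean pooling, and the two-unit regression head (e.g., valence/arousal) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointCrossAttentionFusion(nn.Module):
    def __init__(self, d_audio, d_visual, d_model=128, heads=4):
        super().__init__()
        self.a_proj = nn.Linear(d_audio, d_model)
        self.v_proj = nn.Linear(d_visual, d_model)
        # Each modality queries the *joint* (concatenated) sequence, so its
        # attention weights are computed against both modalities at once.
        self.a_attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.v_attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.head = nn.Linear(2 * d_model, 2)  # assumed valence/arousal output

    def forward(self, audio, visual):
        a, v = self.a_proj(audio), self.v_proj(visual)  # (b, t, d_model)
        joint = torch.cat([a, v], dim=1)                # joint key/value sequence
        a_ctx, _ = self.a_attn(a, joint, joint)         # audio attends to joint
        v_ctx, _ = self.v_attn(v, joint, joint)         # visual attends to joint
        pooled = torch.cat([a_ctx.mean(1), v_ctx.mean(1)], dim=-1)  # pool over time
        return self.head(pooled)

# Usage with random sequences standing in for per-frame audio/visual features:
model = JointCrossAttentionFusion(d_audio=40, d_visual=512)
out = model(torch.randn(2, 100, 40), torch.randn(2, 100, 512))
print(out.shape)  # torch.Size([2, 2])
```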
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.