Multispectral Detection Transformer with Infrared-Centric Feature Fusion
- URL: http://arxiv.org/abs/2505.15137v2
- Date: Mon, 14 Jul 2025 11:40:47 GMT
- Title: Multispectral Detection Transformer with Infrared-Centric Feature Fusion
- Authors: Seongmin Hwang, Daeyoung Han, Moongu Jeon,
- Abstract summary: Infrared-Centric Fusion (IC-Fusion) is a lightweight and modality-aware sensor fusion method.<n>IC-Fusion prioritizes infrared features while effectively integrating complementary RGB semantic context.<n> Experiments on the FLIR and LLVIP benchmarks demonstrate the superior effectiveness and efficiency of our IR-centric fusion strategy.
- Score: 8.762314897895175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multispectral object detection aims to leverage complementary information from visible (RGB) and infrared (IR) modalities to enable robust performance under diverse environmental conditions. Our key insight, derived from wavelet analysis and empirical observations, is that IR images contain structurally rich high-frequency information critical for object detection, making an infrared-centric approach highly effective. To capitalize on this finding, we propose Infrared-Centric Fusion (IC-Fusion), a lightweight and modality-aware sensor fusion method that prioritizes infrared features while effectively integrating complementary RGB semantic context. IC-Fusion adopts a compact RGB backbone and designs a novel fusion module comprising a Multi-Scale Feature Distillation (MSFD) block to enhance RGB features and a three-stage fusion block with a Cross-Modal Channel Shuffle Gate (CCSG), a Cross-Modal Large Kernel Gate (CLKG), and a Channel Shuffle Projection (CSP) to facilitate effective cross-modal interaction. Experiments on the FLIR and LLVIP benchmarks demonstrate the superior effectiveness and efficiency of our IR-centric fusion strategy, further validating its benefits. Our code is available at https://github.com/smin-hwang/IC-Fusion.
Related papers
- A High-Performance Thermal Infrared Object Detection Framework with Centralized Regulation [5.935808994536907]
We present a novel and efficient thermal object detection framework, known as CRT-YOLO, that is based on centralized feature regulation.<n>Our proposed model integrates efficient multi-scale infrared attention modules, which adeptly capture long-range infrared.<n>Experiments conducted on two benchmark datasets demonstrate that our CRT-YOLO model significantly outperforms conventional methods.
arXiv Detail & Related papers (2025-05-16T03:43:24Z) - Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection [67.02804741856512]
We propose a novel Hierarchical Multi-Modal Enhancement Network (HMMEN) that integrates RGB and IR data for robust and accurate TL detection.<n>Our method introduces two key components: (1) a Mutual Multi-Modal Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB) that corrects misalignments between decoder outputs and IR feature maps by leveraging deformable convolutions.
arXiv Detail & Related papers (2025-01-25T06:21:06Z) - Removal then Selection: A Coarse-to-Fine Fusion Perspective for RGB-Infrared Object Detection [20.12812979315803]
Object detection utilizing both visible (RGB) and thermal infrared (IR) imagery has garnered extensive attention.
Most existing multi-modal object detection methods directly input the RGB and IR images into deep neural networks.
We propose a novel coarse-to-fine perspective to purify and fuse features from both modalities.
arXiv Detail & Related papers (2024-01-19T14:49:42Z) - RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection [22.53413063906737]
A crucial part in two-branch RGB-X deep neural networks is how to fuse information across modalities.
We propose RXFOOD for the fusion of features across different scales within the same modality branch and from different modality branches simultaneously.
Experimental results on RGB-NIR salient object detection, RGB-D salient object detection, and RGBFrequency image manipulation detection demonstrate the clear effectiveness of the proposed RXFOOD.
arXiv Detail & Related papers (2023-06-22T01:27:00Z) - CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for
Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z) - Interactive Context-Aware Network for RGB-T Salient Object Detection [7.544240329265388]
We propose a novel network called Interactive Context-Aware Network (ICANet)
ICANet contains three modules that can effectively perform the cross-modal and cross-scale fusions.
Experiments prove that our network performs favorably against the state-of-the-art RGB-T SOD methods.
arXiv Detail & Related papers (2022-11-11T10:04:36Z) - CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient
Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z) - Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that transformer is a uniform operation which presents great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z) - Infrared Small-Dim Target Detection with Transformer under Complex
Backgrounds [155.388487263872]
We propose a new infrared small-dim target detection method with the transformer.
We adopt the self-attention mechanism of the transformer to learn the interaction information of image features in a larger range.
We also design a feature enhancement module to learn more features of small-dim targets.
arXiv Detail & Related papers (2021-09-29T12:23:41Z) - EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF)
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - Siamese Network for RGB-D Salient Object Detection and Beyond [113.30063105890041]
A novel framework is proposed to learn from both RGB and depth inputs through a shared network backbone.
Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector.
We also link JL-DCF to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models.
arXiv Detail & Related papers (2020-08-26T06:01:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.