WiSE-OD: Benchmarking Robustness in Infrared Object Detection
- URL: http://arxiv.org/abs/2507.18925v1
- Date: Fri, 25 Jul 2025 03:33:50 GMT
- Title: WiSE-OD: Benchmarking Robustness in Infrared Object Detection
- Authors: Heitor R. Medeiros, Atif Belal, Masih Aminbeidokhti, Eric Granger, Marco Pedersoli,
- Abstract summary: WiSE-OD is a weight-space ensembling method with two variants: WiSE-OD$_ZS$, which combines RGB zero-shot and IR fine-tuned weights, and WiSE-OD$_LP$, which blends zero-shot and linear probing.<n>We introduce LLVIP-C and FLIR-C, two cross-modality out-of-distribution benchmarks built by applying corruption to standard IR datasets.
- Score: 12.115815831689265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object detection (OD) in infrared (IR) imagery is critical for low-light and nighttime applications. However, the scarcity of large-scale IR datasets forces models to rely on weights pre-trained on RGB images. While fine-tuning on IR improves accuracy, it often compromises robustness under distribution shifts due to the inherent modality gap between RGB and IR. To address this, we introduce LLVIP-C and FLIR-C, two cross-modality out-of-distribution (OOD) benchmarks built by applying corruption to standard IR datasets. Additionally, to fully leverage the complementary knowledge from RGB and infrared trained models, we propose WiSE-OD, a weight-space ensembling method with two variants: WiSE-OD$_{ZS}$, which combines RGB zero-shot and IR fine-tuned weights, and WiSE-OD$_{LP}$, which blends zero-shot and linear probing. Evaluated across three RGB-pretrained detectors and two robust baselines, WiSE-OD improves both cross-modality and corruption robustness without any additional training or inference cost.
Related papers
- Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection [67.02804741856512]
We propose a novel Hierarchical Multi-Modal Enhancement Network (HMMEN) that integrates RGB and IR data for robust and accurate TL detection.<n>Our method introduces two key components: (1) a Mutual Multi-Modal Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB) that corrects misalignments between decoder outputs and IR feature maps by leveraging deformable convolutions.
arXiv Detail & Related papers (2025-01-25T06:21:06Z) - Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution [54.293362972473595]
Image super-resolution (SR) aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts.
Current approaches to address SR tasks are either dedicated to extracting RGB image features or assuming similar degradation patterns.
We propose a Contourlet refinement gate framework to restore infrared modal-specific features while preserving spectral distribution fidelity.
arXiv Detail & Related papers (2024-11-19T14:24:03Z) - The Solution for the GAIIC2024 RGB-TIR object detection Challenge [5.625794757504552]
RGB-TIR object detection aims to utilize both RGB and TIR images for complementary information during detection.
Our proposed method achieved an mAP score of 0.516 and 0.543 on A and B benchmarks respectively.
arXiv Detail & Related papers (2024-07-04T12:08:36Z) - SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised
Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds.
With the development of Transformer, the scale of SIRST models is constantly increasing.
With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z) - Tensor Factorization for Leveraging Cross-Modal Knowledge in
Data-Constrained Infrared Object Detection [22.60228799622782]
Key bottleneck in object detection in IR images is lack of sufficient labeled training data.
We seek to leverage cues from the RGB modality to scale object detectors to the IR modality, while preserving model performance in the RGB modality.
We first pretrain these factor matrices on the RGB modality, for which plenty of training data are assumed to exist and then augment only a few trainable parameters for training on the IR modality to avoid over-fitting.
arXiv Detail & Related papers (2023-09-28T16:55:52Z) - $\mathbf{C}^2$Former: Calibrated and Complementary Transformer for
RGB-Infrared Object Detection [18.27510863075184]
We propose a novel Calibrated and Complementary Transformer called $mathrmC2$Former to address modality miscalibration and imprecision problems.
Because $mathrmC2$Former performs in the feature domain, it can be embedded into existed RGB-IR object detectors via the backbone network.
arXiv Detail & Related papers (2023-06-28T12:52:48Z) - DiffIR: Efficient Diffusion Model for Image Restoration [108.82579440308267]
Diffusion model (DM) has achieved SOTA performance by modeling the image synthesis process into a sequential application of a denoising network.
Traditional DMs running massive iterations on a large model to estimate whole images or feature maps is inefficient for image restoration.
We propose DiffIR, which consists of a compact IR prior extraction network (CPEN), dynamic IR transformer (DIRformer), and denoising network.
arXiv Detail & Related papers (2023-03-16T16:47:14Z) - Translation, Scale and Rotation: Cross-Modal Alignment Meets
RGB-Infrared Vehicle Detection [10.460296317901662]
We find detection in aerial RGB-IR images suffers from cross-modal weakly misalignment problems.
We propose a Translation-Scale-Rotation Alignment (TSRA) module to address the problem.
A two-stream feature alignment detector (TSFADet) based on the TSRA module is constructed for RGB-IR object detection in aerial images.
arXiv Detail & Related papers (2022-09-28T03:06:18Z) - DUT-LFSaliency: Versatile Dataset and Light Field-to-RGB Saliency
Detection [104.50425501764806]
We introduce a large-scale dataset to enable versatile applications for light field saliency detection.
We present an asymmetrical two-stream model consisting of the Focal stream and RGB stream.
Experiments demonstrate that our Focal stream achieves state-of-the-arts performance.
arXiv Detail & Related papers (2020-12-30T11:53:27Z) - Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient
Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly lightweight designed triple-stream network is applied over these novel formulated data to achieve an optimal channel-wise complementary fusion status between the RGB and D.
arXiv Detail & Related papers (2020-08-07T10:13:05Z) - Synergistic saliency and depth prediction for RGB-D saliency detection [76.27406945671379]
Existing RGB-D saliency datasets are small, which may lead to overfitting and limited generalization for diverse scenarios.
We propose a semi-supervised system for RGB-D saliency detection that can be trained on smaller RGB-D saliency datasets without saliency ground truth.
arXiv Detail & Related papers (2020-07-03T14:24:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.