Contrast-Guided Cross-Modal Distillation for Thermal Object Detection
- URL: http://arxiv.org/abs/2511.01435v1
- Date: Mon, 03 Nov 2025 10:38:01 GMT
- Title: Contrast-Guided Cross-Modal Distillation for Thermal Object Detection
- Authors: SiWoo Kim, JhongHyun An
- Abstract summary: Low contrast and weak high-frequency cues lead to duplicate, overlapping boxes, missed small objects, and class confusion. We introduce training-only objectives that sharpen instance-level decision boundaries by pulling together features of the same class. In experiments, our method outperformed prior approaches and achieved state-of-the-art performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust perception at night remains challenging for thermal-infrared detection: low contrast and weak high-frequency cues lead to duplicate, overlapping boxes, missed small objects, and class confusion. Prior remedies either translate TIR to RGB and hope pixel fidelity transfers to detection -- making performance fragile to color or structure artifacts -- or fuse RGB and TIR at test time, which requires extra sensors, precise calibration, and higher runtime cost. Both lines can help in favorable conditions, but do not directly shape the thermal representation used by the detector. We keep mono-modality inference and tackle the root causes during training. Specifically, we introduce training-only objectives that sharpen instance-level decision boundaries by pulling together features of the same class and pushing apart those of different classes -- suppressing duplicate and confusing detections -- and that inject cross-modal semantic priors by aligning the student's multi-level pyramid features with an RGB-trained teacher, thereby strengthening texture-poor thermal features without visible input at test time. In experiments, our method outperformed prior approaches and achieved state-of-the-art performance.
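The two training-only objectives in the abstract can be sketched as loss terms. This is a minimal NumPy illustration under assumed shapes and a hypothetical temperature; the paper's exact loss formulation, weighting, and pyramid levels are not given here: a supervised contrastive term over instance features (pull same-class together, push different-class apart) and an L2 alignment term between student and RGB-teacher pyramid features.

```python
import numpy as np

def supcon_loss(feats, labels, tau=0.1):
    """Instance-level supervised contrastive loss (a common formulation,
    assumed here): same-class features are pulled together, different-class
    features pushed apart. feats: (N, D) instance embeddings; labels: (N,)."""
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = f @ f.T / tau                           # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                # exclude self-pairs
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = labels[:, None] == labels[None, :]      # same-class mask
    np.fill_diagonal(pos, False)
    per_anchor = np.where(pos, logp, 0.0).sum(1) / np.maximum(pos.sum(1), 1)
    valid = pos.sum(1) > 0                        # anchors with >= 1 positive
    return -per_anchor[valid].mean()

def align_loss(student_feats, teacher_feats):
    """Cross-modal distillation term: mean squared error between normalized
    multi-level pyramid features of the thermal student and the RGB-trained
    teacher (one array per pyramid level)."""
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        s = s / (np.linalg.norm(s, axis=-1, keepdims=True) + 1e-8)
        t = t / (np.linalg.norm(t, axis=-1, keepdims=True) + 1e-8)
        total += np.mean((s - t) ** 2)
    return total / len(student_feats)
```

Both terms act only on features during training; at test time the detector runs on thermal input alone, so neither adds inference cost.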
Related papers
- Modality-Decoupled RGB-Thermal Object Detector via Query Fusion [15.717929078660227]
We propose a Modality-Decoupled RGB-T detection framework with Query Fusion (MDQF) to balance modality complementation and separation. Our approach delivers superior performance to existing RGB-T detectors and achieves better modality independence.
arXiv Detail & Related papers (2026-01-13T11:32:29Z) - IrisNet: Infrared Image Status Awareness Meta Decoder for Infrared Small Targets Detection [92.56025546608699]
IrisNet is a novel meta-learned framework that adapts detection strategies to the input infrared image status. Our approach establishes a dynamic mapping between infrared image features and entire decoder parameters. Experiments on NUDT-SIRST, NUAA-SIRST, and IRSTD-1K datasets demonstrate the superiority of our IrisNet.
arXiv Detail & Related papers (2025-11-25T13:53:54Z) - Lightweight Facial Landmark Detection in Thermal Images via Multi-Level Cross-Modal Knowledge Transfer [13.887803692033073]
Facial Landmark Detection in thermal imagery is critical for applications in challenging lighting conditions. We propose a novel framework that decouples high-fidelity RGB-to-thermal knowledge transfer from model compression. Experiments show that our approach sets a new state-of-the-art on public thermal FLD benchmarks, notably outperforming previous methods.
arXiv Detail & Related papers (2025-10-13T08:19:56Z) - Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAVTarget Detection [18.776245480405958]
Infrared unmanned aerial vehicle (UAV) images captured using thermal detectors are often affected by temperature-dependent low-frequency nonuniformity. We present a detection-friendly union framework that simultaneously addresses both nonuniformity correction and UAV target detection tasks. We introduce a detection-guided self-supervised loss to reduce feature discrepancies between the two tasks, thereby enhancing detection robustness to varying nonuniformity levels.
arXiv Detail & Related papers (2025-04-05T01:29:22Z) - Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection [67.02804741856512]
We propose a novel Hierarchical Multi-Modal Enhancement Network (HMMEN) that integrates RGB and IR data for robust and accurate TL detection. Our method introduces two key components: (1) a Mutual Multi-Modal Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB) that corrects misalignments between decoder outputs and IR feature maps by leveraging deformable convolutions.
arXiv Detail & Related papers (2025-01-25T06:21:06Z) - Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution [54.293362972473595]
Image super-resolution (SR) aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts.
Current approaches to address SR tasks are either dedicated to extracting RGB image features or assuming similar degradation patterns.
We propose a Contourlet refinement gate framework to restore infrared modal-specific features while preserving spectral distribution fidelity.
arXiv Detail & Related papers (2024-11-19T14:24:03Z) - SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds.
With the development of Transformers, the scale of SIRST models is constantly increasing.
With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z) - Long-Tailed 3D Detection via Multi-Modal Fusion [58.89765900064689]
We study the problem of Long-Tailed 3D Detection (LT3D), which evaluates all annotated classes, including those in-the-tail. We point out that rare-class accuracy is particularly improved via multi-modal late fusion (MMLF) of independently trained uni-modal LiDAR and RGB detectors. Our MMLF significantly outperforms prior work for LT3D, particularly improving on the six rarest classes from 12.8 to 20.0 mAP!
arXiv Detail & Related papers (2023-12-18T07:14:25Z) - Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection [22.60228799622782]
A key bottleneck in object detection in IR images is the lack of sufficient labeled training data.
We seek to leverage cues from the RGB modality to scale object detectors to the IR modality, while preserving model performance in the RGB modality.
We first pretrain these factor matrices on the RGB modality, for which plenty of training data are assumed to exist and then augment only a few trainable parameters for training on the IR modality to avoid over-fitting.
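The factorized transfer scheme above can be illustrated with a toy NumPy sketch. Everything here is a hypothetical simplification (the paper's actual tensor factorization, ranks, and which parameters stay trainable are not specified in this summary): a layer weight is written as a low-rank product of factor matrices fit on abundant RGB data, and only a small correction is trained on scarce IR data to limit over-fitting.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 8, 2

# Factor matrices: pretrained on the RGB modality, then frozen for IR.
U = rng.standard_normal((d_out, rank))
V = rng.standard_normal((rank, d_in))

# The only IR-trainable parameters: a small correction to one factor.
delta = np.zeros((d_out, rank))

def ir_weight(U, V, delta):
    # IR-adapted weight: frozen RGB factors plus the trainable correction.
    return (U + delta) @ V

# Far fewer parameters are updated on IR than a full weight would need.
full_params = d_out * d_in   # training the dense weight directly
ir_params = delta.size       # training only the low-rank correction
```

With `delta` initialized to zero, the IR model starts exactly at the RGB-pretrained weight, and fine-tuning touches only `rank * d_out` parameters instead of `d_out * d_in`.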
arXiv Detail & Related papers (2023-09-28T16:55:52Z) - ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning [51.07496081296863]
We recouple independent constraints of detection and description of multimodal feature learning with a mutual weighting strategy.
We propose a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers.
We build a benchmark that contains cross visible, infrared, near-infrared and synthetic aperture radar image pairs for evaluating the performance of features in feature matching and image registration tasks.
arXiv Detail & Related papers (2022-05-16T04:24:22Z) - Learning Enriched Illuminants for Cross and Single Sensor Color Constancy [182.4997117953705]
We propose a cross-sensor self-supervised scheme to train the network.
We train the network by randomly sampling the artificial illuminants in a sensor-independent manner.
Experiments show that our cross-sensor model and single-sensor model outperform other state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-03-21T15:45:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.