Disentangle Object and Non-object Infrared Features via Language Guidance
- URL: http://arxiv.org/abs/2601.09228v1
- Date: Wed, 14 Jan 2026 06:59:54 GMT
- Title: Disentangle Object and Non-object Infrared Features via Language Guidance
- Authors: Fan Liu, Ting Wu, Chuanyi Zhang, Liang Yao, Xing Ma, Yuhui Zheng
- Abstract summary: We propose a novel vision-language representation learning paradigm for infrared object detection. Additional textual supervision with rich semantic information guides the disentanglement of object and non-object features. Our approach achieves superior performance on two benchmarks: M\textsuperscript{3}FD (83.7% mAP) and FLIR (86.1% mAP).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Infrared object detection focuses on identifying and locating objects in complex environments (e.g., darkness, snow, and rain) where visible-light cameras fail under poor illumination. However, due to low contrast and weak edge information in infrared images, it is challenging to extract discriminative object features for robust detection. To deal with this issue, we propose a novel vision-language representation learning paradigm for infrared object detection. Additional textual supervision with rich semantic information is explored to guide the disentanglement of object and non-object features. Specifically, we propose a Semantic Feature Alignment (SFA) module to align object features with the corresponding text features. Furthermore, we develop an Object Feature Disentanglement (OFD) module that separates text-aligned object features from non-object features by minimizing their correlation. Finally, the disentangled object features are fed into the detection head. In this manner, detection performance can be remarkably enhanced via more discriminative and less noisy features. Extensive experimental results demonstrate that our approach achieves superior performance on two benchmarks: M\textsuperscript{3}FD (83.7\% mAP) and FLIR (86.1\% mAP). Our code will be publicly available once the paper is accepted.
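To make the two objectives concrete, here is a minimal numpy sketch of losses in the spirit of the SFA (align object features with text features) and OFD (decorrelate object and non-object features) modules described in the abstract. This is our own illustration inferred from the abstract, not the authors' released code; all function names and shapes are hypothetical.

```python
import numpy as np

def semantic_alignment_loss(obj_feats, text_feats):
    """SFA-style objective (our sketch): pull each object feature toward its
    class text embedding via cosine distance. Shapes: (N, D) each, where row i
    of text_feats is the text embedding paired with object i."""
    o = obj_feats / np.linalg.norm(obj_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(o * t, axis=1)))

def decorrelation_loss(obj_feats, nonobj_feats):
    """OFD-style objective (our sketch): penalize the cross-covariance between
    object and non-object features so the two representations decorrelate."""
    o = obj_feats - obj_feats.mean(axis=0)
    b = nonobj_feats - nonobj_feats.mean(axis=0)
    n = obj_feats.shape[0]
    cross = (o.T @ b) / max(n - 1, 1)   # (D, D) cross-covariance matrix
    return float(np.mean(cross ** 2))

rng = np.random.default_rng(0)
obj = rng.normal(size=(8, 16))
text = obj + 0.1 * rng.normal(size=(8, 16))   # well-aligned text features
nonobj = rng.normal(size=(8, 16))
print(semantic_alignment_loss(obj, text))     # small for well-aligned pairs
print(decorrelation_loss(obj, nonobj))
```

In a detector, both terms would be added to the usual detection loss; their relative weights are tuning choices not specified in the abstract.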
Related papers
- Dual-Granularity Semantic Prompting for Language Guidance Infrared Small Target Detection
Infrared small target detection remains challenging due to limited feature representation and severe background interference.
We propose DGSPNet, an end-to-end language prompt-driven framework.
Our method significantly improves detection accuracy and achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2025-11-24T16:58:23Z)
- Learning to Borrow Features for Improved Detection of Small Objects in Single-Shot Detectors
We propose a novel framework that enables small object representations to "borrow" discriminative features from larger, semantically richer instances within the same class.
Our approach significantly boosts small object detection accuracy over baseline methods, offering a promising direction for robust object detection in complex visual environments.
arXiv Detail & Related papers (2025-04-30T01:18:33Z)
- Toward Realistic Camouflaged Object Detection: Benchmarks and Method
Camouflaged object detection (COD) primarily relies on semantic or instance segmentation methods.
We propose a camouflage-aware feature refinement (CAFR) strategy to detect camouflaged objects.
CAFR fully utilizes a clear perception of the current object within the prior knowledge of large models to assist detectors in deeply understanding the distinctions between background and foreground.
arXiv Detail & Related papers (2025-01-13T13:04:00Z)
- SurANet: Surrounding-Aware Network for Concealed Object Detection via Highly-Efficient Interactive Contrastive Learning Strategy
We propose a novel Surrounding-Aware Network, namely SurANet, for concealed object detection.
We enhance the semantics of feature maps using differential fusion of surrounding features to highlight concealed objects.
Next, a Surrounding-Aware Contrastive Loss is applied to identify the concealed object via learning surrounding feature maps contrastively.
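The idea of contrasting a concealed object against its surroundings can be sketched as an InfoNCE-style objective. This is a generic illustration in numpy, not SurANet's actual loss; the function name, shapes, and temperature are our own assumptions.

```python
import numpy as np

def surrounding_contrastive_loss(obj_feat, pos_feat, surround_feats, tau=0.1):
    """InfoNCE-style sketch: pull the concealed-object feature toward a
    positive view of itself and push it away from features sampled from the
    surrounding background. obj_feat/pos_feat: (D,), surround_feats: (M, D)."""
    def l2(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    a, p, s = l2(obj_feat), l2(pos_feat), l2(surround_feats)
    pos = np.exp(np.dot(a, p) / tau)      # similarity to the positive view
    neg = np.exp(s @ a / tau).sum()       # similarities to the surroundings
    return float(-np.log(pos / (pos + neg)))

rng = np.random.default_rng(0)
obj = rng.normal(size=32)
surround = rng.normal(size=(8, 32))
# A feature close to its positive view yields a lower loss than a random one.
low = surrounding_contrastive_loss(obj, obj + 0.01 * rng.normal(size=32), surround)
high = surrounding_contrastive_loss(obj, rng.normal(size=32), surround)
print(low < high)
```

Minimizing such a loss sharpens the feature-space boundary between a concealed object and the background patches around it.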
arXiv Detail & Related papers (2024-10-09T13:02:50Z)
- DPDETR: Decoupled Position Detection Transformer for Infrared-Visible Object Detection
Infrared-visible object detection aims to achieve robust object detection by leveraging the complementary information of infrared and visible image pairs.
Fusing misaligned complementary features is difficult, and current methods cannot reliably locate objects in both modalities under misalignment conditions.
We propose a Decoupled Position Detection Transformer to address these issues.
Experiments on the DroneVehicle and KAIST datasets demonstrate significant improvements over other state-of-the-art methods.
arXiv Detail & Related papers (2024-08-12T13:05:43Z)
- Few-shot Oriented Object Detection with Memorable Contrastive Learning in Remote Sensing Images
Few-shot object detection (FSOD) has garnered significant research attention in the field of remote sensing.
We propose a novel FSOD method for remote sensing images called Few-shot Oriented object detection with Memorable Contrastive learning (FOMC).
Specifically, we employ oriented bounding boxes instead of traditional horizontal bounding boxes to learn a better feature representation for arbitrary-oriented aerial objects.
arXiv Detail & Related papers (2024-03-20T08:15:18Z)
- The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding
We introduce an evaluation protocol based on dynamic vocabulary generation to test whether models detect, discern, and assign the correct fine-grained description to objects.
We further enhance our investigation by evaluating several state-of-the-art open-vocabulary object detectors using the proposed protocol.
arXiv Detail & Related papers (2023-11-29T10:40:52Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- High-resolution Iterative Feedback Network for Camouflaged Object Detection
Spotting camouflaged objects that are visually assimilated into the background is tricky for object detection algorithms.
We aim to extract the high-resolution texture details to avoid the detail degradation that causes blurred vision in edges and boundaries.
We introduce a novel HitNet to refine the low-resolution representations by high-resolution features in an iterative feedback manner.
arXiv Detail & Related papers (2022-03-22T11:20:21Z)
- Few-shot Object Detection with Self-adaptive Attention Network for Remote Sensing Images
We propose a few-shot object detector which is designed for detecting novel objects provided with only a few examples.
In order to fit the object detection settings, our proposed few-shot detector concentrates on the relations that lie in the level of objects instead of the full image.
The experiments demonstrate the effectiveness of the proposed method in few-shot scenes.
arXiv Detail & Related papers (2020-09-26T13:44:58Z)
- Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection
Object detection from 3D point clouds remains a challenging task, though recent studies have pushed the envelope with deep learning techniques.
We propose a domain-adaptation-like approach to enhance the robustness of the feature representation.
Our simple yet effective approach fundamentally boosts the performance of 3D point cloud object detection and achieves the state-of-the-art results.
arXiv Detail & Related papers (2020-06-08T05:15:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.