Unleashing the Power of Generic Segmentation Models: A Simple Baseline for Infrared Small Target Detection
- URL: http://arxiv.org/abs/2409.04714v1
- Date: Sat, 7 Sep 2024 05:31:24 GMT
- Title: Unleashing the Power of Generic Segmentation Models: A Simple Baseline for Infrared Small Target Detection
- Authors: Mingjin Zhang, Chi Zhang, Qiming Zhang, Yunsong Li, Xinbo Gao, Jing Zhang
- Abstract summary: We investigate the adaptation of generic segmentation models, such as the Segment Anything Model (SAM), to infrared small object detection tasks.
Our model demonstrates significantly improved performance in both accuracy and throughput compared to existing approaches.
- Score: 57.666055329221194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in deep learning have greatly advanced the field of infrared small object detection (IRSTD). Despite their remarkable success, a notable gap persists between these IRSTD methods and generic segmentation approaches in natural image domains. This gap primarily arises from the significant modality differences and the limited availability of infrared data. In this study, we aim to bridge this divergence by investigating the adaptation of generic segmentation models, such as the Segment Anything Model (SAM), to IRSTD tasks. Our investigation reveals that many generic segmentation models can achieve comparable performance to state-of-the-art IRSTD methods. However, their full potential in IRSTD remains untapped. To address this, we propose a simple, lightweight, yet effective baseline model for segmenting small infrared objects. Through appropriate distillation strategies, we empower smaller student models to outperform state-of-the-art methods, even surpassing fine-tuned teacher results. Furthermore, we enhance the model's performance by introducing a novel query design comprising dense and sparse queries to effectively encode multi-scale features. Through extensive experimentation across four popular IRSTD datasets, our model demonstrates significantly improved performance in both accuracy and throughput compared to existing approaches, surpassing SAM and Semantic-SAM by over 14 IoU on NUDT and 4 IoU on IRSTD1k. The source code and models will be released at https://github.com/O937-blip/SimIR.
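The gains quoted in the abstract are stated in IoU (intersection over union), which suggests they are differences in percentage points. As a reference point, here is a minimal sketch of the standard IoU definition for binary masks. It is not taken from the SimIR code; the function name `mask_iou` and the nested-list mask format are assumptions made for illustration, and the paper's own evaluation protocol may differ (for example, in how IoU is aggregated across a dataset).

```python
def mask_iou(pred, target):
    """Intersection over union of two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration only; not part of the SimIR release.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# A 3x3 example: the prediction overlaps the target on 2 pixels,
# and 4 pixels are set in at least one of the two masks.
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]
print(mask_iou(pred, target))
```

Because infrared small targets often cover only a handful of pixels, a single misclassified pixel changes this ratio noticeably, which is one reason IoU differences between methods can be large on these datasets.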
Related papers
- Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector [97.92369017531038]
We build a new laRge-scale Adversarial images dataset with Diverse hArmful Responses (RADAR).
We then develop a novel iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method, which exploits a single vector distilled from the hidden states of Visual Language Models (VLMs) to distinguish adversarial images from benign ones in the input.
arXiv Detail & Related papers (2024-10-30T10:33:10Z)
- Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis [7.234618871984921]
An emerging area of research aims to learn deep generative models with limited training data.
We propose RS-IMLE, a novel approach that changes the prior distribution used for training.
This leads to substantially higher quality image generation compared to existing GAN and IMLE-based methods.
arXiv Detail & Related papers (2024-09-26T00:19:42Z)
- One Shot is Enough for Sequential Infrared Small Target Segmentation [9.354927663020586]
Infrared small target sequences exhibit strong similarities between frames and contain rich contextual information.
We propose a one-shot and training-free method that perfectly adapts SAM's zero-shot generalization capability to sequential IRSTS.
Experiments demonstrate that our method requires only one shot to achieve comparable performance to state-of-the-art IRSTS methods.
arXiv Detail & Related papers (2024-08-09T02:36:56Z)
- IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection [55.554484379021524]
The Infrared Small Target Detection (IRSTD) task falls short of satisfactory performance due to a notable domain gap between natural and infrared images.
We propose the IRSAM model for IRSTD, which improves SAM's encoder-decoder architecture to learn better feature representation of infrared small objects.
arXiv Detail & Related papers (2024-07-10T10:17:57Z)
- SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets against cluttered backgrounds.
With the development of Transformers, the scale of SIRST models is constantly increasing.
With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z)
- Black-box Adversarial Attacks against Dense Retrieval Models: A Multi-view Contrastive Learning Method [115.29382166356478]
We introduce the adversarial retrieval attack (AREA) task.
It is meant to trick DR models into retrieving a target document that is outside the initial set of candidate documents retrieved by the DR model.
We find that the promising results previously reported on attacking NRMs do not generalize to DR models.
We propose to formalize attacks on DR models as a contrastive learning problem in a multi-view representation space.
arXiv Detail & Related papers (2023-08-19T00:24:59Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- Unbiased Mean Teacher for Cross-domain Object Detection [46.75177193771992]
Cross-domain object detection is challenging because object detection models are often vulnerable to data variance.
We propose a new Unbiased Mean Teacher (UMT) model for cross-domain object detection.
Our UMT model achieves mAPs of 44.1%, 58.1%, 41.7%, and 43.1% on benchmark datasets.
arXiv Detail & Related papers (2020-03-02T08:20:55Z)