Related papers: TY-RIST: Tactical YOLO Tricks for Real-time Infrared Small Target Detection

URL: http://arxiv.org/abs/2509.22909v1
Date: Fri, 26 Sep 2025 20:36:57 GMT
Title: TY-RIST: Tactical YOLO Tricks for Real-time Infrared Small Target Detection
Authors: Abdulkarim Atrash, Omar Moured, Yufan Chen, Jiaming Zhang, Seyda Ertekin, Omur Ugur
Abstract summary: Infrared small target detection (IRSTD) is critical for defense and surveillance but remains challenging. We propose TY-RIST, an optimized YOLOv12n architecture that integrates a stride-aware backbone with fine-grained receptive fields. Experiments on four benchmarks and across 20 different models demonstrate state-of-the-art performance, improving mAP at 0.5 IoU by +7.9%, Precision by +3%, and Recall by +10.2%.
Score: 6.0340092200636475
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Infrared small target detection (IRSTD) is critical for defense and surveillance but remains challenging due to (1) target loss from minimal features, (2) false alarms in cluttered environments, (3) missed detections from low saliency, and (4) high computational costs. To address these issues, we propose TY-RIST, an optimized YOLOv12n architecture that integrates (1) a stride-aware backbone with fine-grained receptive fields, (2) a high-resolution detection head, (3) cascaded coordinate attention blocks, and (4) a branch pruning strategy that reduces computational cost by about 25.5% while marginally improving accuracy and enabling real-time inference. We also incorporate the Normalized Gaussian Wasserstein Distance (NWD) to enhance regression stability. Extensive experiments on four benchmarks and across 20 different models demonstrate state-of-the-art performance, improving mAP at 0.5 IoU by +7.9%, Precision by +3%, and Recall by +10.2%, while achieving up to 123 FPS on a single GPU. Cross-dataset validation on a fifth dataset further confirms strong generalization capability. Additional results and resources are available at https://www.github.com/moured/TY-RIST
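
The abstract names the Normalized Gaussian Wasserstein Distance (NWD) as the regression aid but does not spell it out. As a rough illustration only, the sketch below follows the commonly cited NWD formulation, in which each box (cx, cy, w, h) is modeled as a 2-D Gaussian and the similarity is an exponential of the Wasserstein distance between the two Gaussians. The function name `nwd`, the box layout, and the default value of the normalizing constant `c` are assumptions made for this example; they are not taken from the TY-RIST paper or its code.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein Distance between two boxes.

    Each box is (cx, cy, w, h) and is treated as a Gaussian with mean
    (cx, cy) and covariance diag(w^2 / 4, h^2 / 4).
    `c` is a dataset-dependent normalizing constant; the default used
    here is a placeholder assumption, not a value from TY-RIST.
    """
    cx_a, cy_a, w_a, h_a = box_a
    cx_b, cy_b, w_b, h_b = box_b
    # Squared 2-Wasserstein distance between the two axis-aligned Gaussians.
    w2_squared = (
        (cx_a - cx_b) ** 2
        + (cy_a - cy_b) ** 2
        + ((w_a - w_b) / 2) ** 2
        + ((h_a - h_b) / 2) ** 2
    )
    # Map the distance to a similarity in (0, 1]; identical boxes give 1.0.
    return math.exp(-math.sqrt(w2_squared) / c)


if __name__ == "__main__":
    # Two tiny, non-overlapping boxes: IoU would be 0, NWD stays informative.
    print(nwd((10.0, 10.0, 4.0, 4.0), (16.0, 10.0, 4.0, 4.0)))
```

Unlike IoU, this similarity stays strictly positive and varies smoothly when two tiny boxes do not overlap, which is the usual motivation for using it with small targets; a loss term would typically be taken as `1 - nwd(pred, target)`.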

Related papers

A Text-Guided Vision Model for Enhanced Recognition of Small Instances [0.0]
An efficient text-guided object detection model has been developed to enhance the detection of small objects. The proposed method replaces the C2f layer in the YOLOv8 backbone with a C3k2 layer, enabling more precise representation of local features. Comparative experiments on the VisDrone dataset show that the proposed model outperforms the original YOLO-World model.
arXiv Detail & Related papers (2026-02-23T04:40:14Z)
N-EIoU-YOLOv9: A Signal-Aware Bounding Box Regression Loss for Lightweight Mobile Detection of Rice Leaf Diseases [0.6280530476948474]
We propose N-EIoU-YOLOv9, a lightweight detection framework based on a signal-aware bounding box regression loss. The proposed loss reshapes the localization gradient by combining non-monotonic focusing with decoupled width and height optimization. This design is particularly effective for small and low-contrast targets commonly observed in agricultural disease imagery.
arXiv Detail & Related papers (2026-01-14T05:13:36Z)
A Comprehensive Evaluation of YOLO-based Deer Detection Performance on Edge Devices [6.486957474966142]
The escalating economic losses in agriculture due to deer intrusion, estimated to be in the hundreds of millions of dollars annually in the U.S., highlight the inadequacy of traditional mitigation strategies. There is a critical need for intelligent, autonomous solutions which require accurate and efficient deer detection. This study presents a comprehensive evaluation of state-of-the-art deep learning models for deer detection in challenging real-world scenarios.
arXiv Detail & Related papers (2025-09-24T17:01:50Z)
ANROT-HELANet: Adverserially and Naturally Robust Attention-Based Aggregation Network via The Hellinger Distance for Few-Shot Classification [4.283774189998499]
We introduce ANROT-HELANet, an Adversarially and Naturally RObusT Hellinger Aggregation Network. Our approach implements an adversarially and naturally robust Hellinger distance-based feature class aggregation scheme. Our approach achieves superior image reconstruction quality with a FID score of 2.75, outperforming traditional VAE (3.43) and WAE (3.38) approaches.
arXiv Detail & Related papers (2025-09-14T11:44:43Z)
LEGNet: Lightweight Edge-Gaussian Driven Network for Low-Quality Remote Sensing Image Object Detection [18.804394986840887]
We introduce LEGNet, a lightweight backbone network featuring a novel Edge-Gaussian Aggregation (EGA) module. The EGA module integrates: (a) orientation-aware Scharr filters to sharpen crucial edge details often lost in low-contrast or blurred objects, and (b) Gaussian-prior-based feature refinement to suppress noise and regularize ambiguous feature responses. Comprehensive evaluations across five benchmarks demonstrate that LEGNet achieves state-of-the-art performance, particularly in detecting low-quality objects.
arXiv Detail & Related papers (2025-03-18T08:20:24Z)
A Recurrent YOLOv8-based framework for Event-Based Object Detection [4.866548300593921]
This study introduces ReYOLOv8, an advanced object detection framework that enhances a frame-based detection system with temporal modeling capabilities. We implement a low-latency, memory-efficient method for encoding event data to boost the system's performance. We also developed a novel data augmentation technique tailored to leverage the unique attributes of event data, thus improving detection accuracy.
arXiv Detail & Related papers (2024-08-09T20:00:16Z)
Sense Less, Generate More: Pre-training LiDAR Perception with Masked Autoencoders for Ultra-Efficient 3D Sensing [0.6340101348986665]
We propose a disruptively frugal LiDAR perception dataflow that generates rather than senses parts of the environment that are either predictable based on extensive training on the environment or have limited consequence for the overall prediction accuracy. Our proposed generative pre-training strategy for this purpose, called radially masked autoencoding (R-MAE), can also be readily implemented in a typical LiDAR system by selectively activating and controlling the laser power for randomly generated angular regions during on-field operations.
arXiv Detail & Related papers (2024-06-12T03:02:54Z)
SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets against cluttered backgrounds. With the development of Transformers, the scale of SIRST models is constantly increasing. With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z)
SpirDet: Towards Efficient, Accurate and Lightweight Infrared Small Target Detector [60.42293239557962]
We propose SpirDet, a novel approach for efficient detection of infrared small targets. We employ a new dual-branch sparse decoder to restore the feature map. Extensive experiments show that the proposed SpirDet significantly outperforms state-of-the-art models.
arXiv Detail & Related papers (2024-02-08T05:06:14Z)
Detecting Rotated Objects as Gaussian Distributions and Its 3-D Generalization [81.29406957201458]
Existing detection methods commonly use a parameterized bounding box (BBox) to model and detect (horizontal) objects. We argue that such a mechanism has fundamental limitations in building an effective regression loss for rotation detection. We propose to model the rotated objects as Gaussian distributions. We extend our approach from 2-D to 3-D with a tailored algorithm design to handle the heading estimation.
arXiv Detail & Related papers (2022-09-22T07:50:48Z)
A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection. YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connections, via both bypass and concatenation. YOLO-S has 87% fewer parameters and almost half the FLOPs of YOLOv3, making deployment practical for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z)
The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss that can achieve trend-level alignment with the SkewIoU loss. Specifically, we model the objects as Gaussian distributions and adopt a Kalman filter to inherently mimic the mechanism of SkewIoU. The resulting new loss, called KFIoU, is easier to implement and works better than the exact SkewIoU.
arXiv Detail & Related papers (2022-01-29T10:54:57Z)
SADet: Learning An Efficient and Accurate Pedestrian Detector [68.66857832440897]
This paper proposes a series of systematic optimization strategies for the detection pipeline of one-stage detectors. These form a single shot anchor-based detector (SADet) for efficient and accurate pedestrian detection. Though structurally simple, it presents state-of-the-art results and a real-time speed of 20 FPS for VGA-resolution images.
arXiv Detail & Related papers (2020-07-26T12:32:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.