Related papers: Enhancing Small Object Detection with YOLO: A Novel Framework for Improved Accuracy and Efficiency

Enhancing Small Object Detection with YOLO: A Novel Framework for Improved Accuracy and Efficiency

URL: http://arxiv.org/abs/2512.07379v1
Date: Mon, 08 Dec 2025 10:15:21 GMT
Title: Enhancing Small Object Detection with YOLO: A Novel Framework for Improved Accuracy and Efficiency
Authors: Mahila Moghadami, Mohammad Ali Keyvanrad, Melika Sabaghian,
Abstract summary: This paper investigates and develops methods for detecting small objects in large-scale aerial images.<n>We adopt the base SW-YOLO approach to enhance speed and accuracy in small object detection by refining cropping dimensions and overlap in sliding window usage.<n>We propose a novel model by modifying the base model architecture, including advanced feature extraction modules in the neck for feature map enhancement.<n>We compare our method with SAHI, one of the most powerful frameworks for processing large-scale images, and CZDet, which is also based on image cropping, achieving significant improvements in accuracy.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper investigates and develops methods for detecting small objects in large-scale aerial images. Current approaches for detecting small objects in aerial images often involve image cropping and modifications to detector network architectures. Techniques such as sliding window cropping and architectural enhancements, including higher-resolution feature maps and attention mechanisms, are commonly employed. Given the growing importance of aerial imagery in various critical and industrial applications, the need for robust frameworks for small object detection becomes imperative. To address this need, we adopted the base SW-YOLO approach to enhance speed and accuracy in small object detection by refining cropping dimensions and overlap in sliding window usage and subsequently enhanced it through architectural modifications. we propose a novel model by modifying the base model architecture, including advanced feature extraction modules in the neck for feature map enhancement, integrating CBAM in the backbone to preserve spatial and channel information, and introducing a new head to boost small object detection accuracy. Finally, we compared our method with SAHI, one of the most powerful frameworks for processing large-scale images, and CZDet, which is also based on image cropping, achieving significant improvements in accuracy. The proposed model achieves significant accuracy gains on the VisDrone2019 dataset, outperforming baseline YOLOv5L detection by a substantial margin. Specifically, the final proposed model elevates the mAP .5.5 accuracy on the VisDrone2019 dataset from the base accuracy of 35.5 achieved by the YOLOv5L detector to 61.2. Notably, the accuracy of CZDet, which is another classic method applied to this dataset, is 58.36. This research demonstrates a significant improvement, achieving an increase in accuracy from 35.5 to 61.2.

Related papers

RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models [48.91205564876609]
We propose a cost-effective and highly adaptable distillation framework to enhance lightweight object detectors.<n>Our approach painlessly delivers striking and consistent performance gains across diverse DETR-based models.<n>Our new model family, RT-DETRv4, achieves state-of-the-art results on COCO, attaining AP scores of 49.7/53.5/55.4/57.0 at corresponding speeds of 273/169/124/78 FPS.
arXiv Detail & Related papers (2025-10-29T08:13:17Z)
Efficient Oriented Object Detection with Enhanced Small Object Recognition in Aerial Images [2.9138705529771123]
We present a novel enhancement to the YOLOv8 model, tailored for oriented object detection tasks.<n>Our model features a wavelet transform-based C2f module for capturing associative features and an Adaptive Scale Feature Pyramid (ASFP) module that leverages P2 layer details.<n>Our approach provides a more efficient architectural design than DecoupleNet, which has 23.3M parameters, all while maintaining detection accuracy.
arXiv Detail & Related papers (2024-12-17T05:45:48Z)
SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients [0.8873228457453465]
Small object detection in aerial imagery presents significant challenges in computer vision. Traditional methods using transformer-based models often face limitations stemming from the lack of specialized databases. This paper introduces two innovative approaches that significantly enhance detection and segmentation capabilities for small aerial objects.
arXiv Detail & Related papers (2024-05-02T19:47:08Z)
From Blurry to Brilliant Detection: YOLO-Based Aerial Object Detection with Super Resolution [3.5044007821404635]
Aerial object detection presents challenges from small object sizes, high density clustering, and image quality degradation from distance and motion blur.<n>B2BDet addresses this with a two-stage framework that applies domain-specific super-resolution during inference, followed by detection using an enhanced YOLOv5 architecture.<n>The approach combines aerial-optimized SRGAN fine-tuning with architectural innovations including an Efficient Attention Module (EAM) and Cross-Layer Feature Pyramid Network (CLFPN)
arXiv Detail & Related papers (2024-01-26T05:50:58Z)
Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head. The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement. This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
HIC-YOLOv5: Improved YOLOv5 For Small Object Detection [2.4780916008623834]
An improved YOLOv5 model: HIC-YOLOv5 is proposed to address the aforementioned problems. An involution block is adopted between the backbone and neck to increase channel information of the feature map. Our result shows that HIC-YOLOv5 has improved mAP@[.5:.95] by 6.42% and mAP@0.5 by 9.38% on VisDrone 2019-DET dataset.
arXiv Detail & Related papers (2023-09-28T12:40:36Z)
3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection. We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution. It takes less than 2s to directly process a whole building consisting of more than 4500k points while detecting out almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
Knowledge Distillation for Oriented Object Detection on Aerial Images [1.827510863075184]
We present a model compression method for rotated object detection on aerial images by knowledge distillation, namely KD-RNet. The experimental result on a large-scale aerial object detection dataset (DOTA) demonstrates that the proposed KD-RNet model can achieve improved mean-average precision (mAP) with reduced number of parameters, at the same time, KD-RNet boost the performance on providing high quality detections with higher overlap with groundtruth annotations.
arXiv Detail & Related papers (2022-06-20T14:24:16Z)
AdaZoom: Adaptive Zoom Network for Multi-Scale Object Detection in Large Scenes [57.969186815591186]
Detection in large-scale scenes is a challenging problem due to small objects and extreme scale variation. We propose a novel Adaptive Zoom (AdaZoom) network as a selective magnifier with flexible shape and focal length to adaptively zoom the focus regions for object detection.
arXiv Detail & Related papers (2021-06-19T03:30:22Z)
Spatial Attention Improves Iterative 6D Object Pose Estimation [52.365075652976735]
We propose a new method for 6D pose estimation refinement from RGB images. Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object. We experimentally show that this approach learns to attend to salient spatial features and learns to ignore occluded parts of the object, leading to better pose estimation across datasets.
arXiv Detail & Related papers (2021-01-05T17:18:52Z)
Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image. Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space. We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step. This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
Improving 3D Object Detection through Progressive Population Based Augmentation [91.56261177665762]
We present the first attempt to automate the design of data augmentation policies for 3D object detection. We introduce the Progressive Population Based Augmentation (PPBA) algorithm, which learns to optimize augmentation strategies by narrowing down the search space and adopting the best parameters discovered in previous iterations. We find that PPBA may be up to 10x more data efficient than baseline 3D detection models without augmentation, highlighting that 3D detection models may achieve competitive accuracy with far fewer labeled examples.
arXiv Detail & Related papers (2020-04-02T05:57:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.