Related papers: SOD-YOLOv8 -- Enhancing YOLOv8 for Small Object Detection in Traffic Scenes

URL: http://arxiv.org/abs/2408.04786v1
Date: Thu, 8 Aug 2024 23:05:25 GMT
Title: SOD-YOLOv8 -- Enhancing YOLOv8 for Small Object Detection in Traffic Scenes
Authors: Boshra Khalili, Andrew W. Smyth
Abstract summary: Small Object Detection YOLOv8 (SOD-YOLOv8) is designed for scenarios involving numerous small objects. SOD-YOLOv8 significantly improves small object detection, surpassing widely used models in various metrics. In dynamic real-world traffic scenes, SOD-YOLOv8 demonstrated notable improvements in diverse conditions.
Score: 1.3812010983144802
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Object detection as part of computer vision can be crucial for traffic management, emergency response, autonomous vehicles, and smart cities. Despite significant advances in object detection, detecting small objects in images captured by distant cameras remains challenging due to their size, distance from the camera, varied shapes, and cluttered backgrounds. To address these challenges, we propose Small Object Detection YOLOv8 (SOD-YOLOv8), a novel model specifically designed for scenarios involving numerous small objects. Inspired by Efficient Generalized Feature Pyramid Networks (GFPN), we enhance multi-path fusion within YOLOv8 to integrate features across different levels, preserving details from shallower layers and improving small object detection accuracy. In addition, a fourth detection layer is added to leverage high-resolution spatial information effectively. The Efficient Multi-Scale Attention Module (EMA) in the C2f-EMA module enhances feature extraction by redistributing weights and prioritizing relevant features. We introduce Powerful-IoU (PIoU) as a replacement for CIoU, focusing on moderate-quality anchor boxes and adding a penalty based on differences between predicted and ground truth bounding box corners. This approach simplifies calculations, speeds up convergence, and enhances detection accuracy. SOD-YOLOv8 significantly improves small object detection, surpassing widely used models in various metrics, without substantially increasing computational cost or latency compared to YOLOv8s. Specifically, it increases recall from 40.1% to 43.9%, precision from 51.2% to 53.9%, $\text{mAP}_{0.5}$ from 40.6% to 45.1%, and $\text{mAP}_{0.5:0.95}$ from 24% to 26.6%. In dynamic real-world traffic scenes, SOD-YOLOv8 demonstrated notable improvements in diverse conditions, proving its reliability and effectiveness in detecting small objects even in challenging environments.
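Illustration (not from the paper): the abstract describes PIoU only as an IoU-based loss with an added penalty on the differences between predicted and ground-truth box corners, and gives no formula. The Python sketch below shows one way such a loss could look. The function names, the normalization by ground-truth width and height, and the `1 - exp(-p^2)` shaping of the penalty are all assumptions made for illustration, and the "focus on moderate-quality anchor boxes" is not modeled at all.

```python
import math

def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def corner_penalty(pred, gt):
    """Mean absolute edge offset between the boxes, scaled by the
    ground-truth width and height (assumed normalization)."""
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    dx = abs(pred[0] - gt[0]) + abs(pred[2] - gt[2])
    dy = abs(pred[1] - gt[1]) + abs(pred[3] - gt[3])
    return (dx / w_gt + dy / h_gt) / 4.0

def piou_style_loss(pred, gt):
    """Hypothetical loss: the IoU loss plus a bounded corner penalty."""
    p = corner_penalty(pred, gt)
    return (1.0 - iou(pred, gt)) + (1.0 - math.exp(-p * p))

if __name__ == "__main__":
    gt = (0.0, 0.0, 2.0, 2.0)
    print(piou_style_loss(gt, gt))                    # perfect match
    print(piou_style_loss((1.0, 1.0, 3.0, 3.0), gt))  # shifted prediction
```

In this sketch the penalty depends only on the four edge offsets, so it stays informative even when the boxes do not overlap and plain IoU is zero.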

Related papers

MASF-YOLO: An Improved YOLOv11 Network for Small Object Detection on Drone View [0.0]
We propose a novel object detection network, Multi-scale Context Aggregation and Scale-adaptive Fusion YOLO (MASF-YOLO). To tackle the difficulty of detecting small objects in UAV images, we design a Multi-scale Feature Aggregation Module (MFAM), which significantly improves the detection accuracy of small objects. We also introduce a Dimension-Aware Selective Integration Module (DASI), which further enhances multi-scale feature fusion capabilities.
arXiv Detail & Related papers (2025-04-25T07:43:33Z)
A lightweight model FDM-YOLO for small target improvement based on YOLOv8 [0.0]
Small targets are difficult to detect due to their low pixel count, complex backgrounds, and varying shooting angles. This paper focuses on small target detection and explores methods for object detection under low computational constraints.
arXiv Detail & Related papers (2025-03-06T14:06:35Z)
DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection [6.635903943457569]
The original YOLO algorithm has low overall detection accuracy due to its weak ability to perceive targets of different scales. This paper proposes a dynamic-attention scale-sequence fusion algorithm (DASSF) for small target detection in aerial images. Experimental results show that when the DASSF method is applied to YOLOv8, compared to YOLOv8n, the model shows an increase of 9.2% and 2.4% in the mean average precision (mAP)
arXiv Detail & Related papers (2024-06-18T05:26:44Z)
Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head. The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement. This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection [80.11152626362109]
We provide an efficient and performant object detector, termed YOLO-MS. We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets. Our work can also be used as a plug-and-play module for other YOLO models.
arXiv Detail & Related papers (2023-08-10T10:12:27Z)
Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection [20.886999159134138]
Real-time efficient perception is critical for autonomous navigation and city scale sensing. In this work, we propose a learnable geometry-guided prior that incorporates rough geometry of the 3D scene. Our approach improves detection rate by +4.1 $AP_S$ or +39% and in real-time performance by +5.3 $sAP_S$ or +63% for small objects over state-of-the-art (SOTA)
arXiv Detail & Related papers (2023-03-25T00:43:44Z)
An advanced YOLOv3 method for small object detection [2.906551456030129]
This paper introduces an improved YOLOv3 algorithm for small object detection. In the proposed method, the dilated convolutions mish (DCM) module is introduced into the backbone network of YOLOv3. In the neck network of YOLOv3, the convolutional block attention module (CBAM) and multi-level fusion module are introduced.
arXiv Detail & Related papers (2022-12-06T07:58:21Z)
SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection. We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z)
A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection. YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connection, via both bypass and concatenation. YOLO-S has an 87% decrease of parameter size and almost one half FLOPs of YOLOv3, making practical the deployment for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z)
Analysis and Adaptation of YOLOv4 for Object Detection in Aerial Images [0.0]
Our work shows the adaptation of the popular YOLOv4 framework for predicting the objects and their locations in aerial images. The trained model resulted in a mean average precision (mAP) of 45.64% with an inference speed reaching 8.7 FPS on the Tesla K80 GPU. A comparative study with several contemporary aerial object detectors proved that YOLOv4 performed better, implying a more suitable detection algorithm to incorporate on aerial platforms.
arXiv Detail & Related papers (2022-03-18T23:51:09Z)
AdaZoom: Adaptive Zoom Network for Multi-Scale Object Detection in Large Scenes [57.969186815591186]
Detection in large-scale scenes is a challenging problem due to small objects and extreme scale variation. We propose a novel Adaptive Zoom (AdaZoom) network as a selective magnifier with flexible shape and focal length to adaptively zoom the focus regions for object detection.
arXiv Detail & Related papers (2021-06-19T03:30:22Z)
Real-time object detection method based on improved YOLOv4-tiny [0.0]
YOLOv4-tiny is proposed based on YOLOv4 to simplify the network structure and reduce parameters, which makes it suitable for development on mobile and embedded devices. The improved method first uses two ResBlock-D modules from the ResNet-D network in place of two CSPBlock modules in YOLOv4-tiny, which reduces the computational complexity. In the design of the auxiliary network, two consecutive 3x3 convolutions are used to obtain 5x5 receptive fields to extract global features, and channel attention and spatial attention are also used to extract more effective information.
arXiv Detail & Related papers (2020-11-09T08:26:28Z)
Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture. We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions. Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection [170.30694322460045]
We propose a new Neighbor Erasing and Transferring (NET) mechanism to reconfigure the pyramid features and explore scale-aware features. A single-shot network called NETNet is constructed for scale-aware object detection.
arXiv Detail & Related papers (2020-01-18T15:21:29Z)
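
Illustration (not from any listed paper): the improved YOLOv4-tiny entry above states that two consecutive 3x3 convolutions yield a 5x5 receptive field. The short Python sketch below checks that arithmetic, assuming stride-1 convolutions without dilation. The helper name `receptive_field` is hypothetical.

```python
def receptive_field(layers):
    """Receptive field of a stack of convolutions.

    layers: list of (kernel_size, stride) pairs, ordered from input to output.
    Dilation is assumed to be 1 for every layer.
    """
    rf = 1    # a single input pixel sees itself
    jump = 1  # spacing, in input pixels, between adjacent outputs so far
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

if __name__ == "__main__":
    # Two stacked 3x3, stride-1 convolutions.
    print(receptive_field([(3, 1), (3, 1)]))
    # A single 5x5, stride-1 convolution, for comparison.
    print(receptive_field([(5, 1)]))
```

Both stacks cover the same 5x5 window. The stacked 3x3 pair does so with 18 weights per input/output channel pair instead of 25.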

This list is automatically generated from the titles and abstracts of the papers on this site.