YOLOv1 to YOLOv11: A Comprehensive Survey of Real-Time Object Detection Innovations and Challenges
- URL: http://arxiv.org/abs/2508.02067v1
- Date: Mon, 04 Aug 2025 05:13:51 GMT
- Title: YOLOv1 to YOLOv11: A Comprehensive Survey of Real-Time Object Detection Innovations and Challenges
- Authors: Manikanta Kotthapalli, Deepika Ravipati, Reshma Bhatia,
- Abstract summary: YOLO (You Only Look Once) models transform the landscape of real-time vision applications through unified, end-to-end detection frameworks.<n>This paper offers a comprehensive review of the YOLO family, highlighting architectural innovations, performance benchmarks, extended capabilities, and real-world use cases.<n>We critically analyze the evolution of YOLO models and discuss emerging research directions that extend their impact across diverse computer vision domains.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Over the past decade, object detection has advanced significantly, with the YOLO (You Only Look Once) family of models transforming the landscape of real-time vision applications through unified, end-to-end detection frameworks. From YOLOv1's pioneering regression-based detection to the latest YOLOv9, each version has systematically enhanced the balance between speed, accuracy, and deployment efficiency through continuous architectural and algorithmic advancements.. Beyond core object detection, modern YOLO architectures have expanded to support tasks such as instance segmentation, pose estimation, object tracking, and domain-specific applications including medical imaging and industrial automation. This paper offers a comprehensive review of the YOLO family, highlighting architectural innovations, performance benchmarks, extended capabilities, and real-world use cases. We critically analyze the evolution of YOLO models and discuss emerging research directions that extend their impact across diverse computer vision domains.
Related papers
- A Decade of You Only Look Once (YOLO) for Object Detection: A Review [0.0]
Review marks the tenth anniversary of You Only Look Once (YOLO)<n>YOLO is one of the most influential frameworks in real-time object detection.
arXiv Detail & Related papers (2025-04-24T00:06:08Z) - YOLO-UniOW: Efficient Universal Open-World Object Detection [63.71512991320627]
We introduce Universal Open-World Object Detection (Uni-OWD), a new paradigm that unifies open-vocabulary and open-world object detection tasks.<n>YOLO-UniOW incorporates Adaptive Decision Learning to replace computationally expensive cross-modality fusion with lightweight alignment in the CLIP latent space.<n>Experiments validate the superiority of YOLO-UniOW, achieving 34.6 AP and 30.0 APr with an inference speed of 69.6 FPS.
arXiv Detail & Related papers (2024-12-30T01:34:14Z) - YOLO Evolution: A Comprehensive Benchmark and Architectural Review of YOLOv12, YOLO11, and Their Previous Versions [0.0]
This study represents the first comprehensive experimental evaluation of YOLOv3 to the latest version, YOLOv12.<n>The challenges considered include varying object sizes, diverse aspect ratios, and small-sized objects of a single class.<n>Our analysis highlights the distinctive strengths and limitations of each YOLO version.
arXiv Detail & Related papers (2024-10-31T20:45:00Z) - YOLOv11: An Overview of the Key Architectural Enhancements [0.5639904484784127]
The paper explores YOLOv11's expanded capabilities across various computer vision tasks, including object detection, instance segmentation, pose estimation, and oriented object detection (OBB)
We review the model's performance improvements in terms of mean Average Precision (mAP) and computational efficiency compared to its predecessors, with a focus on the trade-off between parameter count and accuracy.
Our research provides insights into YOLOv11's position within the broader landscape of object detection and its potential impact on real-time computer vision applications.
arXiv Detail & Related papers (2024-10-23T09:55:22Z) - YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision [0.6662800021628277]
This paper focuses on the evolution of the YOLO (You Only Look Once) object detection algorithm, focusing on YOLOv5, YOLOv8, and YOLOv10.
We analyze the architectural advancements, performance improvements, and suitability for edge deployment across these versions.
arXiv Detail & Related papers (2024-07-03T10:40:20Z) - YOLO advances to its genesis: a decadal and comprehensive review of the You Only Look Once (YOLO) series [6.751138557596013]
This review systematically examines the progression of the You Only Look Once (YOLO) object detection algorithms from YOLOv1 to YOLOv12.<n>The study highlights the transformative impact of YOLO models across five critical application areas: autonomous vehicles and traffic safety, healthcare and medical imaging, industrial manufacturing, surveillance and security, and agriculture.
arXiv Detail & Related papers (2024-06-12T06:41:23Z) - YOLOv10: Real-Time End-to-End Object Detection [68.28699631793967]
YOLOs have emerged as the predominant paradigm in the field of real-time object detection.
The reliance on the non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs.
We introduce the holistic efficiency-accuracy driven model design strategy for YOLOs.
arXiv Detail & Related papers (2024-05-23T11:44:29Z) - YOLO-World: Real-Time Open-Vocabulary Object Detection [87.08732047660058]
We introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities.
Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency.
YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed.
arXiv Detail & Related papers (2024-01-30T18:59:38Z) - SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on
FPGA Devices [48.47320494918925]
This work tackles the challenges of deploying stateof-the-art object detection models onto FPGA devices for ultralow latency applications.
We employ a streaming architecture design for our YOLO accelerators, implementing the complete model on-chip in a deeply pipelined fashion.
We introduce novel hardware components to support the operations of YOLO models in a dataflow manner, and off-chip memory buffering to address the limited on-chip memory resources.
arXiv Detail & Related papers (2023-09-04T13:15:01Z) - YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection [63.36722419180875]
We provide an efficient and performant object detector, termed YOLO-MS.<n>We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets.<n>Our work can also serve as a plug-and-play module for other YOLO models.
arXiv Detail & Related papers (2023-08-10T10:12:27Z) - A Comprehensive Review of YOLO Architectures in Computer Vision: From
YOLOv1 to YOLOv8 and YOLO-NAS [0.0]
YOLO has become a central real-time object detection system for robotics, driverless cars, and video monitoring applications.
We present a comprehensive analysis of YOLO's evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with Transformers.
arXiv Detail & Related papers (2023-04-02T10:27:34Z) - A lightweight and accurate YOLO-like network for small target detection
in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection.
YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connection, via both bypass and concatenation.
YOLO-S has an 87% decrease of parameter size and almost one half FLOPs of YOLOv3, making practical the deployment for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.