Related papers: A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions

A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions

URL: http://arxiv.org/abs/2504.11995v1
Date: Wed, 16 Apr 2025 11:40:55 GMT
Title: A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions
Authors: Rahima Khanam, Muhammad Hussain,
Abstract summary: YOLOv12 introduces a novel approach that successfully incorporates attention-based enhancements while preserving real-time performance.<n>This paper provides a comprehensive review of YOLOv12's architectural innovations, including Area Attention for computationally efficient self-attention.<n>We benchmark YOLOv12 against prior YOLO versions and competing object detectors, analyzing its improvements in accuracy, inference speed, and computational efficiency.
Score: 0.5639904484784127
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The YOLO (You Only Look Once) series has been a leading framework in real-time object detection, consistently improving the balance between speed and accuracy. However, integrating attention mechanisms into YOLO has been challenging due to their high computational overhead. YOLOv12 introduces a novel approach that successfully incorporates attention-based enhancements while preserving real-time performance. This paper provides a comprehensive review of YOLOv12's architectural innovations, including Area Attention for computationally efficient self-attention, Residual Efficient Layer Aggregation Networks for improved feature aggregation, and FlashAttention for optimized memory access. Additionally, we benchmark YOLOv12 against prior YOLO versions and competing object detectors, analyzing its improvements in accuracy, inference speed, and computational efficiency. Through this analysis, we demonstrate how YOLOv12 advances real-time object detection by refining the latency-accuracy trade-off and optimizing computational resources.

Related papers

YOLO-DS: Fine-Grained Feature Decoupling via Dual-Statistic Synergy Operator for Object Detection [55.58092342624062]
We propose YOLO-DS, a framework built around a novel Dual-Statistic Synergy Operator (DSO)<n>YOLO-DS decouples object features by jointly modeling the channel-wise mean and the peak-to-mean difference.<n>On the MS-COCO benchmark, YOLO-DS consistently outperforms YOLOv8 across five model scales.
arXiv Detail & Related papers (2026-01-26T05:50:32Z)
YOLO-IOD: Towards Real Time Incremental Object Detection [57.862742461237055]
We introduce YOLO-IOD, a real-time Incremental Object Detection (IOD) framework that is constructed upon the pretrained YOLO-World model.<n>YOLO-IOD encompasses three principal components: 1) Conflict-Aware Pseudo-Label Refinement (CPR), which mitigates the foreground-background confusion.<n>We also introduce Cross-Stage Asymmetric Knowledge Distillation (CAKD), which addresses the misaligned knowledge distillation conflict.
arXiv Detail & Related papers (2025-12-28T15:35:26Z)
Ultralytics YOLO Evolution: An Overview of YOLO26, YOLO11, YOLOv8 and YOLOv5 Object Detectors for Computer Vision and Pattern Recognition [3.2882817259131403]
This paper presents a comprehensive overview of the Ultralytics YOLO(You Only Look Once) family of object detectors.<n>The review begins with the most recent release, YOLO26 (or YOLOv26), which introduces key innovations including Distribution Focal Loss (DFL) removal.<n>The paper identifies challenges and future directions, including dense-scene limitations, hybrid CNN-Transformer integration, open-vocabulary detection, and edge-aware training approaches.
arXiv Detail & Related papers (2025-10-06T23:28:44Z)
YOLOv1 to YOLOv11: A Comprehensive Survey of Real-Time Object Detection Innovations and Challenges [0.0]
YOLO (You Only Look Once) models transform the landscape of real-time vision applications through unified, end-to-end detection frameworks.<n>This paper offers a comprehensive review of the YOLO family, highlighting architectural innovations, performance benchmarks, extended capabilities, and real-world use cases.<n>We critically analyze the evolution of YOLO models and discuss emerging research directions that extend their impact across diverse computer vision domains.
arXiv Detail & Related papers (2025-08-04T05:13:51Z)
YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception [44.76134548023668]
We propose YOLOv13, an accurate and lightweight object detector.<n>We propose a Hypergraph-based Adaptive Correlation Enhancement (HyperACE) mechanism.<n>We also propose a Full-Pipeline Aggregation-and-Distribution (FullPAD) paradigm.
arXiv Detail & Related papers (2025-06-21T15:15:03Z)
YOLOE: Real-Time Seeing Anything [64.35836518093342]
YOLOE integrates detection and segmentation across diverse open prompt mechanisms within a single highly efficient model. YOLOE's exceptional zero-shot performance and transferability with high inference efficiency and low training cost.
arXiv Detail & Related papers (2025-03-10T15:42:59Z)
YOLOv12: A Breakdown of the Key Architectural Features [0.5639904484784127]
YOLOv12 is a significant advancement in single-stage, real-time object detection.<n>It incorporates an optimised backbone (R-ELAN), 7x7 separable convolutions, and FlashAttention-driven area-based attention.<n>It offers scalable solutions for both latency-sensitive and high-accuracy applications.
arXiv Detail & Related papers (2025-02-20T17:08:43Z)
YOLO Evolution: A Comprehensive Benchmark and Architectural Review of YOLOv12, YOLO11, and Their Previous Versions [0.0]
This study represents the first comprehensive experimental evaluation of YOLOv3 to the latest version, YOLOv12.<n>The challenges considered include varying object sizes, diverse aspect ratios, and small-sized objects of a single class.<n>Our analysis highlights the distinctive strengths and limitations of each YOLO version.
arXiv Detail & Related papers (2024-10-31T20:45:00Z)
YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision [0.6662800021628277]
This paper focuses on the evolution of the YOLO (You Only Look Once) object detection algorithm, focusing on YOLOv5, YOLOv8, and YOLOv10. We analyze the architectural advancements, performance improvements, and suitability for edge deployment across these versions.
arXiv Detail & Related papers (2024-07-03T10:40:20Z)
YOLO9tr: A Lightweight Model for Pavement Damage Detection Utilizing a Generalized Efficient Layer Aggregation Network and Attention Mechanism [0.0]
This paper proposes YOLO9tr, a novel lightweight object detection model for pavement damage detection. YOLO9tr is based on the YOLOv9 architecture, incorporating a partial attention block that enhances feature extraction and attention mechanisms. The model achieves a high frame rate of up to 136 FPS, making it suitable for real-time applications such as video surveillance and automated inspection systems.
arXiv Detail & Related papers (2024-06-17T06:31:43Z)
YOLOv10: Real-Time End-to-End Object Detection [68.28699631793967]
YOLOs have emerged as the predominant paradigm in the field of real-time object detection. The reliance on the non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs. We introduce the holistic efficiency-accuracy driven model design strategy for YOLOs.
arXiv Detail & Related papers (2024-05-23T11:44:29Z)
YOLO-World: Real-Time Open-Vocabulary Object Detection [87.08732047660058]
We introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities. Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed.
arXiv Detail & Related papers (2024-01-30T18:59:38Z)
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection [63.36722419180875]
We provide an efficient and performant object detector, termed YOLO-MS. We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets. Our work can also serve as a plug-and-play module for other YOLO models.
arXiv Detail & Related papers (2023-08-10T10:12:27Z)
A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection. YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connection, via both bypass and concatenation. YOLO-S has an 87% decrease of parameter size and almost one half FLOPs of YOLOv3, making practical the deployment for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z)
Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking) We make the following contributions: (i) we propose to improve the existing self-supervised approach, with a simple, yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity; (iv) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube
arXiv Detail & Related papers (2020-06-22T17:55:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.