YOLOv12: Attention-Centric Real-Time Object Detectors
- URL: http://arxiv.org/abs/2502.12524v1
- Date: Tue, 18 Feb 2025 04:20:14 GMT
- Title: YOLOv12: Attention-Centric Real-Time Object Detectors
- Authors: Yunjie Tian, Qixiang Ye, David Doermann,
- Abstract summary: This paper proposes an attention-centric YOLO framework, YOLOv12, that matches the speed of previous CNN-based ones.<n>YOLOv12 surpasses all popular real-time object detectors in accuracy with competitive speed.
- Score: 38.507511985479006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enhancing the network architecture of the YOLO framework has been crucial for a long time, but has focused on CNN-based improvements despite the proven superiority of attention mechanisms in modeling capabilities. This is because attention-based models cannot match the speed of CNN-based models. This paper proposes an attention-centric YOLO framework, namely YOLOv12, that matches the speed of previous CNN-based ones while harnessing the performance benefits of attention mechanisms. YOLOv12 surpasses all popular real-time object detectors in accuracy with competitive speed. For example, YOLOv12-N achieves 40.6% mAP with an inference latency of 1.64 ms on a T4 GPU, outperforming advanced YOLOv10-N / YOLOv11-N by 2.1%/1.2% mAP with a comparable speed. This advantage extends to other model scales. YOLOv12 also surpasses end-to-end real-time detectors that improve DETR, such as RT-DETR / RT-DETRv2: YOLOv12-S beats RT-DETR-R18 / RT-DETRv2-R18 while running 42% faster, using only 36% of the computation and 45% of the parameters. More comparisons are shown in Figure 1.
Related papers
- A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions [0.5639904484784127]
YOLOv12 introduces a novel approach that successfully incorporates attention-based enhancements while preserving real-time performance.
This paper provides a comprehensive review of YOLOv12's architectural innovations, including Area Attention for computationally efficient self-attention.
We benchmark YOLOv12 against prior YOLO versions and competing object detectors, analyzing its improvements in accuracy, inference speed, and computational efficiency.
arXiv Detail & Related papers (2025-04-16T11:40:55Z) - YOLOv12: A Breakdown of the Key Architectural Features [0.5639904484784127]
YOLOv12 is a significant advancement in single-stage, real-time object detection.
It incorporates an optimised backbone (R-ELAN), 7x7 separable convolutions, and FlashAttention-driven area-based attention.
It offers scalable solutions for both latency-sensitive and high-accuracy applications.
arXiv Detail & Related papers (2025-02-20T17:08:43Z) - What is YOLOv6? A Deep Insight into the Object Detection Model [0.0]
This work focuses on the YOLOv6 object detection model in depth.<n>YOLOv6-N achieves 37.5% AP at 1187 FPS on an NVIDIA Tesla T4 GPU.<n>YOLOv6-S reaches 45.0% AP at 484 FPS, outperforming models like PPYOLOE-S, YOLOv5-S, YOLOX-S, and YOLOv8-S in the same class.
arXiv Detail & Related papers (2024-12-17T15:26:15Z) - Comprehensive Performance Evaluation of YOLOv12, YOLO11, YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments [0.9565934024763958]
This study systematically performed a real-world evaluation of the performances of YOLOv8, YOLOv9, YOLOv10, YOLO11( or YOLOv11), and YOLOv12 object detection algorithms.
YOLOv12l recorded the highest recall rate at 0.90, compared to all other configurations of YOLO models.
YOLOv11n achieved highest inference speed of 2.4 ms, outperforming YOLOv8n (4.1 ms), YOLOv9 Gelan-s (11.5 ms), YOLOv10n (5.5 ms), and YOLOv12n
arXiv Detail & Related papers (2024-07-01T17:59:55Z) - YOLOv10: Real-Time End-to-End Object Detection [68.28699631793967]
YOLOs have emerged as the predominant paradigm in the field of real-time object detection.
The reliance on the non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs.
We introduce the holistic efficiency-accuracy driven model design strategy for YOLOs.
arXiv Detail & Related papers (2024-05-23T11:44:29Z) - YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time
Object Detection [80.11152626362109]
We provide an efficient and performant object detector, termed YOLO-MS.
We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets.
Our work can also be used as a plug-and-play module for other YOLO models.
arXiv Detail & Related papers (2023-08-10T10:12:27Z) - RCS-YOLO: A Fast and High-Accuracy Object Detector for Brain Tumor
Detection [7.798672884591179]
We propose a novel YOLO architecture based on channel Shuffle (RCS-YOLO)
Experimental results on the brain tumor dataset Br35H show that the proposed model surpasses YOLOv6, YOLOv7, and YOLOv8 in speed and accuracy.
Our proposed RCS-YOLO achieves state-of-the-art performance on the brain tumor detection task.
arXiv Detail & Related papers (2023-07-31T05:38:17Z) - Faster Attention Is What You Need: A Fast Self-Attention Neural Network
Backbone Architecture for the Edge via Double-Condensing Attention Condensers [71.40595908386477]
We introduce a new faster attention condenser design called double-condensing attention condensers.
The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor.
These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.
arXiv Detail & Related papers (2022-08-15T02:47:33Z) - A lightweight and accurate YOLO-like network for small target detection
in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection.
YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connection, via both bypass and concatenation.
YOLO-S has an 87% decrease of parameter size and almost one half FLOPs of YOLOv3, making practical the deployment for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z) - Workshop on Autonomous Driving at CVPR 2021: Technical Report for
Streaming Perception Challenge [57.647371468876116]
We introduce our real-time 2D object detection system for the realistic autonomous driving scenario.
Our detector is built on a newly designed YOLO model, called YOLOX.
On the Argoverse-HD dataset, our system achieves 41.0 streaming AP, which surpassed second place by 7.8/6.1 on detection-only track/fully track, respectively.
arXiv Detail & Related papers (2021-07-27T06:36:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.