YOLO-World: Real-Time Open-Vocabulary Object Detection
- URL: http://arxiv.org/abs/2401.17270v3
- Date: Thu, 22 Feb 2024 13:05:52 GMT
- Title: YOLO-World: Real-Time Open-Vocabulary Object Detection
- Authors: Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying
Shan
- Abstract summary: We introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities.
Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency.
YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed.
- Score: 87.08732047660058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The You Only Look Once (YOLO) series of detectors have established themselves
as efficient and practical tools. However, their reliance on predefined and
trained object categories limits their applicability in open scenarios.
Addressing this limitation, we introduce YOLO-World, an innovative approach
that enhances YOLO with open-vocabulary detection capabilities through
vision-language modeling and pre-training on large-scale datasets.
Specifically, we propose a new Re-parameterizable Vision-Language Path
Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate
the interaction between visual and linguistic information. Our method excels in
detecting a wide range of objects in a zero-shot manner with high efficiency.
On the challenging LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on
V100, which outperforms many state-of-the-art methods in terms of both accuracy
and speed. Furthermore, the fine-tuned YOLO-World achieves remarkable
performance on several downstream tasks, including object detection and
open-vocabulary instance segmentation.
Related papers
- YOLO-UniOW: Efficient Universal Open-World Object Detection [63.71512991320627]
We introduce Universal Open-World Object Detection (Uni-OWD), a new paradigm that unifies open-vocabulary and open-world object detection tasks.
YOLO-UniOW incorporates Adaptive Decision Learning to replace computationally expensive cross-modality fusion with lightweight alignment in the CLIP latent space.
Experiments validate the superiority of YOLO-UniOW, achieving 34.6 AP and 30.0 APr with an inference speed of 69.6 FPS.
arXiv Detail & Related papers (2024-12-30T01:34:14Z) - Oriented Tiny Object Detection: A Dataset, Benchmark, and Dynamic Unbiased Learning [51.170479006249195]
We introduce a new dataset, benchmark, and a dynamic coarse-to-fine learning scheme in this study.
Our proposed dataset, AI-TOD-R, features the smallest object sizes among all oriented object detection datasets.
We present a benchmark spanning a broad range of detection paradigms, including both fully-supervised and label-efficient approaches.
arXiv Detail & Related papers (2024-12-16T09:14:32Z) - Evaluating the Evolution of YOLO (You Only Look Once) Models: A Comprehensive Benchmark Study of YOLO11 and Its Predecessors [0.0]
This study presents a benchmark analysis of various YOLO (You Only Look Once) algorithms, from YOLOv3 to the newest addition, YOLO11.
It evaluates their performance on three diverse datasets: Traffic Signs (with varying object sizes), African Wildlife (with diverse aspect ratios and at least one instance of the object per image), and Ships and Vessels (with small-sized objects of a single class)
arXiv Detail & Related papers (2024-10-31T20:45:00Z) - YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision [0.6662800021628277]
This paper focuses on the evolution of the YOLO (You Only Look Once) object detection algorithm, focusing on YOLOv5, YOLOv8, and YOLOv10.
We analyze the architectural advancements, performance improvements, and suitability for edge deployment across these versions.
arXiv Detail & Related papers (2024-07-03T10:40:20Z) - YOLOv10: Real-Time End-to-End Object Detection [68.28699631793967]
YOLOs have emerged as the predominant paradigm in the field of real-time object detection.
The reliance on the non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs.
We introduce the holistic efficiency-accuracy driven model design strategy for YOLOs.
arXiv Detail & Related papers (2024-05-23T11:44:29Z) - YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection [63.36722419180875]
We provide an efficient and performant object detector, termed YOLO-MS.
We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets.
Our work can also serve as a plug-and-play module for other YOLO models.
arXiv Detail & Related papers (2023-08-10T10:12:27Z) - A lightweight and accurate YOLO-like network for small target detection
in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection.
YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connection, via both bypass and concatenation.
YOLO-S has an 87% decrease of parameter size and almost one half FLOPs of YOLOv3, making practical the deployment for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.