Open-Vocabulary Object Detection in UAV Imagery: A Review and Future Perspectives
- URL: http://arxiv.org/abs/2507.13359v1
- Date: Fri, 04 Jul 2025 04:56:25 GMT
- Title: Open-Vocabulary Object Detection in UAV Imagery: A Review and Future Perspectives
- Authors: Yang Zhou, Junjie Li, CongYang Ou, Dawei Yan, Haokui Zhang, Xizhe Xue,
- Abstract summary: In recent years, advancements in Unmanned Aerial Vehicles (UAV) technology have propelled this field to new heights. Traditional UAV aerial object detection methods primarily focus on detecting predefined categories. The advent of cross-modal text-image alignment (e.g., CLIP) has overcome this limitation, enabling open-vocabulary object detection (OVOD). This paper presents a comprehensive survey of OVOD in the context of UAV aerial scenes.
- Score: 17.28550362736493
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Due to its extensive applications, aerial image object detection has long been a hot topic in computer vision. In recent years, advancements in Unmanned Aerial Vehicles (UAV) technology have further propelled this field to new heights, giving rise to a broader range of application requirements. However, traditional UAV aerial object detection methods primarily focus on detecting predefined categories, which significantly limits their applicability. The advent of cross-modal text-image alignment (e.g., CLIP) has overcome this limitation, enabling open-vocabulary object detection (OVOD), which can identify previously unseen objects through natural language descriptions. This breakthrough significantly enhances the intelligence and autonomy of UAVs in aerial scene understanding. This paper presents a comprehensive survey of OVOD in the context of UAV aerial scenes. We begin by aligning the core principles of OVOD with the unique characteristics of UAV vision, setting the stage for a specialized discussion. Building on this foundation, we construct a systematic taxonomy that categorizes existing OVOD methods for aerial imagery and provides a comprehensive overview of the relevant datasets. This structured review enables us to critically dissect the key challenges and open problems at the intersection of these fields. Finally, based on this analysis, we outline promising future research directions and application prospects. This survey aims to provide a clear road map and a valuable reference for both newcomers and seasoned researchers, fostering innovation in this rapidly evolving domain. We continue to track related work at https://github.com/zhouyang2002/OVOD-in-UVA-imagery.
Related papers
- More Clear, More Flexible, More Precise: A Comprehensive Oriented Object Detection benchmark for UAV [58.89234732689013]
CODrone is a comprehensive oriented object detection dataset for UAVs that accurately reflects real-world conditions. It also serves as a new benchmark designed to align with downstream task requirements. We conduct a series of experiments based on 22 classical or SOTA methods to rigorously evaluate CODrone.
arXiv Detail & Related papers (2025-04-28T17:56:02Z)
- Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation [58.37525311718006]
We put forth a novel formulation of the aerial object detection problem, namely open-vocabulary aerial object detection (OVAD).
We propose CastDet, a CLIP-activated student-teacher detection framework that serves as the first OVAD detector specifically designed for the challenging aerial scenario.
Our framework integrates a robust localization teacher along with several box selection strategies to generate high-quality proposals for novel objects.
arXiv Detail & Related papers (2024-11-04T12:59:13Z)
- Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community [58.417475846791234]
We propose and train the novel LAE-DINO Model, the first open-vocabulary foundation object detector for the LAE task. We conduct experiments on the established remote sensing benchmarks DIOR and DOTAv2.0, as well as our newly introduced 80-class LAE-80C benchmark. Results demonstrate the advantages of the LAE-1M dataset and the effectiveness of the LAE-DINO method.
arXiv Detail & Related papers (2024-08-17T06:24:43Z)
- Dehazing Remote Sensing and UAV Imagery: A Review of Deep Learning, Prior-based, and Hybrid Approaches [4.516330345599765]
High-quality images are crucial in remote sensing and UAV applications.
Atmospheric haze can severely degrade image quality, making image dehazing a critical research area.
arXiv Detail & Related papers (2024-05-13T07:35:24Z)
- Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives.
MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes.
This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z)
- Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts [20.20609511526255]
Vision-based object detection methods have been developed for UAV detection.
UAV detection in images with complex backgrounds and weather artifacts like rain has yet to be reasonably studied.
This work also focuses on benchmarking state-of-the-art object detection models.
arXiv Detail & Related papers (2023-05-25T19:54:33Z)
- The State of Aerial Surveillance: A Survey [62.198765910573556]
This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective.
The main object of interest is humans, where single or multiple subjects are to be detected, identified, tracked, re-identified and have their behavior analyzed.
arXiv Detail & Related papers (2022-01-09T20:13:27Z)
- A Review on Deep Learning in UAV Remote Sensing [7.721988450630861]
We present a comprehensive review of the fundamentals of Deep Learning (DL) applied in UAV-based imagery.
For that, a total of 232 papers published in international scientific journal databases was examined.
We relate how DL presents promising results and has the potential for processing tasks associated with UAV-based image data.
arXiv Detail & Related papers (2021-01-22T16:08:38Z)
- Perceiving Traffic from Aerial Images [86.994032967469]
We propose an object detection method called Butterfly Detector that is tailored to detect objects in aerial images.
We evaluate our Butterfly Detector on two publicly available UAV datasets (UAVDT and VisDrone 2019) and show that it outperforms previous state-of-the-art methods while remaining real-time.
arXiv Detail & Related papers (2020-09-16T11:37:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.