UAV-DETR: Efficient End-to-End Object Detection for Unmanned Aerial Vehicle Imagery
- URL: http://arxiv.org/abs/2501.01855v1
- Date: Fri, 03 Jan 2025 15:11:14 GMT
- Title: UAV-DETR: Efficient End-to-End Object Detection for Unmanned Aerial Vehicle Imagery
- Authors: Huaxiang Zhang, Kai Liu, Zhongxue Gan, Guo-Niu Zhu
- Abstract summary: Unmanned aerial vehicle object detection (UAV-OD) has been widely used in various scenarios.
Most existing UAV-OD algorithms rely on manually designed components, which require extensive tuning.
This paper proposes an efficient detection transformer (DETR) framework tailored for UAV imagery.
- Score: 14.599037804047724
- Abstract: Unmanned aerial vehicle object detection (UAV-OD) has been widely used in various scenarios. However, most existing UAV-OD algorithms rely on manually designed components, which require extensive tuning. End-to-end models that do not depend on such manually designed components are mainly designed for natural images, which are less effective for UAV imagery. To address such challenges, this paper proposes an efficient detection transformer (DETR) framework tailored for UAV imagery, i.e., UAV-DETR. The framework includes a multi-scale feature fusion with frequency enhancement module, which captures both spatial and frequency information at different scales. In addition, a frequency-focused down-sampling module is presented to retain critical spatial details during down-sampling. A semantic alignment and calibration module is developed to align and fuse features from different fusion paths. Experimental results demonstrate the effectiveness and generalization of our approach across various UAV imagery datasets. On the VisDrone dataset, our method improves AP by 3.1% and AP50 by 4.2% over the baseline. Similar enhancements are observed on the UAVVaste dataset. The project page: https://github.com/ValiantDiligent/UAV-DETR
Related papers
- UAVDB: Trajectory-Guided Adaptable Bounding Boxes for UAV Detection [0.03464344220266879]
This paper introduces UAVDB, a high-resolution UAV detection dataset constructed using Patch Intensity Convergence (PIC).
We first validate the accuracy and efficiency of PIC-generated bounding boxes by comparing Intersection over Union (IoU) performance and runtime.
We then benchmark UAVDB using state-of-the-art (SOTA) YOLO-series detectors, establishing UAVDB as a valuable resource for advancing long-range and high-resolution UAV detection.
arXiv Detail & Related papers (2024-09-09T13:27:53Z)
- SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection [59.868772767818975]
We propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++.
Specifically, we observe that objects in aerial images usually have arbitrary orientations and small scales, and tend to aggregate.
Extensive experiments conducted on various multi-oriented object datasets under various labeled settings demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-07-01T07:03:51Z)
- Boost UAV-based Object Detection via Scale-Invariant Feature Disentanglement and Adversarial Learning [18.11107031800982]
We propose to improve single-stage inference accuracy through learning scale-invariant features.
Our approach can effectively improve model accuracy and achieve state-of-the-art (SoTA) performance on two datasets.
arXiv Detail & Related papers (2024-05-24T11:40:22Z)
- Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs).
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z)
- Rotation Invariant Transformer for Recognizing Object in UAVs [66.1564328237299]
We propose a novel rotation invariant vision transformer (RotTrans) for recognizing targets of interest from UAVs.
RotTrans greatly outperforms the current state of the art, with mAP and Rank1 that are 5.9% and 4.8% higher, respectively, than the previous best results.
Our solution won first place in the UAV-based person re-recognition track of the Multi-Modal Video Reasoning and Analyzing Competition.
arXiv Detail & Related papers (2023-11-05T03:55:08Z)
- Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present the Adaptive Rotated Convolution (ARC) module to handle the rotated object detection problem.
In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images.
The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
arXiv Detail & Related papers (2023-03-14T11:53:12Z)
- Archangel: A Hybrid UAV-based Human Detection Benchmark with Position and Pose Metadata [10.426019628829204]
Archangel is the first UAV-based object detection dataset composed of real and synthetic subsets.
A series of experiments are carefully designed with a state-of-the-art object detector to demonstrate the benefits of leveraging the metadata.
arXiv Detail & Related papers (2022-08-31T21:45:16Z)
- TS4Net: Two-Stage Sample Selective Strategy for Rotating Object Detection [6.496301096839213]
UAV-ROD consists of 1,577 images and 30,090 instances of the car category annotated with oriented bounding boxes.
UAV-ROD can be used for rotating object detection, vehicle orientation recognition, and object counting tasks.
In this paper, we propose a rotating object detector, TS4Net, which contains an anchor refinement module (ARM) and a two-stage sample selective strategy (TS4).
arXiv Detail & Related papers (2021-08-06T13:38:58Z)
- Leveraging domain labels for object detection from UAVs [14.853897011640022]
We propose domain-aware object detectors for Unmanned Aerial Vehicles (UAVs).
In particular, we achieve a new state-of-the-art performance on UAVDT for real-time detectors.
We create a new airborne image dataset by annotating 13,713 objects in 2,900 images featuring precise altitude and viewing angle annotations.
arXiv Detail & Related papers (2021-01-29T16:42:52Z)
- Anti-UAV: A Large Multi-Modal Benchmark for UAV Tracking [59.06167734555191]
Unmanned aerial vehicles (UAVs) offer many applications in both commerce and recreation.
We consider the task of tracking UAVs, providing rich information such as location and trajectory.
We propose a dataset, Anti-UAV, with more than 300 video pairs containing over 580k manually annotated bounding boxes.
arXiv Detail & Related papers (2021-01-21T07:00:15Z)
- Perceiving Traffic from Aerial Images [86.994032967469]
We propose an object detection method called Butterfly Detector that is tailored to detect objects in aerial images.
We evaluate our Butterfly Detector on two publicly available UAV datasets (UAVDT and VisDrone 2019) and show that it outperforms previous state-of-the-art methods while remaining real-time.
arXiv Detail & Related papers (2020-09-16T11:37:43Z)