Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
- URL: http://arxiv.org/abs/2301.12058v1
- Date: Sat, 28 Jan 2023 02:25:30 GMT
- Title: Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
- Authors: Liya Wang, Alex Tien
- Abstract summary: Vision Transformer Detector (ViTDet) was proposed to extract multi-scale features for object detection.
ViTDet's simple design achieves good performance on natural scene images and can be easily embedded into any detector architecture.
Our results show that ViTDet can consistently outperform its convolutional neural network counterparts on horizontal bounding box (HBB) object detection.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The past few years have seen an increased interest in aerial image object
detection due to its critical value to large-scale geo-scientific research like
environmental studies, urban planning, and intelligence monitoring. However,
the task is very challenging due to the bird's-eye view perspective, complex
backgrounds, large and varied image sizes, diverse object appearances,
and the scarcity of well-annotated datasets. Recent advances in computer vision
have shown promise in tackling these challenges. Specifically, Vision Transformer
Detector (ViTDet) was proposed to extract multi-scale features for object
detection. The empirical study shows that ViTDet's simple design achieves good
performance on natural scene images and can be easily embedded into any
detector architecture. To date, ViTDet's potential benefit to challenging
aerial image object detection has not been explored. Therefore, in our study,
25 experiments were carried out to evaluate the effectiveness of ViTDet for
aerial image object detection on three well-known datasets: Airbus Aircraft,
RarePlanes, and Dataset of Object DeTection in Aerial images (DOTA). Our
results show that ViTDet can consistently outperform its convolutional neural
network counterparts on horizontal bounding box (HBB) object detection by a
large margin (up to 17% in average precision) and achieves competitive
performance for oriented bounding box (OBB) object detection. Our
results also establish a baseline for future research.
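The multi-scale feature extraction the abstract attributes to ViTDet comes from its "simple feature pyramid": coarser and finer levels are derived directly from the single-scale feature map of a plain ViT backbone, rather than from a hierarchical backbone. The sketch below is a minimal NumPy illustration of that idea only; the actual ViTDet uses learned deconvolutions and strided convolutions, and the function name and scales here are illustrative assumptions, not the paper's API.

```python
import numpy as np

def simple_feature_pyramid(feat, scales=(2.0, 1.0, 0.5)):
    """Derive multi-scale maps from one single-scale ViT feature map.

    ViTDet's key idea: take the final plain-ViT feature map (stride 16)
    and build pyramid levels from it alone. Real implementations use
    learned (de)convolutions; nearest-neighbor resampling stands in here.
    """
    c, h, w = feat.shape
    pyramid = []
    for s in scales:
        if s >= 1.0:
            r = int(s)
            # upsample by an integer factor r (nearest neighbor)
            level = feat.repeat(r, axis=1).repeat(r, axis=2)
        else:
            step = int(round(1.0 / s))
            # downsample by striding (pooling would also work)
            level = feat[:, ::step, ::step]
        pyramid.append(level)
    return pyramid

# One 256-channel 14x14 token grid (ViT-B/16 on a 224x224 image)
feat = np.random.rand(256, 14, 14)
levels = simple_feature_pyramid(feat)
print([lvl.shape for lvl in levels])
# [(256, 28, 28), (256, 14, 14), (256, 7, 7)]
```

Each pyramid level can then feed a standard detector head, which is why the abstract notes ViTDet "can be easily embedded into any detector architecture."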
Related papers
- Analysis of Object Detection Models for Tiny Object in Satellite Imagery: A Dataset-Centric Approach
This paper delves into the domain of Small-Object-Detection (SOD) within satellite imagery.
Traditional object detection models face difficulties in detecting small objects due to limited contextual information and class imbalances.
Our study aims to provide valuable insights into small object detection in satellite imagery by empirically evaluating state-of-the-art models.
arXiv Detail & Related papers (2024-12-12T07:06:22Z)
- Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection
Occupancy has emerged as a promising alternative for 3D scene perception.
We introduce object-centric occupancy as a supplement to object bounding boxes.
We show that our occupancy features significantly enhance the detection results of state-of-the-art 3D object detectors.
arXiv Detail & Related papers (2024-12-06T16:12:38Z)
- NBBOX: Noisy Bounding Box Improves Remote Sensing Object Detection
This letter presents a thorough investigation of bounding box transformation in terms of scaling, rotation, and translation for remote sensing object detection.
We conduct extensive experiments on DOTA and DIOR-R, both well-known datasets that include a variety of rotated generic objects in aerial images.
Experimental results show that our approach significantly improves remote sensing object detection without bells and whistles.
arXiv Detail & Related papers (2024-09-14T12:25:14Z)
- FlightScope: An Experimental Comparative Review of Aircraft Detection Algorithms in Satellite Imagery
This paper critically evaluates and compares a suite of advanced object detection algorithms customized for the task of identifying aircraft within satellite imagery.
This research encompasses an array of methodologies including YOLO versions 5 and 8, Faster RCNN, CenterNet, RetinaNet, RTMDet, and DETR, all trained from scratch.
YOLOv5 emerges as a robust solution for aerial object detection, underlining its importance through superior mean average precision, recall, and intersection-over-union scores.
arXiv Detail & Related papers (2024-04-03T17:24:27Z)
- Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
- Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (mAP) of approximately 45.7%, a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
- Aerial Monocular 3D Object Detection
DVDET is proposed to achieve aerial monocular 3D object detection in both the 2D image space and the 3D physical space.
To address the severe view deformation issue, we propose a novel trainable geo-deformable transformation module.
To encourage more researchers to investigate this area, we will release the dataset and related code.
arXiv Detail & Related papers (2022-08-08T08:32:56Z)
- Object Detection in Aerial Images: What Improves the Accuracy?
Deep learning-based object detection approaches have been actively explored for the problem of object detection in aerial images.
In this work, we investigate the impact of Faster R-CNN for aerial object detection and explore numerous strategies to improve its performance for aerial images.
arXiv Detail & Related papers (2022-01-21T16:22:48Z)
- Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges
We present a large-scale dataset of Object deTection in Aerial images (DOTA) and comprehensive baselines for ODAI.
The proposed DOTA dataset contains 1,793,658 object instances across 18 categories, annotated with oriented bounding boxes, collected from 11,268 aerial images.
We build baselines covering 10 state-of-the-art algorithms with over 70 configurations, where the speed and accuracy performances of each model have been evaluated.
arXiv Detail & Related papers (2021-02-24T11:20:55Z)
- Perceiving Traffic from Aerial Images
We propose an object detection method called Butterfly Detector that is tailored to detect objects in aerial images.
We evaluate our Butterfly Detector on two publicly available UAV datasets (UAVDT and VisDrone 2019) and show that it outperforms previous state-of-the-art methods while remaining real-time.
arXiv Detail & Related papers (2020-09-16T11:37:43Z)
- EAGLE: Large-scale Vehicle Detection Dataset in Real-World Scenarios using Aerial Imagery
We introduce a large-scale dataset for multi-class vehicle detection with object orientation information in aerial imagery.
It features high-resolution aerial images covering a wide variety of real-world conditions: camera sensors, resolutions, flight altitudes, weather, illumination, haze, shadow, time of day, cities, countries, occlusion, and camera angles.
It contains 215,986 instances annotated with oriented bounding boxes defined by four points and an orientation, making it by far the largest dataset to date for this task.
It also supports research on haze and shadow removal, as well as super-resolution and inpainting applications.
arXiv Detail & Related papers (2020-07-12T23:00:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.