Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
- URL: http://arxiv.org/abs/2507.20976v1
- Date: Mon, 28 Jul 2025 16:38:06 GMT
- Authors: Xiao Fang, Minhyek Jeon, Zheyang Qin, Stanislav Panev, Celso de Melo, Shuowen Hu, Shayok Chakraborty, Fernando De la Torre
- Abstract summary: This paper proposes a novel method that uses generative AI to synthesize high-quality aerial images and their labels. Our key contribution is the development of a multi-stage, multi-modal knowledge transfer framework.
- Score: 46.87579355047397
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting vehicles in aerial imagery is a critical task with applications in traffic monitoring, urban planning, and defense intelligence. Deep learning methods have provided state-of-the-art (SOTA) results for this application. However, a significant challenge arises when models trained on data from one geographic region fail to generalize effectively to other areas. Variability in factors such as environmental conditions, urban layouts, road networks, vehicle types, and image acquisition parameters (e.g., resolution, lighting, and angle) leads to domain shifts that degrade model performance. This paper proposes a novel method that uses generative AI to synthesize high-quality aerial images and their labels, improving detector training through data augmentation. Our key contribution is the development of a multi-stage, multi-modal knowledge transfer framework utilizing fine-tuned latent diffusion models (LDMs) to mitigate the distribution gap between the source and target environments. Extensive experiments across diverse aerial imagery domains show consistent performance improvements in AP50 over supervised learning on source domain data, weakly supervised adaptation methods, unsupervised domain adaptation methods, and open-set object detectors by 4-23%, 6-10%, 7-40%, and more than 50%, respectively. Furthermore, we introduce two newly annotated aerial datasets from New Zealand and Utah to support further research in this field. Project page is available at: https://humansensinglab.github.io/AGenDA
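The abstract reports its gains in AP50, that is, average precision where a predicted box counts as correct only if its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The following is a minimal sketch of that metric for a single image and a single class, not the paper's evaluation code. The function names `iou` and `ap50`, the `(x_min, y_min, x_max, y_max)` box layout, and the greedy matching to the best unmatched ground-truth box are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ap50(preds: List[Tuple[Box, float]], gts: List[Box]) -> float:
    """Average precision at IoU >= 0.5 for one image and one class.

    preds: (box, confidence) pairs; gts: ground-truth boxes.
    Simplification: each prediction is greedily matched to the
    best still-unmatched ground-truth box.
    """
    if not gts:
        return 0.0
    ranked = sorted(preds, key=lambda p: p[1], reverse=True)
    matched = set()
    hits = []
    for box, _score in ranked:
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_iou >= 0.5:
            matched.add(best_j)
            hits.append(True)
        else:
            hits.append(False)

    # Precision and recall after each ranked prediction.
    precisions, recalls = [], []
    true_pos = 0
    for k, hit in enumerate(hits, start=1):
        true_pos += int(hit)
        precisions.append(true_pos / k)
        recalls.append(true_pos / len(gts))

    # Make precision non-increasing from right to left, then
    # integrate it over recall (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    area, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        area += (r - prev_recall) * p
        prev_recall = r
    return area
```

Benchmark toolkits typically aggregate detections over the whole dataset before building the precision-recall curve, so per-image values from this sketch are not directly comparable to the percentages quoted in the abstract.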
Related papers
- Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors [62.63467652611788]
We introduce SEMI-TRUTHS, featuring 27,600 real images, 223,400 masks, and 1,472,700 AI-augmented images.
Each augmented image is accompanied by metadata for standardized and targeted evaluation of detector robustness.
Our findings suggest that state-of-the-art detectors exhibit varying sensitivities to the types and degrees of perturbations, data distributions, and augmentation methods used.
arXiv Detail & Related papers (2024-11-12T01:17:27Z)
- Radio Map Prediction from Aerial Images and Application to Coverage Optimization [46.870065000932016]
We focus on predicting path loss radio maps using convolutional neural networks. We show that state-of-the-art models developed for existing radio map datasets can be effectively adapted to this task. We introduce a new model dubbed UNetDCN that achieves on par or better performance compared to the state-of-the-art with reduced complexity.
arXiv Detail & Related papers (2024-10-07T09:19:20Z)
- Quanv4EO: Empowering Earth Observation by means of Quanvolutional Neural Networks [62.12107686529827]
This article highlights a significant shift towards leveraging quantum computing techniques in processing large volumes of remote sensing data.
The proposed Quanv4EO model introduces a quanvolution method for preprocessing multi-dimensional EO data.
Key findings suggest that the proposed model not only maintains high precision in image classification but also shows improvements of around 5% in EO use cases.
arXiv Detail & Related papers (2024-07-24T09:11:34Z)
- Detect, Augment, Compose, and Adapt: Four Steps for Unsupervised Domain Adaptation in Object Detection [7.064953237013352]
Unsupervised domain adaptation (UDA) plays a crucial role in object detection when adapting a source-trained detector to a target domain without annotated data.
We propose a novel and effective four-step UDA approach that leverages self-supervision and trains source and target data concurrently.
Our approach achieves state-of-the-art performance, improving upon the nearest competitor by more than 2% in terms of mean Average Precision (mAP).
arXiv Detail & Related papers (2023-08-29T14:48:29Z)
- Enhancing Visual Domain Adaptation with Source Preparation [5.287588907230967]
Domain Adaptation techniques fail to consider the characteristics of the source domain itself.
We propose Source Preparation (SP), a method to mitigate source domain biases.
We show that SP enhances UDA across a range of visual domains, with improvements up to 40.64% in mIoU over baseline.
arXiv Detail & Related papers (2023-06-16T18:56:44Z)
- Analysis and Adaptation of YOLOv4 for Object Detection in Aerial Images [0.0]
Our work shows the adaptation of the popular YOLOv4 framework for predicting the objects and their locations in aerial images.
The trained model resulted in a mean average precision (mAP) of 45.64% with an inference speed reaching 8.7 FPS on the Tesla K80 GPU.
A comparative study with several contemporary aerial object detectors proved that YOLOv4 performed better, implying a more suitable detection algorithm to incorporate on aerial platforms.
arXiv Detail & Related papers (2022-03-18T23:51:09Z)
- Adaptive Path Planning for UAVs for Multi-Resolution Semantic Segmentation [28.104584236205405]
A key challenge is planning missions to maximize the value of acquired data in large environments.
This is, for example, relevant for monitoring agricultural fields.
We propose an online planning algorithm which adapts the UAV paths to obtain high-resolution semantic segmentations.
arXiv Detail & Related papers (2022-03-03T11:03:28Z)
- Rethinking Drone-Based Search and Rescue with Aerial Person Detection [79.76669658740902]
The visual inspection of aerial drone footage is an integral part of land search and rescue (SAR) operations today.
We propose a novel deep learning algorithm to automate this aerial person detection (APD) task.
We present the novel Aerial Inspection RetinaNet (AIR) algorithm as the combination of these contributions.
arXiv Detail & Related papers (2021-11-17T21:48:31Z)
- AdaZoom: Adaptive Zoom Network for Multi-Scale Object Detection in Large Scenes [57.969186815591186]
Detection in large-scale scenes is a challenging problem due to small objects and extreme scale variation.
We propose a novel Adaptive Zoom (AdaZoom) network as a selective magnifier with flexible shape and focal length to adaptively zoom the focus regions for object detection.
arXiv Detail & Related papers (2021-06-19T03:30:22Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.