Towards Robust Drone Vision in the Wild
- URL: http://arxiv.org/abs/2208.12655v1
- Date: Sun, 21 Aug 2022 18:19:19 GMT
- Title: Towards Robust Drone Vision in the Wild
- Authors: Xiaoyu Lin
- Abstract summary: In this thesis, we propose the first image super-resolution dataset for drone vision.
We collect data at different altitudes and then propose pre-processing steps to align image pairs.
Two methods are proposed to build a robust image super-resolution network at different altitudes.
- Score: 1.14219428942199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The past few years have witnessed the burst of drone-based applications where
computer vision plays an essential role. However, most public drone-based
vision datasets focus on detection and tracking. On the other hand, the
performance of most existing image super-resolution methods is sensitive to the
dataset, specifically, the degradation model between high-resolution and
low-resolution images. In this thesis, we propose the first image
super-resolution dataset for drone vision. Image pairs are captured by two
cameras on the drone with different focal lengths. We collect data at different
altitudes and then propose pre-processing steps to align image pairs. Extensive
empirical studies show domain gaps exist among images captured at different
altitudes. Meanwhile, the performance of pretrained image super-resolution networks also drops on our dataset and varies across altitudes.
Finally, we propose two methods to build a robust image super-resolution
network at different altitudes. The first feeds altitude information into the
network through altitude-aware layers. The second uses one-shot learning to
quickly adapt the super-resolution model to unknown altitudes. Our results
reveal that the proposed methods can efficiently improve the performance of
super-resolution networks at varying altitudes.
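As a rough illustration of the first method, the sketch below shows one plausible way to inject altitude information through FiLM-style altitude-aware layers. It is a hypothetical reconstruction, not the thesis code: `AltitudeFiLM` and `AltAwareBlock` are invented names, and normalizing altitude to [0, 1] is an assumption.

```python
# Hypothetical sketch of an altitude-aware layer (not the thesis code):
# a FiLM-style module maps a scalar altitude to per-channel scale/shift
# parameters that modulate the feature maps of a residual SR block.
import torch
import torch.nn as nn

class AltitudeFiLM(nn.Module):
    """Maps a scalar altitude to per-channel scale/shift parameters."""
    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 2 * channels),
        )

    def forward(self, feat: torch.Tensor, altitude: torch.Tensor) -> torch.Tensor:
        # altitude: (B, 1), assumed normalized to [0, 1]
        gamma, beta = self.mlp(altitude).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return feat * (1 + gamma) + beta

class AltAwareBlock(nn.Module):
    """Residual SR block whose features are modulated by altitude."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.film = AltitudeFiLM(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, altitude):
        h = self.act(self.conv1(x))
        h = self.film(h, altitude)  # inject altitude information
        return x + self.conv2(h)

# Usage: modulate features of a 4-image batch captured at two altitudes.
block = AltAwareBlock(64)
feats = torch.randn(4, 64, 32, 32)
alts = torch.tensor([[0.2], [0.2], [0.8], [0.8]])  # normalized altitudes
print(block(feats, alts).shape)  # torch.Size([4, 64, 32, 32])
```

Conditioning of this kind keeps a single set of convolutional weights shared across altitudes while letting the network specialize its feature statistics per altitude, which matches the stated goal of robustness at varying altitudes.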
Related papers
- Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery [51.73680703579997]
We present a neural radiance field method for urban-scale semantic and building-level instance segmentation from aerial images.
Objects in urban aerial images, such as buildings, cars, and roads, exhibit substantial variations in size.
We introduce a scale-adaptive semantic label fusion strategy that enhances the segmentation of objects of varying sizes.
We then introduce a novel cross-view instance label grouping strategy to mitigate the multi-view inconsistency problem in the 2D instance labels.
arXiv Detail & Related papers (2024-03-18T14:15:39Z) - Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives.
MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes.
This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z) - TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos [57.92385818430939]
Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones.
Existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, making them less suitable for real-time deployment on edge devices.
We propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency.
arXiv Detail & Related papers (2022-10-16T03:05:13Z) - Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based Object Re-Identification [38.19907319079833]
We propose a multitask learning approach that employs a new convolution-free multiscale architecture, the Pyramid Vision Transformer (PVT), as the backbone for UAV-based object ReID.
By modeling the uncertainty of intraclass variations, our proposed model can be jointly optimized using both uncertainty-aware object ID and camera ID information.
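One common way to realize such uncertainty-aware joint optimization is homoscedastic task weighting in the style of Kendall et al.; the sketch below is that generic recipe, not the paper's exact formulation, and `UncertaintyWeightedLoss` is an invented name.

```python
# Generic homoscedastic-uncertainty task weighting (Kendall et al. style),
# shown only as a plausible sketch of how the object-ID and camera-ID
# losses could be jointly optimized; the paper's formulation may differ.
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Learns a log-variance per task and weights each loss accordingly."""
    def __init__(self, num_tasks: int = 2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        # losses: list of scalar task losses, e.g. [object_id_loss, camera_id_loss]
        total = torch.zeros((), device=self.log_vars.device)
        for i, loss in enumerate(losses):
            # exp(-s_i) down-weights noisy tasks; + s_i regularizes s_i.
            total = total + torch.exp(-self.log_vars[i]) * loss + self.log_vars[i]
        return total
```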
arXiv Detail & Related papers (2022-09-19T00:27:07Z) - DSR: Towards Drone Image Super-Resolution [10.679618027862846]
We propose a novel drone image dataset, with scenes captured at low and high resolutions, and across a span of altitudes.
Our results show that off-the-shelf state-of-the-art networks witness a significant drop in performance on this different domain.
We additionally show that simple fine-tuning, and incorporating altitude awareness into the network's architecture, both improve the reconstruction performance.
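As a loose illustration of the "simple fine-tuning" idea, the sketch below adapts a pretrained super-resolution model to image pairs from a new altitude. The model and the (low-res, high-res) pairs are placeholders, `finetune_on_altitude` is an invented name, and the hyperparameters are guesses rather than the paper's settings.

```python
# Minimal fine-tuning sketch (not the paper's code): adapt a pretrained
# SR model to a handful of image pairs captured at a new altitude.
import torch
import torch.nn.functional as F

def finetune_on_altitude(model, pairs, steps=200, lr=1e-5):
    """pairs: list of (low_res, high_res) tensor pairs from one altitude."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        for low, high in pairs:
            sr = model(low)
            loss = F.l1_loss(sr, high)  # L1 is a common SR training objective
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```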
arXiv Detail & Related papers (2022-08-25T19:58:54Z) - SUES-200: A Multi-height Multi-scene Cross-view Image Benchmark Across Drone and Satellite [0.0]
The purpose of cross-view image matching is to match images acquired from different platforms of the same target scene.
SUES-200 is the first dataset that accounts for the differences introduced by drone aerial photography at different flight heights.
arXiv Detail & Related papers (2022-04-22T13:49:52Z) - Small or Far Away? Exploiting Deep Super-Resolution and Altitude Data for Aerial Animal Surveillance [3.8015092217142223]
We show that a holistic attention network based super-resolution approach and a custom-built altitude data exploitation network can increase the detection efficacy in real-world settings.
We evaluate the system on two public, large aerial-capture animal datasets, SAVMAP and AED.
arXiv Detail & Related papers (2021-11-12T17:30:55Z) - Dogfight: Detecting Drones from Drones Videos [58.158988162743825]
This paper attempts to address the problem of detecting drones from other flying drones.
The erratic movement of the source and target drones, small size, arbitrary shape, large intensity variations, and occlusion make this problem quite challenging.
To handle this, instead of using region-proposal based methods, we propose to use a two-stage segmentation-based approach.
arXiv Detail & Related papers (2021-03-31T17:43:31Z) - Exploiting Raw Images for Real-Scene Super-Resolution [105.18021110372133]
We study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images.
We propose a method to generate more realistic training data by mimicking the imaging process of digital cameras.
We also develop a two-branch convolutional neural network to exploit the radiance information originally recorded in raw images.
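A two-branch design of this kind might look like the following sketch. It is an assumed architecture, not the paper's network: one branch processes the packed 4-channel raw mosaic, the other the processed RGB image, and their features are fused. For simplicity, both inputs are assumed to be spatially aligned at the same size, and `TwoBranchSR` is an invented name.

```python
# Assumed two-branch sketch (not the paper's architecture): a raw branch
# restores structure from the packed Bayer mosaic, an RGB branch recovers
# color from the processed image, and a fusion layer merges them.
import torch
import torch.nn as nn

class TwoBranchSR(nn.Module):
    def __init__(self, feats: int = 64):
        super().__init__()
        # Raw branch: 4-channel packed Bayer input (e.g., RGGB)
        self.raw_branch = nn.Sequential(
            nn.Conv2d(4, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Color branch: 3-channel sRGB input
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(3, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.fuse = nn.Conv2d(2 * feats, 3, 3, padding=1)

    def forward(self, raw, rgb):
        # raw: (B, 4, H, W) packed Bayer; rgb: (B, 3, H, W), same size assumed
        f = torch.cat([self.raw_branch(raw), self.rgb_branch(rgb)], dim=1)
        return self.fuse(f)

net = TwoBranchSR()
out = net(torch.randn(1, 4, 64, 64), torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```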
arXiv Detail & Related papers (2021-02-02T16:10:15Z) - SyNet: An Ensemble Network for Object Detection in UAV Images [13.198689566654107]
In this paper, we propose an ensemble network, SyNet, that combines a multi-stage method with a single-stage one.
As building blocks, CenterNet and Cascade R-CNN with pretrained feature extractors are utilized along with an ensembling strategy.
We report state-of-the-art results obtained by our proposed solution on two different datasets.
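The ensembling step could look like the hypothetical sketch below: pool the boxes from the single-stage and multi-stage detectors and merge them with class-wise NMS. The paper's actual ensembling strategy may differ, and `ensemble_detections` is an invented name.

```python
# Hypothetical ensembling sketch (not SyNet's exact scheme): concatenate
# detections from two detectors and merge overlaps with class-wise NMS.
import torch
from torchvision.ops import nms

def ensemble_detections(dets_a, dets_b, iou_thr=0.5):
    """Each dets_* is a (boxes[N,4], scores[N], labels[N]) tuple from one detector."""
    boxes = torch.cat([dets_a[0], dets_b[0]])
    scores = torch.cat([dets_a[1], dets_b[1]])
    labels = torch.cat([dets_a[2], dets_b[2]])
    keep = []
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        kept = nms(boxes[idx], scores[idx], iou_thr)  # per-class suppression
        keep.append(idx[kept])
    keep = torch.cat(keep)
    return boxes[keep], scores[keep], labels[keep]
```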
arXiv Detail & Related papers (2020-12-23T21:38:32Z) - University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization [87.74121935246937]
We introduce a new multi-view benchmark for drone-based geo-localization, named University-1652.
University-1652 contains data from three platforms, i.e., synthetic drones, satellites, and ground cameras, covering 1,652 university buildings around the world.
Experiments show that University-1652 helps the model to learn the viewpoint-invariant features and also has good generalization ability in the real-world scenario.
arXiv Detail & Related papers (2020-02-27T15:24:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.