SyNet: An Ensemble Network for Object Detection in UAV Images
- URL: http://arxiv.org/abs/2012.12991v1
- Date: Wed, 23 Dec 2020 21:38:32 GMT
- Title: SyNet: An Ensemble Network for Object Detection in UAV Images
- Authors: Berat Mert Albaba, Sedat Ozer
- Abstract summary: In this paper, we propose an ensemble network, SyNet, that combines a multi-stage method with a single-stage one.
As building blocks, CenterNet and Cascade R-CNN with pretrained feature extractors are utilized along with an ensembling strategy.
We report state-of-the-art results obtained by our proposed solution on two different datasets.
- Score: 13.198689566654107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in camera-equipped drone applications and their
widespread use have increased the demand for vision-based object detection
algorithms for aerial images. Object detection is inherently a challenging task
as a generic computer vision problem; however, since the use of object
detection algorithms on UAVs (or drones) is a relatively new area, detecting
objects in aerial images remains an even more challenging problem. There are
several reasons for this, including: (i) the lack of large drone datasets with
large object variance, (ii) the large orientation and scale variance in drone
images compared to ground images, and (iii) the difference in texture and shape
features between ground and aerial images. Deep learning based object detection
algorithms can be classified under two main categories: (a) single-stage
detectors and (b) multi-stage detectors. Both single-stage and multi-stage
solutions have their own advantages and disadvantages. However, a technique
that combines the strengths of both could yield an even stronger solution than
either of them individually. In this paper, we propose an ensemble network,
SyNet, that combines a multi-stage method with a single-stage one, with the
motivation of decreasing the high false negative rate of multi-stage detectors
and increasing the quality of the single-stage detector proposals. As building
blocks, CenterNet and Cascade R-CNN with pretrained feature extractors are
utilized along with an ensembling strategy. We report the state-of-the-art
results obtained by our proposed solution on two different datasets, namely
MS-COCO and VisDrone: 52.1% $mAP_{IoU = 0.75}$ is obtained on the MS-COCO
$val2017$ set and 26.2% $mAP_{IoU = 0.75}$ is obtained on the VisDrone test
set.
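For illustration, the sketch below shows one simple way detections from a single-stage detector (e.g., CenterNet) and a multi-stage detector (e.g., Cascade R-CNN) could be merged. The box format, the class-wise greedy NMS, and the IoU threshold are assumptions made here; the actual ensembling strategy used by SyNet is the one described in the paper.

```python
# Minimal sketch (not SyNet's exact ensembling strategy): concatenate the
# boxes produced by two detectors and keep the highest-scoring,
# non-overlapping boxes per class via greedy NMS.
# Assumed box format: (x1, y1, x2, y2, score, class_id).
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def ensemble_detections(dets_a, dets_b, iou_thr=0.5):
    """Merge two detectors' outputs with class-wise greedy NMS."""
    dets = np.concatenate([dets_a, dets_b], axis=0)
    kept = []
    for cls in np.unique(dets[:, 5]):
        cls_dets = dets[dets[:, 5] == cls]
        cls_dets = cls_dets[np.argsort(-cls_dets[:, 4])]  # sort by score
        while len(cls_dets) > 0:
            best, cls_dets = cls_dets[0], cls_dets[1:]
            kept.append(best)
            if len(cls_dets) > 0:
                cls_dets = cls_dets[iou(best, cls_dets) < iou_thr]
    return np.array(kept)

# Example: one box from each detector on roughly the same object.
centernet_out = np.array([[10, 10, 50, 60, 0.62, 0]])
cascade_out = np.array([[12, 11, 52, 58, 0.81, 0]])
print(ensemble_detections(centernet_out, cascade_out))
```

In this toy example the two boxes overlap heavily, so only the higher-scoring Cascade R-CNN box survives; a real ensembling strategy would also have to balance the complementary error modes of the two detectors.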
Related papers
- SCLNet: A Scale-Robust Complementary Learning Network for Object Detection in UAV Images [0.0]
This paper introduces a scale-robust complementary learning network (SCLNet) to address the scale challenges.
One implementation is based on our proposed scale-complementary decoder and scale-complementary loss function.
Another implementation is based on our proposed contrastive complement network and contrastive complement loss function.
arXiv Detail & Related papers (2024-09-11T05:39:25Z) - Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery [51.83786195178233]
We design a Knowledge Discovery Network (KDN) to implement the renormalization group theory in terms of efficient feature extraction.
Renormalized connections (RCs) on the KDN enable "synergistic focusing" of multi-scale features.
RCs extend the "divide-and-conquer" mechanism of multi-level features in FPN-based detectors to a wide range of scale-preferred tasks.
arXiv Detail & Related papers (2024-09-09T13:56:22Z) - SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection [59.868772767818975]
We propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++.
Specifically, we observe that objects in aerial images usually have arbitrary orientations, small scales, and a tendency to aggregate.
Extensive experiments conducted on various multi-oriented object datasets under various labeled settings demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-07-01T07:03:51Z) - YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images [33.80392696735718]
YOLC (You Only Look Clusters) is an efficient and effective framework that builds on an anchor-free object detector, CenterNet.
To overcome the challenges posed by large-scale images and non-uniform object distribution, we introduce a Local Scale Module (LSM) that adaptively searches cluster regions for zooming in for accurate detection.
We perform extensive experiments on two aerial image datasets, namely VisDrone 2019 and UAVDT, to demonstrate the effectiveness and superiority of our proposed approach.
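The cluster-then-zoom idea behind an LSM-style module can be pictured with a small, self-contained sketch. The windowing scheme, window size, and stride below are assumptions for illustration, not the YOLC implementation; the selected window would then be cropped, upscaled, and passed through the detector again for finer detections.

```python
# Rough sketch of the cluster-region idea: given the centers from a coarse
# detection pass, pick the fixed-size window that contains the most objects.
import numpy as np

def densest_window(centers, img_w, img_h, win=512, stride=256):
    """Return the top-left corner of the win x win window holding the most
    detection centers, plus that count. `centers` is an (N, 2) array of
    (x, y) points."""
    best_xy, best_count = (0, 0), -1
    for x0 in range(0, max(1, img_w - win + 1), stride):
        for y0 in range(0, max(1, img_h - win + 1), stride):
            inside = (
                (centers[:, 0] >= x0) & (centers[:, 0] < x0 + win)
                & (centers[:, 1] >= y0) & (centers[:, 1] < y0 + win)
            )
            count = int(inside.sum())
            if count > best_count:
                best_xy, best_count = (x0, y0), count
    return best_xy, best_count

# Example: detection centers clustered around (1000, 800) in a 2000x1500 image.
centers = np.random.normal(loc=(1000, 800), scale=60, size=(50, 2))
print(densest_window(centers, img_w=2000, img_h=1500))
```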
arXiv Detail & Related papers (2024-04-09T10:03:44Z) - Multi-Stage Fusion Architecture for Small-Drone Localization and Identification Using Passive RF and EO Imagery: A Case Study [0.1872664641238533]
This work develops a multi-stage fusion architecture using passive radio frequency (RF) and electro-optic (EO) imagery data.
Supervised deep learning based techniques and unsupervised foreground/background separation techniques are explored to cope with challenging environments.
The proposed fusion architecture is tested, and the tracking performance is quantified over the range.
arXiv Detail & Related papers (2024-03-30T22:53:28Z) - Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based Object Re-Identification [38.19907319079833]
We propose a multitask learning approach, which employs a new multiscale architecture without convolution, Pyramid Vision Transformer (PVT), as the backbone for UAV-based object ReID.
By uncertainty modeling of intraclass variations, our proposed model can be jointly optimized using both uncertainty-aware object ID and camera ID information.
arXiv Detail & Related papers (2022-09-19T00:27:07Z) - Enhanced Single-shot Detector for Small Object Detection in Remote Sensing Images [33.84369068593722]
We propose an image pyramid single-shot detector (IPSSD) for small-scale object detection.
In IPSSD, a single-shot detector is combined with an image pyramid network to extract semantically strong features for generating candidate regions.
The proposed network can enhance the small-scale features from a feature pyramid network.
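As a rough illustration of what an image pyramid contributes, the sketch below runs a hypothetical `run_detector` on several rescaled copies of an input image and maps the boxes back to the original resolution. This is a generic, box-level multi-scale inference sketch under stated assumptions, not the IPSSD design, which works at the feature level inside the network.

```python
# Generic image-pyramid inference sketch (an assumption, not the IPSSD code):
# run a single-shot detector at several input scales so small objects appear
# larger, then map every box back to original-image coordinates.
# `run_detector` is a hypothetical callable returning an (N, 5) array of
# (x1, y1, x2, y2, score) boxes in the resized image.
import numpy as np
import cv2  # OpenCV, used only for resizing

def pyramid_detect(run_detector, image, scales=(1.0, 1.5, 2.0)):
    all_boxes = []
    for s in scales:
        resized = cv2.resize(image, None, fx=s, fy=s)
        boxes = run_detector(resized).copy()
        boxes[:, :4] /= s  # back to original-image coordinates
        all_boxes.append(boxes)
    # Boxes from all pyramid levels; a class-wise NMS (as in the SyNet sketch
    # above) would typically be applied afterwards.
    return np.concatenate(all_boxes, axis=0)
```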
arXiv Detail & Related papers (2022-05-12T07:35:07Z) - Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images, which differ greatly in appearance, for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in that common space, either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, which is then unrolled into a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z) - One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z) - MRDet: A Multi-Head Network for Accurate Oriented Object Detection in
Aerial Images [51.227489316673484]
We propose an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals transformed from horizontal anchors.
To obtain accurate bounding boxes, we decouple the detection task into multiple subtasks and propose a multi-head network.
Each head is specially designed to learn the features optimal for the corresponding task, which allows our network to detect objects accurately.
arXiv Detail & Related papers (2020-12-24T06:36:48Z) - Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)