UFPMP-Det: Toward Accurate and Efficient Object Detection on Drone
Imagery
- URL: http://arxiv.org/abs/2112.10415v1
- Date: Mon, 20 Dec 2021 09:28:44 GMT
- Title: UFPMP-Det: Toward Accurate and Efficient Object Detection on Drone
Imagery
- Authors: Yecheng Huang, Jiaxin Chen, Di Huang
- Abstract summary: This paper proposes a novel approach to object detection on drone imagery, namely the Multi-Proxy Detection Network with Unified Foreground Packing (UFPMP-Det).
UFPMP-Det is designed to handle the numerous instances of very small scale. It departs from the common solution of dividing the high-resolution input image into a large number of chips with low foreground ratios and running detection on each.
Experiments on the widely used VisDrone and UAVDT datasets show that UFPMP-Det reports new state-of-the-art scores at a much higher speed.
- Score: 26.27705791338182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel approach to object detection on drone imagery,
namely Multi-Proxy Detection Network with Unified Foreground Packing
(UFPMP-Det). The common solution to the numerous instances of very small scale
is to divide the high-resolution input image into a large number of chips with
low foreground ratios and run detection on each. Instead, the Unified
Foreground Packing (UFP) module first merges the sub-regions given by a coarse
detector through clustering to suppress background, then packs the resulting
regions into a mosaic for a single inference, significantly reducing overall
time cost. Furthermore, the confusion between inter-class similarities and
intra-class variations of instances is more serious in this setting; it
deteriorates detection performance but is rarely discussed. To address it, the
Multi-Proxy Detection Network (MP-Det) models object distributions in a
fine-grained manner through multiple proxy learning, and the proxies are
enforced to be diverse by minimizing a Bag-of-Instance-Words (BoIW) guided
optimal transport loss. In this way, UFPMP-Det substantially improves both
detection accuracy and efficiency. Extensive experiments on the widely used
VisDrone and UAVDT datasets show that UFPMP-Det reports new state-of-the-art
scores at a much higher speed.
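The merge-then-pack idea behind the UFP module can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering or packing algorithm, so the sketch assumes a greedy overlap merge in place of the clustering step and a simple shelf heuristic in place of the packing step. The names `merge_overlapping`, `pack_shelf`, `margin`, and `mosaic_width` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in source-image pixels


def merge_overlapping(boxes: List[Box], margin: int = 0) -> List[Box]:
    """Greedily merge boxes that overlap (within `margin`) into enclosing regions.

    Assumption: stands in for the clustering step, which the abstract does not detail.
    """
    regions = list(boxes)
    changed = True
    while changed:  # repeat until a full pass performs no merge
        changed = False
        out: List[Box] = []
        for b in regions:
            for i, r in enumerate(out):
                overlaps = (b[0] - margin < r[2] and r[0] - margin < b[2]
                            and b[1] - margin < r[3] and r[1] - margin < b[3])
                if overlaps:
                    out[i] = (min(b[0], r[0]), min(b[1], r[1]),
                              max(b[2], r[2]), max(b[3], r[3]))
                    changed = True
                    break
            else:  # no overlap with any existing region
                out.append(b)
        regions = out
    return regions


def pack_shelf(regions: List[Box], mosaic_width: int):
    """Place regions on horizontal shelves, tallest first.

    Returns (placements, mosaic_height), where each placement is
    (source_region, (x, y) of its top-left corner in the mosaic).
    Assumption: a shelf heuristic, chosen only for brevity.
    """
    placements = []
    x = y = shelf_h = 0
    for r in sorted(regions, key=lambda r: r[3] - r[1], reverse=True):
        w, h = r[2] - r[0], r[3] - r[1]
        if w > mosaic_width:
            raise ValueError("region is wider than the mosaic")
        if x + w > mosaic_width:  # current shelf is full, open a new one
            y += shelf_h
            x = shelf_h = 0
        placements.append((r, (x, y)))
        x += w
        shelf_h = max(shelf_h, h)
    return placements, y + shelf_h


# Hypothetical coarse detections: the first two overlap and get merged.
coarse = [(0, 0, 10, 10), (5, 5, 20, 20), (100, 100, 130, 120), (300, 50, 340, 110)]
regions = merge_overlapping(coarse)
placements, height = pack_shelf(regions, mosaic_width=64)
print(regions)
print(placements, height)
```

A detector would then run once on the mosaic, and each detection would be shifted back to source coordinates using the stored placement offsets.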
Related papers
- Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM).
arXiv Detail & Related papers (2024-08-05T08:35:59Z)
- Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping [12.442574943138794]
The paper explores the industrial multimodal Anomaly Detection (AD) task, which exploits point clouds and RGB images to localize anomalies.
We introduce a novel light and fast framework that learns to map features from one modality to the other on nominal samples.
arXiv Detail & Related papers (2023-12-07T18:41:21Z)
- Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images [1.662438436885552]
Multi-modal fusion has been shown to improve accuracy by combining data from multiple modalities.
We propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage.
By addressing fusion in the early stage, as opposed to mid or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-10-21T00:56:11Z)
- Infrared Small Target Detection Using Double-Weighted Multi-Granularity Patch Tensor Model With Tensor-Train Decomposition [6.517559383143804]
This paper proposes a novel double-weighted multi-granularity infrared patch tensor (DWMGIPT) model.
The proposed algorithm is robust to noise and different scenes.
arXiv Detail & Related papers (2023-10-09T02:17:31Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images [26.51970603200391]
This paper investigates optimizing the detection head based on the sparse convolution.
However, sparse convolution suffers from inadequate integration of contextual information for tiny objects.
We propose a novel global context-enhanced adaptive sparse convolutional network.
arXiv Detail & Related papers (2023-03-25T14:42:50Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
Recent transformer-based image recognition models also show consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- MRDet: A Multi-Head Network for Accurate Oriented Object Detection in Aerial Images [51.227489316673484]
We propose an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals transformed from horizontal anchors.
To obtain accurate bounding boxes, we decouple the detection task into multiple subtasks and propose a multi-head network.
Each head is specially designed to learn the features optimal for the corresponding task, which allows our network to detect objects accurately.
arXiv Detail & Related papers (2020-12-24T06:36:48Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frames are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z)
- Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
arXiv Detail & Related papers (2020-07-18T09:48:29Z)