TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for
Object Detection on Drone-captured Scenarios
- URL: http://arxiv.org/abs/2108.11539v1
- Date: Thu, 26 Aug 2021 01:24:15 GMT
- Authors: Xingkui Zhu, Shuchang Lyu, Xu Wang, Qi Zhao
- Abstract summary: Object detection in drone-captured scenarios is a popular task.
High-speed, low-altitude flight introduces motion blur on densely packed objects.
Based on YOLOv5, we add one more prediction head to detect objects at different scales.
We replace the original prediction heads with Transformer Prediction Heads.
- Score: 19.12254722446651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object detection in drone-captured scenarios is a recently popular task. As
drones navigate at different altitudes, the object scale varies
drastically, which burdens the optimization of networks. Moreover, high-speed,
low-altitude flight introduces motion blur on densely packed objects,
making them hard to distinguish. To solve the two issues
mentioned above, we propose TPH-YOLOv5. Based on YOLOv5, we add one more
prediction head to detect objects at different scales. We then replace the original
prediction heads with Transformer Prediction Heads (TPH) to exploit the
prediction potential of the self-attention mechanism. We also integrate the
Convolutional Block Attention Module (CBAM) to find attention regions in
scenes with dense objects. To further improve the proposed
TPH-YOLOv5, we provide a bag of useful strategies such as data augmentation,
multi-scale testing, multi-model ensembling, and an extra classifier.
Extensive experiments on the VisDrone2021 dataset show that TPH-YOLOv5 performs
well, with impressive interpretability, on drone-captured scenarios. On the
DET-test-challenge dataset, TPH-YOLOv5 achieves an AP of 39.18%, which is
1.81% better than the previous SOTA method (DPNetV3). In the VisDrone Challenge
2021, TPH-YOLOv5 wins 5th place and achieves results well matched with the 1st-place
model (AP 39.43%). Compared to the baseline model (YOLOv5), TPH-YOLOv5 improves
AP by about 7%, which is encouraging and competitive.
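The two attention components named in the abstract can be sketched as follows. This is a minimal PyTorch illustration of the ideas, not the authors' implementation: the module names, head count, reduction ratio, and kernel size are assumptions. A TPH treats the flattened feature map as a token sequence and runs standard self-attention over it; CBAM applies channel attention followed by spatial attention.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Self-attention block applied to a flattened feature map (TPH-style)."""
    def __init__(self, c, num_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(c)
        self.attn = nn.MultiheadAttention(c, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(c)
        self.mlp = nn.Sequential(nn.Linear(c, 4 * c), nn.GELU(),
                                 nn.Linear(4 * c, c))

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)       # tokens: (B, H*W, C)
        n = self.ln1(t)
        t = t + self.attn(n, n, n)[0]          # residual self-attention
        t = t + self.mlp(self.ln2(t))          # residual feed-forward
        return t.transpose(1, 2).reshape(b, c, h, w)

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, c, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c // reduction), nn.ReLU(),
                                 nn.Linear(c // reduction, c))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                      # x: (B, C, H, W)
        # Channel attention from average- and max-pooled global descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx)[:, :, None, None]
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

feat = torch.randn(1, 64, 16, 16)              # a hypothetical neck feature map
out = CBAM(64)(TransformerBlock(64)(feat))
print(out.shape)                               # torch.Size([1, 64, 16, 16])
```

Both modules preserve the feature-map shape, which is why they can be dropped into an existing YOLOv5 head or neck without changing the surrounding layers.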
Related papers
- DroBoost: An Intelligent Score and Model Boosting Method for Drone Detection [1.2564343689544843]
Drone detection is a challenging object detection task where visibility conditions and quality of the images may be unfavorable.
Our work improves on the previous approach by combining several refinements.
The proposed technique won 1st Place in the Drone vs. Bird Challenge.
arXiv Detail & Related papers (2024-06-30T20:49:56Z) - YOLOv10: Real-Time End-to-End Object Detection [68.28699631793967]
YOLOs have emerged as the predominant paradigm in the field of real-time object detection.
The reliance on non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs.
We introduce the holistic efficiency-accuracy driven model design strategy for YOLOs.
arXiv Detail & Related papers (2024-05-23T11:44:29Z) - From Blurry to Brilliant Detection: YOLOv5-Based Aerial Object Detection
with Super Resolution [4.107182710549721]
We present an innovative approach that combines super-resolution and an adapted lightweight YOLOv5 architecture.
Our experimental results demonstrate the model's superior performance in detecting small and densely clustered objects.
arXiv Detail & Related papers (2024-01-26T05:50:58Z) - HIC-YOLOv5: Improved YOLOv5 For Small Object Detection [2.4780916008623834]
An improved YOLOv5 model, HIC-YOLOv5, is proposed to address these problems.
An involution block is adopted between the backbone and neck to increase channel information of the feature map.
Our result shows that HIC-YOLOv5 has improved mAP@[.5:.95] by 6.42% and mAP@0.5 by 9.38% on the VisDrone 2019-DET dataset.
arXiv Detail & Related papers (2023-09-28T12:40:36Z) - Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism [40.31805155724484]
A newly designed model, Gold-YOLO, boosts multi-scale feature fusion capabilities.
We implement MAE-style pretraining in the YOLO series for the first time, allowing YOLO-series models to benefit from unsupervised pretraining.
arXiv Detail & Related papers (2023-09-20T14:03:47Z) - YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time
Object Detection [80.11152626362109]
We provide an efficient and performant object detector, termed YOLO-MS.
We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets.
Our work can also be used as a plug-and-play module for other YOLO models.
arXiv Detail & Related papers (2023-08-10T10:12:27Z) - YOLOv3 with Spatial Pyramid Pooling for Object Detection with Unmanned
Aerial Vehicles [0.0]
We aim to improve the performance of the one-stage detector YOLOv3 by adding a Spatial Pyramid Pooling layer at the end of the Darknet-53 backbone.
We also conducted an evaluation study on different versions of YOLOv3 methods.
arXiv Detail & Related papers (2023-05-21T04:41:52Z) - Recurrent Vision Transformers for Object Detection with Event Cameras [62.27246562304705]
We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras.
RVTs can be trained from scratch to reach state-of-the-art performance on event-based object detection.
Our study brings new insights into effective design choices that can be fruitful for research beyond event-based vision.
arXiv Detail & Related papers (2022-12-11T20:28:59Z) - A lightweight and accurate YOLO-like network for small target detection
in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection.
YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connection, via both bypass and concatenation.
YOLO-S has an 87% smaller parameter size and almost half the FLOPs of YOLOv3, making deployment practical for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z) - Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction
without Convolutions [103.03973037619532]
This work investigates a simple backbone network useful for many dense prediction tasks without convolutions.
Unlike the recently proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose the Pyramid Vision Transformer (PVT).
PVT can be trained on dense partitions of an image to achieve the high output resolution that is important for dense prediction.
arXiv Detail & Related papers (2021-02-24T08:33:55Z) - Improving 3D Object Detection through Progressive Population Based
Augmentation [91.56261177665762]
We present the first attempt to automate the design of data augmentation policies for 3D object detection.
We introduce the Progressive Population Based Augmentation (PPBA) algorithm, which learns to optimize augmentation strategies by narrowing down the search space and adopting the best parameters discovered in previous iterations.
We find that PPBA may be up to 10x more data efficient than baseline 3D detection models without augmentation, highlighting that 3D detection models may achieve competitive accuracy with far fewer labeled examples.
arXiv Detail & Related papers (2020-04-02T05:57:02Z)
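The Spatial Pyramid Pooling layer mentioned in the YOLOv3 entry above is simple enough to sketch. This is an illustrative PyTorch version under assumed kernel sizes (5, 9, 13, as commonly used in YOLO variants), not any specific paper's implementation: parallel stride-1 max-pools at growing receptive fields are concatenated with the input, mixing multi-scale context without changing spatial resolution.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial Pyramid Pooling: parallel max-pools at several kernel sizes,
    concatenated with the input along the channel dimension."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        # stride=1 with padding=k//2 keeps the spatial size unchanged
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes)

    def forward(self, x):                       # x: (B, C, H, W)
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

feat = torch.randn(1, 512, 13, 13)              # a hypothetical backbone output
out = SPP()(feat)
print(out.shape)                                # torch.Size([1, 2048, 13, 13])
```

The output has 4x the input channels (the identity branch plus three pooled branches), so a 1x1 convolution typically follows to compress channels back down.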
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.