DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection
- URL: http://arxiv.org/abs/2406.12285v2
- Date: Sat, 22 Jun 2024 07:48:32 GMT
- Title: DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection
- Authors: Haodong Li, Haicheng Qu,
- Abstract summary: The original YOLO algorithm has low overall detection accuracy due to its weak ability to perceive targets of different scales.
This paper proposes a dynamic-attention scale-sequence fusion algorithm (DASSF) for small target detection in aerial images.
Experimental results show that when the DASSF method is applied to YOLOv8, compared to YOLOv8n, the model shows an increase of 9.2% and 2.4% in the mean average precision (mAP)
- Score: 6.635903943457569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The detection of small objects in aerial images is a fundamental task in the field of computer vision. Moving objects in aerial photography have problems such as different shapes and sizes, dense overlap, occlusion by the background, and object blur, however, the original YOLO algorithm has low overall detection accuracy due to its weak ability to perceive targets of different scales. In order to improve the detection accuracy of densely overlapping small targets and fuzzy targets, this paper proposes a dynamic-attention scale-sequence fusion algorithm (DASSF) for small target detection in aerial images. First, we propose a dynamic scale sequence feature fusion (DSSFF) module that improves the up-sampling mechanism and reduces computational load. Secondly, a x-small object detection head is specially added to enhance the detection capability of small targets. Finally, in order to improve the expressive ability of targets of different types and sizes, we use the dynamic head (DyHead). The model we proposed solves the problem of small target detection in aerial images and can be applied to multiple different versions of the YOLO algorithm, which is universal. Experimental results show that when the DASSF method is applied to YOLOv8, compared to YOLOv8n, on the VisDrone-2019 and DIOR datasets, the model shows an increase of 9.2% and 2.4% in the mean average precision (mAP), respectively, and outperforms the current mainstream methods.
Related papers
- SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection [59.868772767818975]
We propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++.
Specifically, we observe that objects from aerial images are usually arbitrary orientations, small scales, and aggregation.
Extensive experiments conducted on various multi-oriented object datasets under various labeled settings demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-07-01T07:03:51Z) - YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images [33.80392696735718]
YOLC (You Only Look Clusters) is an efficient and effective framework that builds on an anchor-free object detector, CenterNet.
To overcome the challenges posed by large-scale images and non-uniform object distribution, we introduce a Local Scale Module (LSM) that adaptively searches cluster regions for zooming in for accurate detection.
We perform extensive experiments on two aerial image datasets, including Visdrone 2019 and UAVDT, to demonstrate the effectiveness and superiority of our proposed approach.
arXiv Detail & Related papers (2024-04-09T10:03:44Z) - SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised
Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds.
With the development of Transformer, the scale of SIRST models is constantly increasing.
With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for
Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - An advanced YOLOv3 method for small object detection [2.906551456030129]
This paper introduces an improved YOLOv3 algorithm for small object detection.
In the proposed method, the dilated convolutions mish (DCM) module is introduced into the backbone network of YOLOv3.
In the neck network of YOLOv3, the convolutional block attention module (CBAM) and multi-level fusion module are introduced.
arXiv Detail & Related papers (2022-12-06T07:58:21Z) - ARUBA: An Architecture-Agnostic Balanced Loss for Aerial Object
Detection [24.085715205081385]
We denote size of an object as the number of pixels it covers in an image and size imbalance as the over-representation of certain sizes of objects in a dataset.
We propose a novel ARchitectUre-agnostic BAlanced Loss (ARUBA) that can be applied as a plugin on top of any object detection model.
arXiv Detail & Related papers (2022-10-10T11:28:16Z) - Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z) - SALISA: Saliency-based Input Sampling for Efficient Video Object
Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - Underwater object detection using Invert Multi-Class Adaboost with deep
learning [37.14538666012363]
We propose a novel neural network architecture, namely Sample-WeIghted hyPEr Network (SWIPENet), for small object detection.
We show that the proposed SWIPENet+IMA framework achieves better performance in detection accuracy against several state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-23T15:30:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.