mAPm: multi-scale Attention Pyramid module for Enhanced scale-variation
in RLD detection
- URL: http://arxiv.org/abs/2402.16291v1
- Date: Mon, 26 Feb 2024 04:18:42 GMT
- Title: mAPm: multi-scale Attention Pyramid module for Enhanced scale-variation
in RLD detection
- Authors: Yunusa Haruna, Shiyin Qin, Abdulrahman Hamman Adama Chukkol, Isah
Bello, Adamu Lawan
- Abstract summary: mAPm is a novel approach that integrates dilated convolutions into the Feature Pyramid Network (FPN) to enhance multi-scale information ex-traction.
We evaluate mAPm on YOLOv7 using the MRLD and COCO datasets.
- Score: 0.3499870393443268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting objects across various scales remains a significant challenge in
computer vision, particularly in tasks such as Rice Leaf Disease (RLD)
detection, where objects exhibit considerable scale variations. Traditional
object detection methods often struggle to address these variations, resulting
in missed detections or reduced accuracy. In this study, we propose the
multi-scale Attention Pyramid module (mAPm), a novel approach that integrates
dilated convolutions into the Feature Pyramid Network (FPN) to enhance
multi-scale information ex-traction. Additionally, we incorporate a global
Multi-Head Self-Attention (MHSA) mechanism and a deconvolutional layer to
refine the up-sampling process. We evaluate mAPm on YOLOv7 using the MRLD and
COCO datasets. Compared to vanilla FPN, BiFPN, NAS-FPN, PANET, and ACFPN, mAPm
achieved a significant improvement in Average Precision (AP), with a +2.61%
increase on the MRLD dataset compared to the baseline FPN method in YOLOv7.
This demonstrates its effectiveness in handling scale variations. Furthermore,
the versatility of mAPm allows its integration into various FPN-based object
detection models, showcasing its potential to advance object detection
techniques.
Related papers
- Efficient Feature Aggregation and Scale-Aware Regression for Monocular 3D Object Detection [40.14197775884804]
MonoASRH is a novel monocular 3D detection framework composed of Efficient Hybrid Feature Aggregation Module (EH-FAM) and Adaptive Scale-Aware 3D Regression Head (ASRH)
EH-FAM employs multi-head attention with a global receptive field to extract semantic features for small-scale objects.
ASRH encodes 2D bounding box dimensions and then fuses scale features with the semantic features aggregated by EH-FAM.
arXiv Detail & Related papers (2024-11-05T02:33:25Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN)
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM)
arXiv Detail & Related papers (2024-08-05T08:35:59Z) - FoRA: Low-Rank Adaptation Model beyond Multimodal Siamese Network [19.466279425330857]
We propose a novel multimodal object detector, named Low-rank Modal Adaptors (LMA) with a shared backbone.
Our work was submitted to ACM MM in April 2024, but was rejected.
arXiv Detail & Related papers (2024-07-23T02:27:52Z) - AMANet: Advancing SAR Ship Detection with Adaptive Multi-Hierarchical
Attention Network [0.5437298646956507]
A novel adaptive multi-hierarchical attention module (AMAM) is proposed to learn multi-scale features and adaptively aggregate salient features from various feature layers.
We first fuse information from adjacent feature layers to enhance the detection of smaller targets, thereby achieving multi-scale feature enhancement.
Thirdly, we present a novel adaptive multi-hierarchical attention network (AMANet) by embedding the AMAM between the backbone network and the feature pyramid network.
arXiv Detail & Related papers (2024-01-24T03:56:33Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for
Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - UFPMP-Det: Toward Accurate and Efficient Object Detection on Drone
Imagery [26.27705791338182]
This paper proposes a novel approach to object detection on drone imagery, namely Multi- Proxy Detection Network with Unified Foreground Packing (UFPMP-Det)
UFPMP-Det is designed to deal with the numerous instances of very small scales, different from the common solution that divides the high-resolution input image into quite a number of chips with low foreground ratios to perform detection on them each.
Experiments are carried out on the widely used VisDrone and UAVDT datasets, and UFPMP-Det reports new state-of-the-art scores at a much higher speed, highlighting its advantages
arXiv Detail & Related papers (2021-12-20T09:28:44Z) - Improved YOLOv5 network for real-time multi-scale traffic sign detection [4.5598087061051755]
We propose an improved feature pyramid model, named AF-FPN, which utilize the adaptive attention module (AAM) and feature enhancement module (FEM) to reduce the information loss in the process of feature map generation.
We replace the original feature pyramid network in YOLOv5 with AF-FPN, which improves the detection performance for multi-scale targets of the YOLOv5 network.
arXiv Detail & Related papers (2021-12-16T11:02:12Z) - AdaZoom: Adaptive Zoom Network for Multi-Scale Object Detection in Large
Scenes [57.969186815591186]
Detection in large-scale scenes is a challenging problem due to small objects and extreme scale variation.
We propose a novel Adaptive Zoom (AdaZoom) network as a selective magnifier with flexible shape and focal length to adaptively zoom the focus regions for object detection.
arXiv Detail & Related papers (2021-06-19T03:30:22Z) - M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z) - Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
arXiv Detail & Related papers (2020-07-18T09:48:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.