SaRPFF: A Self-Attention with Register-based Pyramid Feature Fusion module for enhanced RLD detection
- URL: http://arxiv.org/abs/2402.16291v2
- Date: Thu, 23 Jan 2025 11:29:02 GMT
- Title: SaRPFF: A Self-Attention with Register-based Pyramid Feature Fusion module for enhanced RLD detection
- Authors: Yunusa Haruna, Shiyin Qin, Abdulrahman Hamman Adama Chukkol, Isah Bello, Adamu Lawan,
- Abstract summary: SaRPFF (Self-Attention with Register-based Pyramid Feature Fusion) is a novel module designed to enhance multi-scale object detection.
It integrates 2D-Multi-Head Self-Attention (MHSA) with Register tokens, improving feature interpretability.
Our approach demonstrates a +2.61% improvement in Average Precision (AP) on the MRLD dataset compared to the baseline FPN method in YOLOv7.
- Score: 0.3262230127283452
- License:
- Abstract: Detecting objects across varying scales is still a challenge in computer vision, particularly in agricultural applications like Rice Leaf Disease (RLD) detection, where objects exhibit significant scale variations (SV). Conventional object detection (OD) like Faster R-CNN, SSD, and YOLO methods often fail to effectively address SV, leading to reduced accuracy and missed detections. To tackle this, we propose SaRPFF (Self-Attention with Register-based Pyramid Feature Fusion), a novel module designed to enhance multi-scale object detection. SaRPFF integrates 2D-Multi-Head Self-Attention (MHSA) with Register tokens, improving feature interpretability by mitigating artifacts within MHSA. Additionally, it integrates efficient attention atrous convolutions into the pyramid feature fusion and introduce a deconvolutional layer for refined up-sampling. We evaluate SaRPFF on YOLOv7 using the MRLD and COCO datasets. Our approach demonstrates a +2.61% improvement in Average Precision (AP) on the MRLD dataset compared to the baseline FPN method in YOLOv7. Furthermore, SaRPFF outperforms other FPN variants, including BiFPN, NAS-FPN, and PANET, showcasing its versatility and potential to advance OD techniques. This study highlights SaRPFF effectiveness in addressing SV challenges and its adaptability across FPN-based OD models.
Related papers
- PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model [76.95536611263356]
PolSAR data presents unique challenges due to its rich and complex characteristics.
Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used.
Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively.
We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy.
arXiv Detail & Related papers (2024-12-17T09:59:53Z) - Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection [3.7793767915135295]
We propose a new model named MAF-YOLO in this paper.
It is a novel object detection framework with a versatile neck named Multi-Branch Auxiliary FPN (MAFPN)
Taking the nano version of MAF-YOLO for example, it can achieve 42.4% AP on COCO with only 3.76M learnable parameters and 10.51G FLOPs, and approximately outperforms YOLOv8n by about 5.1%.
arXiv Detail & Related papers (2024-07-05T09:35:30Z) - Multi-scale Quaternion CNN and BiGRU with Cross Self-attention Feature Fusion for Fault Diagnosis of Bearing [5.3598912592106345]
Deep learning has led to significant advances in bearing fault diagnosis (FD)
We propose a novel FD model by integrating multiscale quaternion convolutional neural network (MQCNN), bidirectional gated recurrent unit (BiG), and cross self-attention feature fusion (CSAFF)
arXiv Detail & Related papers (2024-05-25T07:55:02Z) - MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection [54.545054873239295]
Deepfakes have recently raised significant trust issues and security concerns among the public.
ViT-based methods take advantage of the expressivity of transformers, achieving superior detection performance.
This work introduces Mixture-of-Experts modules for Face Forgery Detection (MoE-FFD), a generalized yet parameter-efficient ViT-based approach.
arXiv Detail & Related papers (2024-04-12T13:02:08Z) - Joint Attention-Guided Feature Fusion Network for Saliency Detection of
Surface Defects [69.39099029406248]
We propose a joint attention-guided feature fusion network (JAFFNet) for saliency detection of surface defects based on the encoder-decoder network.
JAFFNet mainly incorporates a joint attention-guided feature fusion (JAFF) module into decoding stages to adaptively fuse low-level and high-level features.
Experiments conducted on SD-saliency-900, Magnetic tile, and DAGM 2007 indicate that our method achieves promising performance in comparison with other state-of-the-art methods.
arXiv Detail & Related papers (2024-02-05T08:10:16Z) - PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly
Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce.
We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD.
Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z) - LF-YOLO: A Lighter and Faster YOLO for Weld Defect Detection of X-ray
Image [7.970559381165446]
We propose a weld defect detection method based on convolution neural network (CNN), namely Lighter and Faster YOLO (LF-YOLO)
To improve the performance of detection network, we propose an efficient feature extraction (EFE) module.
Experimental results show that our weld defect network achieves satisfactory balance between performance and consumption, and reaches 92.9 mAP50 with 61.5 FPS.
arXiv Detail & Related papers (2021-10-28T12:19:32Z) - Hierarchical Dynamic Filtering Network for RGB-D Salient Object
Detection [91.43066633305662]
The main purpose of RGB-D salient object detection (SOD) is how to better integrate and utilize cross-modal fusion information.
In this paper, we explore these issues from a new perspective.
We implement a kind of more flexible and efficient multi-scale cross-modal feature processing.
arXiv Detail & Related papers (2020-07-13T07:59:55Z) - Random Partitioning Forest for Point-Wise and Collective Anomaly
Detection -- Application to Intrusion Detection [9.74672460306765]
DiFF-RF is an ensemble approach composed of random partitioning binary trees to detect anomalies.
Our experiments show that DiFF-RF almost systematically outperforms the isolation forest (IF) algorithm.
Our experience shows that DiFF-RF can work well in the presence of small-scale learning data.
arXiv Detail & Related papers (2020-06-29T10:44:08Z) - Salient Object Detection Combining a Self-attention Module and a Feature
Pyramid Network [10.81245352773775]
We propose a novel pyramid self-attention module (PSAM) and the adoption of an independent feature-complementing strategy.
In PSAM, self-attention layers are equipped after multi-scale pyramid features to capture richer high-level features and bring larger receptive fields to the model.
arXiv Detail & Related papers (2020-04-30T03:08:34Z) - ASFD: Automatic and Scalable Face Detector [129.82350993748258]
We propose a novel Automatic and Scalable Face Detector (ASFD)
ASFD is based on a combination of neural architecture search techniques as well as a new loss design.
Our ASFD-D6 outperforms the prior strong competitors, and our lightweight ASFD-D0 runs at more than 120 FPS with Mobilenet for VGA-resolution images.
arXiv Detail & Related papers (2020-03-25T06:00:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.