Optimization of Autonomous Driving Image Detection Based on RFAConv and Triplet Attention
- URL: http://arxiv.org/abs/2407.09530v1
- Date: Tue, 25 Jun 2024 08:59:33 GMT
- Title: Optimization of Autonomous Driving Image Detection Based on RFAConv and Triplet Attention
- Authors: Zhipeng Ling, Qi Xin, Yiyu Lin, Guangze Su, Zuwei Shui
- Abstract summary: This paper proposes a holistic approach to enhance the YOLOv8 model.
The C2f_RFAConv module replaces YOLOv8's original C2f module to improve feature-extraction efficiency.
The Triplet Attention mechanism sharpens feature focus for more accurate target detection.
- Score: 1.345669927504424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: YOLOv8 plays a crucial role in the realm of autonomous driving, owing to its high-speed target detection, precise identification and positioning, and versatile compatibility across multiple platforms. By processing video streams or images in real time, YOLOv8 rapidly and accurately identifies obstacles such as vehicles and pedestrians on roadways, offering essential visual data for autonomous driving systems. Moreover, YOLOv8 supports various tasks including instance segmentation, image classification, and pose estimation, thereby providing comprehensive visual perception for autonomous driving and ultimately enhancing driving safety and efficiency. Recognizing the significance of object detection in autonomous driving scenarios and the challenges faced by existing methods, this paper proposes a holistic approach to enhancing the YOLOv8 model. The study introduces two pivotal modifications: the C2f_RFAConv module and the Triplet Attention mechanism. First, the proposed modifications are elaborated in the methodology section: the C2f_RFAConv module replaces the original C2f module to improve feature-extraction efficiency, while the Triplet Attention mechanism sharpens feature focus. Subsequently, the experimental procedure delineates the training and evaluation process, encompassing training the original YOLOv8, integrating the modified modules, and assessing performance improvements using metrics and PR curves. The results demonstrate the efficacy of the modifications, with the improved YOLOv8 model exhibiting significant performance gains, including higher mAP values and improved PR curves. Lastly, the analysis section attributes these gains to the introduced modules: C2f_RFAConv improves feature-extraction efficiency, while Triplet Attention sharpens feature focus for more accurate target detection.
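To make the second modification concrete, below is a minimal PyTorch sketch of Triplet Attention following its published rotate-to-attend design with Z-pooling; the exact variant used in this paper, and the wiring of C2f_RFAConv into the YOLOv8 backbone, are not specified in the abstract and may differ.

```python
import torch
import torch.nn as nn

class ZPool(nn.Module):
    """Concatenate max- and mean-pooled maps along the channel dimension."""
    def forward(self, x):
        return torch.cat([x.amax(dim=1, keepdim=True),
                          x.mean(dim=1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    """Z-pool -> 7x7 conv -> sigmoid gate over one pair of dimensions."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(1))

    def forward(self, x):
        return x * torch.sigmoid(self.conv(self.pool(x)))

class TripletAttention(nn.Module):
    """Three branches capture (C,H), (C,W) and (H,W) interactions; the outputs
    are averaged, so the module is drop-in (input and output shapes match)."""
    def __init__(self):
        super().__init__()
        self.ch_branch = AttentionGate()  # channel-height interaction
        self.cw_branch = AttentionGate()  # channel-width interaction
        self.hw_branch = AttentionGate()  # plain spatial attention

    def forward(self, x):  # x: (B, C, H, W)
        # rotate so the target dimension plays the channel role, gate, rotate back
        x_ch = self.ch_branch(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        x_cw = self.cw_branch(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        x_hw = self.hw_branch(x)
        return (x_ch + x_cw + x_hw) / 3.0
```

Because the module is parameter-light and shape-preserving, it can be appended after backbone or neck blocks without altering the rest of the detection head.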
Related papers
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Autonomous Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.
Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.
Experiments conducted on nuScenes dataset demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z) - Research on target detection method of distracted driving behavior based on improved YOLOv8 [6.405098280736171]
This study proposes an improved YOLOv8 detection method based on the original YOLOv8 model by integrating the BoTNet module, GAM attention mechanism and EIoU loss function.
Experimental results show that the improved model performs well in both detection speed and accuracy, with an accuracy rate of 99.4%.
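The EIoU loss referenced above has a standard closed form: 1 - IoU plus separate center, width, and height penalties normalized by the enclosing box. A minimal sketch, independent of this paper's actual training code:

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU loss for boxes given as (x1, y1, x2, y2) tensors of shape (N, 4).
    L = 1 - IoU + center_dist^2 / diag^2 + dw^2 / cw^2 + dh^2 / ch^2."""
    # intersection and IoU
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # smallest enclosing box (diagonal, width, height)
    ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
    cw, ch = ex2 - ex1, ey2 - ey1
    # center, width, and height penalties
    pcx, pcy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    tcx, tcy = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    dw2 = ((pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])) ** 2
    dh2 = ((pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])) ** 2
    return (1 - iou + rho2 / (cw ** 2 + ch ** 2 + eps)
            + dw2 / (cw ** 2 + eps) + dh2 / (ch ** 2 + eps))
```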
arXiv Detail & Related papers (2024-07-02T00:43:41Z) - MetaFollower: Adaptable Personalized Autonomous Car Following [63.90050686330677]
We propose an adaptable personalized car-following framework - MetaFollower.
We first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from various CF events.
We additionally combine Long Short-Term Memory (LSTM) and Intelligent Driver Model (IDM) to reflect temporal heterogeneity with high interpretability.
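The IDM component is a classical closed-form car-following rule, so the learned part of such a framework mainly supplies or adapts its parameters. A minimal sketch with textbook default parameters (not MetaFollower's learned values):

```python
import math

def idm_acceleration(v, gap, dv, v0=33.3, T=1.5, a_max=1.0, b=1.5,
                     s0=2.0, delta=4.0):
    """Intelligent Driver Model: longitudinal acceleration of the follower.
    v   : follower speed (m/s)
    gap : bumper-to-bumper gap to the leader (m)
    dv  : approach rate, v_follower - v_leader (m/s)"""
    # desired dynamic gap: jam distance + time headway + braking interaction term
    s_star = s0 + max(0.0, v * T + v * dv / (2 * math.sqrt(a_max * b)))
    return a_max * (1 - (v / v0) ** delta - (s_star / gap) ** 2)

# e.g. following a slower leader 20 m ahead at 25 m/s -> deceleration
print(idm_acceleration(v=25.0, gap=20.0, dv=5.0))
```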
arXiv Detail & Related papers (2024-06-23T15:30:40Z) - CCDSReFormer: Traffic Flow Prediction with a Criss-Crossed Dual-Stream Enhanced Rectified Transformer Model [32.45713037210818]
We introduce the Criss-Crossed Dual-Stream Enhanced Rectified Transformer model (CCDSReFormer).
It includes three innovative modules: Enhanced Rectified Spatial Self-attention (ReSSA), Enhanced Rectified Delay Aware Self-attention (ReDASA), and Enhanced Rectified Temporal Self-attention (ReTSA).
These modules aim to lower computational needs via sparse attention, focus on local information for better traffic dynamics understanding, and merge spatial and temporal insights through a unique learning method.
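The summary does not define ReSSA, ReDASA, or ReTSA, but the cost argument behind sparse attention can be illustrated with simple block-local attention, which shrinks the O(L^2) score matrix to O(L * block). This is an illustrative simplification, not CCDSReFormer's actual modules:

```python
import torch

def block_local_attention(q, k, v, block=16):
    """Self-attention restricted to non-overlapping blocks of `block` tokens.
    Each token attends only within its block, so cost drops from O(L^2)
    to O(L * block). q, k, v: (B, L, D) with L divisible by `block`."""
    B, L, D = q.shape
    q = q.view(B, L // block, block, D)
    k = k.view(B, L // block, block, D)
    v = v.view(B, L // block, block, D)
    attn = torch.softmax(q @ k.transpose(-2, -1) / D ** 0.5, dim=-1)
    return (attn @ v).view(B, L, D)
```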
arXiv Detail & Related papers (2024-03-26T14:43:57Z) - VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness [56.87603097348203]
VeCAF uses labels and natural language annotations to perform parametric data selection for PVM finetuning.
VeCAF incorporates the finetuning objective to select significant data points that effectively guide the PVM towards faster convergence.
On ImageNet, VeCAF uses up to 3.3x fewer training batches to reach the target performance compared to full finetuning.
arXiv Detail & Related papers (2024-01-15T17:28:37Z) - Exploring Driving Behavior for Autonomous Vehicles Based on Gramian Angular Field Vision Transformer [12.398902878803034]
This paper presents the Gramian Angular Field Vision Transformer (GAF-ViT) model, designed to analyze driving behavior.
The proposed GAF-ViT model consists of three key components: the Transformer Module, the Channel Attention Module, and the Multi-Channel ViT Module.
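The Gramian Angular Field encoding that feeds the ViT has a standard definition: rescale the series to [-1, 1], map values to angles, and form pairwise trigonometric sums. A minimal NumPy sketch (the model's exact preprocessing may differ):

```python
import numpy as np

def gramian_angular_field(series, summation=True):
    """Encode a 1-D time series as a Gramian Angular Field image.
    Values are rescaled to [-1, 1], mapped to angles phi = arccos(x),
    and paired: GASF[i, j] = cos(phi_i + phi_j), GADF[i, j] = sin(phi_i - phi_j)."""
    x = np.asarray(series, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1  # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    if summation:
        return np.cos(phi[:, None] + phi[None, :])  # GASF
    return np.sin(phi[:, None] - phi[None, :])      # GADF

# e.g. a driving-behavior signal of 64 steps becomes a 64x64 image a ViT can consume
img = gramian_angular_field(np.sin(np.linspace(0, 6.28, 64)))
```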
arXiv Detail & Related papers (2023-10-21T04:24:30Z) - Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatially quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z) - Cross-Domain Car Detection Model with Integrated Convolutional Block Attention Mechanism [3.3843451892622576]
A cross-domain car detection model with an integrated convolutional block attention mechanism is proposed.
Experimental results show that the performance of the model improves by 40% over the model without our framework.
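The summary does not show the module itself, but the standard convolutional block attention design (CBAM) applies a channel gate followed by a spatial gate. A minimal sketch assuming that standard formulation rather than this paper's exact variant:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention, each applied multiplicatively to the feature map."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, kernel_size,
                                 padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # channel gate: shared MLP over global avg- and max-pooled descriptors
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # spatial gate: 7x7 conv over channel-wise mean and max maps
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa
```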
arXiv Detail & Related papers (2023-05-31T17:28:13Z) - StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the model with the capacity to predict the future, significantly improving results for streaming perception.
We consider driving scenes with multiple velocities and propose a velocity-aware streaming AP (VsAP) to jointly evaluate accuracy.
Our simple method achieves the state-of-the-art performance on Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2% respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z) - Enhancing Object Detection for Autonomous Driving by Optimizing Anchor Generation and Addressing Class Imbalance [0.0]
This study presents an enhanced 2D object detector based on Faster R-CNN that is better suited for the context of autonomous vehicles.
The proposed modifications over the Faster R-CNN do not increase computational cost and can easily be extended to optimize other anchor-based detection frameworks.
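The summary leaves the anchor-generation procedure unspecified; a common way to optimize anchors for a driving dataset is k-means over ground-truth box shapes under a 1 - IoU distance, sketched below as an assumed illustration, not necessarily this paper's method:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, assuming boxes share a common top-left corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0:1] * boxes[:, 1:2] +
             (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster ground-truth (w, h) pairs with a 1 - IoU distance to pick anchors."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)  # nearest = highest IoU
        new = np.array([np.median(boxes[assign == i], axis=0)
                        if np.any(assign == i) else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted by area
```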
arXiv Detail & Related papers (2021-04-08T16:58:31Z) - Pluggable Weakly-Supervised Cross-View Learning for Accurate Vehicle Re-Identification [53.6218051770131]
Cross-view consistent feature representation is key for accurate vehicle ReID.
Existing approaches resort to supervised cross-view learning using extensive extra viewpoint annotations.
We present a pluggable Weakly-supervised Cross-View Learning (WCVL) module for vehicle ReID.
arXiv Detail & Related papers (2021-03-09T11:51:09Z)