Optimization of Autonomous Driving Image Detection Based on RFAConv and Triplet Attention
- URL: http://arxiv.org/abs/2407.09530v1
- Date: Tue, 25 Jun 2024 08:59:33 GMT
- Title: Optimization of Autonomous Driving Image Detection Based on RFAConv and Triplet Attention
- Authors: Zhipeng Ling, Qi Xin, Yiyu Lin, Guangze Su, Zuwei Shui
- Abstract summary: This paper proposes a holistic approach to enhance the YOLOv8 model.
The C2f_RFAConv module replaces YOLOv8's original C2f module to improve feature-extraction efficiency.
The Triplet Attention mechanism sharpens feature focus for more accurate target detection.
- Score: 1.345669927504424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: YOLOv8 plays a crucial role in the realm of autonomous driving, owing to its high-speed target detection, precise identification and positioning, and versatile compatibility across multiple platforms. By processing video streams or images in real time, YOLOv8 rapidly and accurately identifies obstacles such as vehicles and pedestrians on roadways, offering essential visual data for autonomous driving systems. Moreover, YOLOv8 supports various tasks including instance segmentation, image classification, and pose estimation, thereby providing comprehensive visual perception for autonomous driving and ultimately enhancing driving safety and efficiency. Recognizing the significance of object detection in autonomous driving scenarios and the challenges faced by existing methods, this paper proposes a holistic approach to enhancing the YOLOv8 model. The study introduces two pivotal modifications: the C2f_RFAConv module and the Triplet Attention mechanism. First, the proposed modifications are elaborated in the methodology section: the C2f_RFAConv module replaces the original C2f module to improve feature-extraction efficiency, while the Triplet Attention mechanism sharpens feature focus. Subsequently, the experimental procedure delineates the training and evaluation process, encompassing training the original YOLOv8, integrating the modified modules, and assessing performance improvements using metrics and PR curves. The results demonstrate the efficacy of the modifications, with the improved YOLOv8 model exhibiting significant performance gains, including higher mAP values and improved PR curves. Lastly, the analysis section attributes these gains to the introduced modules: C2f_RFAConv improves feature-extraction efficiency, while Triplet Attention sharpens feature focus for more accurate target detection.
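To make the second modification concrete, below is a minimal PyTorch sketch of Triplet Attention following its published rotate-to-attend design with Z-pooling; the exact variant used in this paper, and the wiring of C2f_RFAConv into the YOLOv8 backbone, are not specified in the abstract and may differ.

```python
import torch
import torch.nn as nn

class ZPool(nn.Module):
    """Concatenate max- and mean-pooled maps along the channel dimension."""
    def forward(self, x):
        return torch.cat([x.amax(dim=1, keepdim=True),
                          x.mean(dim=1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    """Z-pool -> 7x7 conv -> sigmoid gate over one pair of dimensions."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(1))

    def forward(self, x):
        return x * torch.sigmoid(self.conv(self.pool(x)))

class TripletAttention(nn.Module):
    """Three branches capture (C,H), (C,W) and (H,W) interactions; the outputs
    are averaged, so the module is drop-in (input and output shapes match)."""
    def __init__(self):
        super().__init__()
        self.ch_branch = AttentionGate()  # channel-height interaction
        self.cw_branch = AttentionGate()  # channel-width interaction
        self.hw_branch = AttentionGate()  # plain spatial attention

    def forward(self, x):  # x: (B, C, H, W)
        # rotate so the target dimension plays the channel role, gate, rotate back
        x_ch = self.ch_branch(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        x_cw = self.cw_branch(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        x_hw = self.hw_branch(x)
        return (x_ch + x_cw + x_hw) / 3.0
```

Because the module is parameter-light and shape-preserving, it can be appended after backbone or neck blocks without altering the rest of the detection head.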
Related papers
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Autonomous Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.
Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.
Experiments conducted on nuScenes dataset demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z) - Research on target detection method of distracted driving behavior based on improved YOLOv8 [6.405098280736171]
This study proposes an improved YOLOv8 detection method based on the original YOLOv8 model by integrating the BoTNet module, GAM attention mechanism and EIoU loss function.
Experimental results show that the improved model performs well in both detection speed and accuracy, with an accuracy rate of 99.4%.
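The EIoU loss referenced above has a standard closed form: 1 - IoU plus separate center, width, and height penalties normalized by the enclosing box. A minimal sketch, independent of this paper's actual training code:

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU loss for boxes given as (x1, y1, x2, y2) tensors of shape (N, 4).
    L = 1 - IoU + center_dist^2 / diag^2 + dw^2 / cw^2 + dh^2 / ch^2."""
    # intersection and IoU
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # smallest enclosing box (diagonal, width, height)
    ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
    cw, ch = ex2 - ex1, ey2 - ey1
    # center, width, and height penalties
    pcx, pcy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    tcx, tcy = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    dw2 = ((pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])) ** 2
    dh2 = ((pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])) ** 2
    return (1 - iou + rho2 / (cw ** 2 + ch ** 2 + eps)
            + dw2 / (cw ** 2 + eps) + dh2 / (ch ** 2 + eps))
```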
arXiv Detail & Related papers (2024-07-02T00:43:41Z) - MetaFollower: Adaptable Personalized Autonomous Car Following [63.90050686330677]
We propose an adaptable personalized car-following framework - MetaFollower.
We first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from various CF events.
We additionally combine Long Short-Term Memory (LSTM) and Intelligent Driver Model (IDM) to reflect temporal heterogeneity with high interpretability.
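The IDM component is a classical closed-form car-following rule, so the learned part of such a framework mainly supplies or adapts its parameters. A minimal sketch with textbook default parameters (not MetaFollower's learned values):

```python
import math

def idm_acceleration(v, gap, dv, v0=33.3, T=1.5, a_max=1.0, b=1.5,
                     s0=2.0, delta=4.0):
    """Intelligent Driver Model: longitudinal acceleration of the follower.
    v   : follower speed (m/s)
    gap : bumper-to-bumper gap to the leader (m)
    dv  : approach rate, v_follower - v_leader (m/s)"""
    # desired dynamic gap: jam distance + time headway + braking interaction term
    s_star = s0 + max(0.0, v * T + v * dv / (2 * math.sqrt(a_max * b)))
    return a_max * (1 - (v / v0) ** delta - (s_star / gap) ** 2)

# e.g. following a slower leader 20 m ahead at 25 m/s -> deceleration
print(idm_acceleration(v=25.0, gap=20.0, dv=5.0))
```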
arXiv Detail & Related papers (2024-06-23T15:30:40Z) - CCDSReFormer: Traffic Flow Prediction with a Criss-Crossed Dual-Stream Enhanced Rectified Transformer Model [32.45713037210818]
We introduce the Criss-Crossed Dual-Stream Enhanced Rectified Transformer model (CCDSReFormer).
It includes three innovative modules: Enhanced Rectified Spatial Self-attention (ReSSA), Enhanced Rectified Delay Aware Self-attention (ReDASA), and Enhanced Rectified Temporal Self-attention (ReTSA).
These modules aim to lower computational needs via sparse attention, focus on local information for better traffic dynamics understanding, and merge spatial and temporal insights through a unique learning method.
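The summary does not define ReSSA, ReDASA, or ReTSA, but the cost argument behind sparse attention can be illustrated with simple block-local attention, which shrinks the O(L^2) score matrix to O(L * block). This is an illustrative simplification, not CCDSReFormer's actual modules:

```python
import torch

def block_local_attention(q, k, v, block=16):
    """Self-attention restricted to non-overlapping blocks of `block` tokens.
    Each token attends only within its block, so cost drops from O(L^2)
    to O(L * block). q, k, v: (B, L, D) with L divisible by `block`."""
    B, L, D = q.shape
    q = q.view(B, L // block, block, D)
    k = k.view(B, L // block, block, D)
    v = v.view(B, L // block, block, D)
    attn = torch.softmax(q @ k.transpose(-2, -1) / D ** 0.5, dim=-1)
    return (attn @ v).view(B, L, D)
```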
arXiv Detail & Related papers (2024-03-26T14:43:57Z) - VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness [56.87603097348203]
VeCAF uses labels and natural language annotations to perform parametric data selection for PVM finetuning.
VeCAF incorporates the finetuning objective to select significant data points that effectively guide the PVM towards faster convergence.
On ImageNet, VeCAF uses up to 3.3x fewer training batches to reach the target performance compared to full finetuning.
arXiv Detail & Related papers (2024-01-15T17:28:37Z) - Exploring Driving Behavior for Autonomous Vehicles Based on Gramian Angular Field Vision Transformer [12.398902878803034]
This paper presents the Gramian Angular Field Vision Transformer (GAF-ViT) model, designed to analyze driving behavior.
The proposed GAF-ViT model consists of three key components: the Transformer Module, the Channel Attention Module, and the Multi-Channel ViT Module.
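The Gramian Angular Field encoding that feeds the ViT has a standard definition: rescale the series to [-1, 1], map values to angles, and form pairwise trigonometric sums. A minimal NumPy sketch (the model's exact preprocessing may differ):

```python
import numpy as np

def gramian_angular_field(series, summation=True):
    """Encode a 1-D time series as a Gramian Angular Field image.
    Values are rescaled to [-1, 1], mapped to angles phi = arccos(x),
    and paired: GASF[i, j] = cos(phi_i + phi_j), GADF[i, j] = sin(phi_i - phi_j)."""
    x = np.asarray(series, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1  # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    if summation:
        return np.cos(phi[:, None] + phi[None, :])  # GASF
    return np.sin(phi[:, None] - phi[None, :])      # GADF

# e.g. a driving-behavior signal of 64 steps becomes a 64x64 image a ViT can consume
img = gramian_angular_field(np.sin(np.linspace(0, 6.28, 64)))
```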
arXiv Detail & Related papers (2023-10-21T04:24:30Z) - Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatially quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z) - Cross-Domain Car Detection Model with Integrated Convolutional Block Attention Mechanism [3.3843451892622576]
A cross-domain car detection model with an integrated convolutional block attention mechanism is proposed.
Experimental results show that the performance of the model improves by 40% over the model without our framework.
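The summary does not show the module itself, but the standard convolutional block attention design (CBAM) applies a channel gate followed by a spatial gate. A minimal sketch assuming that standard formulation rather than this paper's exact variant:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention, each applied multiplicatively to the feature map."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, kernel_size,
                                 padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # channel gate: shared MLP over global avg- and max-pooled descriptors
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # spatial gate: 7x7 conv over channel-wise mean and max maps
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa
```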
arXiv Detail & Related papers (2023-05-31T17:28:13Z) - StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the model with the capacity to predict the future, significantly improving results for streaming perception.
We consider driving scenes with multiple velocities and propose a velocity-aware streaming AP (VsAP) to jointly evaluate accuracy.
Our simple method achieves the state-of-the-art performance on Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2% respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z) - Enhancing Object Detection for Autonomous Driving by Optimizing Anchor Generation and Addressing Class Imbalance [0.0]
This study presents an enhanced 2D object detector based on Faster R-CNN that is better suited for the context of autonomous vehicles.
The proposed modifications over the Faster R-CNN do not increase computational cost and can easily be extended to optimize other anchor-based detection frameworks.
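The summary leaves the anchor-generation procedure unspecified; a common way to optimize anchors for a driving dataset is k-means over ground-truth box shapes under a 1 - IoU distance, sketched below as an assumed illustration, not necessarily this paper's method:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, assuming boxes share a common top-left corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0:1] * boxes[:, 1:2] +
             (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster ground-truth (w, h) pairs with a 1 - IoU distance to pick anchors."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)  # nearest = highest IoU
        new = np.array([np.median(boxes[assign == i], axis=0)
                        if np.any(assign == i) else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted by area
```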
arXiv Detail & Related papers (2021-04-08T16:58:31Z) - Pluggable Weakly-Supervised Cross-View Learning for Accurate Vehicle Re-Identification [53.6218051770131]
Cross-view consistent feature representation is key for accurate vehicle ReID.
Existing approaches resort to supervised cross-view learning using extensive extra viewpoint annotations.
We present a pluggable Weakly-supervised Cross-View Learning (WCVL) module for vehicle ReID.
arXiv Detail & Related papers (2021-03-09T11:51:09Z)