Advancing Autonomous Vehicle Intelligence: Deep Learning and Multimodal LLM for Traffic Sign Recognition and Robust Lane Detection
- URL: http://arxiv.org/abs/2503.06313v1
- Date: Sat, 08 Mar 2025 19:12:36 GMT
- Title: Advancing Autonomous Vehicle Intelligence: Deep Learning and Multimodal LLM for Traffic Sign Recognition and Robust Lane Detection
- Authors: Chandan Kumar Sah, Ankit Kumar Shaw, Xiaoli Lian, Arsalan Shahid Baig, Tuopu Wen, Kun Jiang, Mengmeng Yang, Diange Yang
- Abstract summary: This paper introduces an integrated approach combining advanced deep learning techniques and Multimodal Large Language Models (MLLMs) for comprehensive road perception. For traffic sign recognition, we evaluate ResNet-50, YOLOv8, and RT-DETR, achieving state-of-the-art accuracy of 99.8% with ResNet-50, 98.0% with YOLOv8, and 96.6% with RT-DETR. For lane detection, we propose a CNN-based segmentation method enhanced by curve fitting, which delivers high accuracy under favorable conditions.
- Score: 11.743721109110792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous vehicles (AVs) require reliable traffic sign recognition and robust lane detection capabilities to ensure safe navigation in complex and dynamic environments. This paper introduces an integrated approach combining advanced deep learning techniques and Multimodal Large Language Models (MLLMs) for comprehensive road perception. For traffic sign recognition, we systematically evaluate ResNet-50, YOLOv8, and RT-DETR, achieving state-of-the-art accuracy of 99.8% with ResNet-50, 98.0% with YOLOv8, and 96.6% with RT-DETR despite its higher computational complexity. For lane detection, we propose a CNN-based segmentation method enhanced by polynomial curve fitting, which delivers high accuracy under favorable conditions. Furthermore, we introduce a lightweight multimodal LLM-based framework that undergoes instruction tuning directly on small yet diverse datasets, eliminating the need for initial pretraining. This framework effectively handles various lane types, complex intersections, and merging zones, significantly enhancing lane detection reliability by reasoning under adverse conditions. Despite constraints in available training resources, our multimodal approach demonstrates advanced reasoning capabilities, achieving a Frame Overall Accuracy (FRM) of 53.87%, a Question Overall Accuracy (QNS) of 82.83%, lane detection accuracies of 99.6% in clear conditions and 93.0% at night, and robust performance in reasoning about lane invisibility due to rain (88.4%) or road degradation (95.6%). The proposed comprehensive framework markedly enhances AV perception reliability, contributing significantly to safer autonomous driving across diverse and challenging road scenarios.
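The polynomial curve fitting step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mask layout (integer lane labels on a background of zeros), the choice of fitting x as a function of y, and the polynomial degree are all assumptions.

```python
import numpy as np

def fit_lane_curves(mask, degree=2):
    """Fit one polynomial x = f(y) per labeled lane in a segmentation mask.

    mask: 2D integer array; 0 = background, k > 0 = pixels of lane k.
    Returns a dict mapping lane id to polynomial coefficients
    (highest degree first, as returned by np.polyfit).
    """
    curves = {}
    for lane_id in np.unique(mask):
        if lane_id == 0:
            continue  # skip background
        ys, xs = np.nonzero(mask == lane_id)
        if len(ys) <= degree:
            continue  # too few points for a stable fit
        # Fit x as a function of y: lanes are roughly vertical in image space,
        # so f(y) stays single-valued where x = f(y) would not.
        curves[lane_id] = np.polyfit(ys, xs, degree)
    return curves

# Toy example: two straight "lanes" drawn into a 100x100 mask.
mask = np.zeros((100, 100), dtype=int)
for y in range(100):
    mask[y, 20 + y // 5] = 1   # lane 1, drifting right
    mask[y, 80 - y // 5] = 2   # lane 2, drifting left
curves = fit_lane_curves(mask)
```

Fitting in the y-to-x direction (rather than x-to-y) is the usual choice for ego-lane markings, since a near-vertical lane would otherwise not be a function.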
Related papers
- AurigaNet: A Real-Time Multi-Task Network for Enhanced Urban Driving Perception [0.0]
Self-driving cars hold significant potential to reduce traffic accidents, alleviate congestion, and enhance urban mobility. However, developing reliable AI systems for autonomous vehicles remains a substantial challenge. We present AurigaNet, an advanced multi-task network architecture designed to push the boundaries of autonomous driving perception.
arXiv Detail & Related papers (2026-02-11T09:04:29Z) - SAVANT: Semantic Analysis with Vision-Augmented Anomaly deTection [6.806105013817923]
SAVANT is a structured reasoning framework that achieves high accuracy and recall in detecting anomalous driving scenarios. By automatically labeling over 9,640 real-world images with high accuracy, SAVANT addresses the critical data scarcity problem in anomaly detection.
arXiv Detail & Related papers (2025-10-20T19:14:29Z) - Contrastive Learning-Driven Traffic Sign Perception: Multi-Modal Fusion of Text and Vision [2.0720154517628417]
We propose a novel framework combining open-vocabulary detection and cross-modal learning. For traffic sign detection, our NanoVerse YOLO model integrates a vision-language path aggregation network (RepVL-PAN) and an SPD-Conv module. For traffic sign classification, we designed a Traffic Sign Recognition Multimodal Contrastive Learning model (TSR-MCL). On the TT100K dataset, our method achieves a state-of-the-art 78.4% mAP in the long-tail detection task for all-class recognition.
arXiv Detail & Related papers (2025-07-31T08:23:30Z) - Lane-Wise Highway Anomaly Detection [8.086502588472783]
This paper proposes a scalable and interpretable framework for lane-wise highway traffic anomaly detection. Unlike traditional sensor-dependent methods, our approach uses AI-powered vision models to extract lane-specific features. Our framework outperforms state-of-the-art methods in precision, recall, and F1-score.
arXiv Detail & Related papers (2025-05-05T12:32:23Z) - Training A Neural Network For Partially Occluded Road Sign Identification In The Context Of Autonomous Vehicles [0.0]
We investigated how partial occlusion of traffic signs affects their recognition.
We compared the performance of our custom convolutional neural network (CNN), which achieved 96% accuracy, with models trained using transfer learning.
Additional experiments revealed that models trained solely on fully visible signs lose effectiveness when recognizing occluded signs.
arXiv Detail & Related papers (2025-03-23T19:25:56Z) - SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
Multimodal Large Language Models (MLLMs) can process both visual and textual data. We propose SafeAuto, a novel framework that enhances MLLM-based autonomous driving systems by incorporating both unstructured and structured knowledge.
arXiv Detail & Related papers (2025-02-28T21:53:47Z) - TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning [61.33599727106222]
TeLL-Drive is a hybrid framework that integrates a Teacher LLM to guide an attention-based Student DRL policy. A self-attention mechanism then fuses these strategies with the DRL agent's exploration, accelerating policy convergence and boosting robustness.
arXiv Detail & Related papers (2025-02-03T14:22:03Z) - Research on vehicle detection based on improved YOLOv8 network [0.0]
This paper proposes an improved YOLOv8 vehicle detection method. The improved model achieves detection accuracies of 98.3% for cars, 89.1% for persons, and 88.4% for motorcycles.
arXiv Detail & Related papers (2024-12-31T06:19:26Z) - Reinforcement Learning with Latent State Inference for Autonomous On-ramp Merging under Observation Delay [6.0111084468944]
We introduce the Lane-keeping, Lane-changing with Latent-state Inference and Safety Controller (L3IS) agent.
L3IS is designed to perform the on-ramp merging task safely without comprehensive knowledge about surrounding vehicles' intents or driving styles.
We present an augmentation of this agent called AL3IS that accounts for observation delays, allowing the agent to make more robust decisions in real-world environments.
arXiv Detail & Related papers (2024-03-18T15:02:46Z) - Unsupervised Domain Adaptation for Self-Driving from Past Traversal
Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatial quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z) - iPLAN: Intent-Aware Planning in Heterogeneous Traffic via Distributed
Multi-Agent Reinforcement Learning [57.24340061741223]
We introduce a distributed multi-agent reinforcement learning (MARL) algorithm that can predict trajectories and intents in dense and heterogeneous traffic scenarios.
Our approach for intent-aware planning, iPLAN, allows agents to infer nearby drivers' intents solely from their local observations.
arXiv Detail & Related papers (2023-06-09T20:12:02Z) - CLRerNet: Improving Confidence of Lane Detection with LaneIoU [3.2489082010225485]
We show that correct lane positions are already among the predictions of an existing row-based detector.
We propose LaneIoU that better correlates with the metric, by taking the local lane angles into consideration.
We develop a novel detector coined CLRerNet featuring LaneIoU for the target assignment cost and loss functions.
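An angle-aware lane IoU in the spirit of the LaneIoU described above can be sketched as follows. This is a hypothetical, simplified reimplementation, not the paper's code: the row-wise lane sampling, the base half-width, and the 1/cos(theta) widening rule are assumptions made for illustration.

```python
import numpy as np

def lane_iou(ys, xs_a, xs_b, half_width=7.5):
    """Simplified angle-aware IoU between two lanes sampled on shared rows.

    ys:           image-row positions where both lanes are sampled.
    xs_a, xs_b:   x-coordinate of each lane at those rows.
    Each lane point is widened laterally by half_width / cos(theta),
    where theta is the local lane angle relative to vertical, so tilted
    lanes get wider horizontal footprints.
    """
    def widths(xs):
        dx = np.gradient(xs, ys)          # dx/dy: horizontal drift per row
        theta = np.arctan(np.abs(dx))     # local angle vs. the vertical axis
        return half_width / np.cos(theta)

    wa, wb = widths(xs_a), widths(xs_b)
    lo = np.maximum(xs_a - wa, xs_b - wb)   # per-row overlap bounds
    hi = np.minimum(xs_a + wa, xs_b + wb)
    inter = np.clip(hi - lo, 0.0, None).sum()
    union = (2 * wa).sum() + (2 * wb).sum() - inter
    return inter / union

# Identical lanes give IoU 1; lanes far apart give IoU 0.
ys = np.arange(0, 100, 10)
lane = 50 + 0.3 * ys
```

The widening by 1/cos(theta) is what distinguishes this from a plain per-row horizontal IoU: a strongly tilted lane occupies more horizontal pixels per row, and the metric accounts for that.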
arXiv Detail & Related papers (2023-05-15T05:59:35Z) - Efficient and Robust LiDAR-Based End-to-End Navigation [132.52661670308606]
We present an efficient and robust LiDAR-based end-to-end navigation framework.
We propose Fast-LiDARNet that is based on sparse convolution kernel optimization and hardware-aware model design.
We then propose Hybrid Evidential Fusion that directly estimates the uncertainty of the prediction from only a single forward pass.
arXiv Detail & Related papers (2021-05-20T17:52:37Z) - End-to-End Intersection Handling using Multi-Agent Deep Reinforcement Learning [63.56464608571663]
Navigating through intersections is one of the main challenging tasks for an autonomous vehicle.
In this work, we focus on the implementation of a system able to navigate through intersections where only traffic signs are provided.
We propose a multi-agent system using a continuous, model-free Deep Reinforcement Learning algorithm to train a neural network that predicts both the acceleration and the steering angle at each time step.
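The continuous-action policy described above can be sketched as a small actor network mapping a state vector to bounded acceleration and steering outputs. This is a minimal illustration of a generic continuous-control actor; the state dimension, layer sizes, and [-1, 1] action bounds are assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class ContinuousPolicy:
    """Tiny MLP actor: state vector -> (acceleration, steering angle).

    Both outputs pass through tanh, so actions are bounded in [-1, 1]
    and would be rescaled to physical units by the environment.
    """
    def __init__(self, state_dim=10, hidden=32):
        self.w1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 2))
        self.b2 = np.zeros(2)

    def act(self, state):
        h = np.tanh(state @ self.w1 + self.b1)    # hidden layer
        accel, steer = np.tanh(h @ self.w2 + self.b2)
        return accel, steer

policy = ContinuousPolicy()
accel, steer = policy.act(np.ones(10))
```

In a model-free setting these weights would be updated from reward signals (e.g. by a policy-gradient method) rather than by supervised targets.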
arXiv Detail & Related papers (2021-04-28T07:54:40Z) - Detecting 32 Pedestrian Attributes for Autonomous Vehicles [103.87351701138554]
In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes.
We introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way.
We show competitive detection and attribute recognition results, as well as a more stable MTL training.
arXiv Detail & Related papers (2020-12-04T15:10:12Z) - Multi-lane Detection Using Instance Segmentation and Attentive Voting [0.0]
We propose a novel solution to multi-lane detection, which outperforms state-of-the-art methods in terms of both accuracy and speed.
We are able to obtain a lane segmentation accuracy of 99.87% running at 54.53 fps (average).
arXiv Detail & Related papers (2020-01-01T16:48:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.