LeAD: The LLM Enhanced Planning System Converged with End-to-end Autonomous Driving
- URL: http://arxiv.org/abs/2507.05754v1
- Date: Tue, 08 Jul 2025 07:58:29 GMT
- Title: LeAD: The LLM Enhanced Planning System Converged with End-to-end Autonomous Driving
- Authors: Yuhang Zhang, Jiaqi Liu, Chengkai Xu, Peng Hang, Jian Sun
- Abstract summary: We present LeAD, a dual-rate autonomous driving architecture integrating imitation learning-based end-to-end (E2E) frameworks with large language model (LLM) augmentation. Our experimental evaluation in the CARLA Simulator demonstrates LeAD's superior handling of unconventional scenarios, achieving 71 points on the Leaderboard V1 benchmark with a route completion rate of 93%.
- Score: 48.607991747956255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A principal barrier to large-scale deployment of urban autonomous driving systems lies in the prevalence of complex scenarios and edge cases. Existing systems fail to effectively interpret semantic information within traffic contexts and discern the intentions of other participants, consequently generating decisions misaligned with skilled drivers' reasoning patterns. We present LeAD, a dual-rate autonomous driving architecture integrating imitation learning-based end-to-end (E2E) frameworks with large language model (LLM) augmentation. The high-frequency E2E subsystem maintains real-time perception-planning-control cycles, while the low-frequency LLM module enhances scenario comprehension through multi-modal perception fusion with HD maps and derives optimal decisions via chain-of-thought (CoT) reasoning when baseline planners encounter capability limitations. Our experimental evaluation in the CARLA Simulator demonstrates LeAD's superior handling of unconventional scenarios, achieving 71 points on the Leaderboard V1 benchmark with a route completion rate of 93%.
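The dual-rate design reduces to a simple control pattern: a fast perception-planning-control loop that escalates to a slow LLM reasoner only when the baseline planner reaches a capability limit. Below is a minimal Python sketch of that pattern; the class names, the confidence-based trigger, and the 20 Hz rate are illustrative assumptions, not the paper's actual interfaces.

```python
# Illustrative sketch of a dual-rate loop in the spirit of LeAD: a fast
# E2E planner runs every tick, while a slow LLM module is consulted only
# when the baseline planner reports low confidence. All names and the
# 20 Hz rate are assumptions for illustration.
import time
from dataclasses import dataclass

@dataclass
class Plan:
    waypoints: list        # planned (x, y) points
    confidence: float      # planner's self-assessed confidence in [0, 1]

class E2EPlanner:
    """Stand-in for the high-frequency imitation-learned planner."""
    def plan(self, observation) -> Plan:
        return Plan(waypoints=[(0.0, 0.0), (1.0, 0.5)], confidence=0.9)

class LLMAdvisor:
    """Stand-in for the low-frequency LLM module (CoT reasoning over
    fused perception and HD-map context)."""
    def decide(self, observation, failed_plan: Plan) -> Plan:
        # In the real system this would build a textual scene description,
        # query the LLM with chain-of-thought prompting, and parse the
        # resulting maneuver back into waypoints.
        return Plan(waypoints=[(0.0, 0.0), (0.5, 1.0)], confidence=1.0)

def drive_loop(e2e: E2EPlanner, llm: LLMAdvisor, get_obs, apply_control,
               conf_threshold: float = 0.5, hz: float = 20.0):
    while True:
        obs = get_obs()
        plan = e2e.plan(obs)
        if plan.confidence < conf_threshold:
            # Capability limit hit: defer to the slower LLM reasoner.
            plan = llm.decide(obs, plan)
        apply_control(plan)
        time.sleep(1.0 / hz)
```

A production system would invoke the LLM asynchronously so the real-time loop never blocks on it; the synchronous call here is a simplification.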
Related papers
- VLAD: A VLM-Augmented Autonomous Driving Framework with Hierarchical Planning and Interpretable Decision Process [40.3578745624081]
We propose a vision-language autonomous driving model that integrates a fine-tuned Vision-Language Model (VLM) with a state-of-the-art end-to-end system. We implement a specialized fine-tuning approach using custom question-answer datasets designed specifically to improve the spatial reasoning capabilities of the model. Our system produces interpretable natural language explanations of driving decisions, thereby increasing the transparency and trustworthiness of the traditionally black-box end-to-end architecture.
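The abstract does not specify the question-answer schema, but a hypothetical record from such a spatial-reasoning fine-tuning dataset might look like the following; all field names and content are invented for illustration.

```python
# Hypothetical example of a spatial-reasoning QA record for VLM
# fine-tuning, in the spirit of VLAD's custom question-answer datasets.
# Field names and content are assumptions, not the paper's schema.
qa_example = {
    "image": "front_camera_frame_000123.png",
    "question": "Which vehicle is closest to the ego lane, and is it "
                "moving toward or away from the ego vehicle?",
    "answer": "The white sedan roughly 12 m ahead in the ego lane is "
              "closest; it is moving away at a similar speed.",
    # Optional rationale field supporting interpretable explanations.
    "explanation": "It occupies the same lane as the ego vehicle and its "
                   "bounding box shrinks over consecutive frames.",
}
```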
arXiv Detail & Related papers (2025-07-02T01:52:40Z)
- ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving [35.493857028919685]
We propose ReCogDrive, an autonomous driving system that integrates Vision-Language Models with a diffusion planner. In the first stage, we use large-scale driving question-answering datasets to train the VLMs, mitigating the domain discrepancy between generic content and real-world driving scenarios. In the second stage, we employ a diffusion-based planner to perform imitation learning, mapping representations from the latent language space to continuous driving actions.
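A minimal sketch of the second-stage idea, assuming a DDPM-style noise-prediction objective: a small network denoises expert trajectories conditioned on VLM latent features. Shapes, the toy linear noise schedule, and all module names are assumptions, not ReCogDrive's actual implementation.

```python
# Sketch: a diffusion model denoises trajectories conditioned on VLM
# latents, trained by imitation (noise prediction) on expert trajectories.
import torch
import torch.nn as nn

class TrajectoryDenoiser(nn.Module):
    def __init__(self, traj_dim=2 * 8, latent_dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(traj_dim + latent_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, traj_dim),
        )

    def forward(self, noisy_traj, vlm_latent, t):
        # Predict the noise added to the trajectory at diffusion step t.
        x = torch.cat([noisy_traj, vlm_latent, t], dim=-1)
        return self.net(x)

def diffusion_imitation_loss(model, expert_traj, vlm_latent, n_steps=1000):
    b = expert_traj.shape[0]
    t = torch.randint(1, n_steps, (b, 1)).float() / n_steps
    alpha = 1.0 - t                      # toy linear noise schedule
    noise = torch.randn_like(expert_traj)
    noisy = alpha.sqrt() * expert_traj + (1 - alpha).sqrt() * noise
    return nn.functional.mse_loss(model(noisy, vlm_latent, t), noise)
```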
arXiv Detail & Related papers (2025-06-09T03:14:04Z)
- SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving [51.47621083057114]
SOLVE is an innovative framework that synergizes Vision-Language Models (VLMs) with end-to-end (E2E) models to enhance autonomous vehicle planning. Our approach emphasizes knowledge sharing at the feature level through a shared visual encoder, enabling comprehensive interaction between the VLM and E2E components.
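Feature-level sharing through a common visual encoder can be sketched as one backbone feeding two heads. The module sizes and names below are assumptions; SOLVE's actual encoder and interaction scheme are more elaborate.

```python
# Sketch of feature-level sharing: one visual encoder feeds both a VLM
# branch and an E2E planning branch, so the two components interact
# through common features.
import torch.nn as nn

class SharedEncoderDriving(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(          # shared visual backbone
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.to_vlm_tokens = nn.Linear(feat_dim, feat_dim)   # VLM branch
        self.planner_head = nn.Linear(feat_dim, 2 * 8)       # E2E branch

    def forward(self, image):
        feats = self.encoder(image)            # shared representation
        vlm_tokens = self.to_vlm_tokens(feats) # consumed by the VLM
        waypoints = self.planner_head(feats)   # consumed by the planner
        return vlm_tokens, waypoints
```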
arXiv Detail & Related papers (2025-05-22T15:44:30Z)
- X-Driver: Explainable Autonomous Driving with Vision-Language Models [6.053632514335829]
End-to-end autonomous driving has advanced significantly, offering benefits such as system simplicity and stronger driving performance. However, existing frameworks still suffer from low success rates in closed-loop evaluations, highlighting their limitations in real-world deployment. We introduce X-Driver, a unified multi-modal large language model framework designed for closed-loop autonomous driving.
arXiv Detail & Related papers (2025-05-08T09:52:55Z)
- Distilling Multi-modal Large Language Models for Autonomous Driving [64.63127269187814]
Recent end-to-end autonomous driving systems leverage large language models (LLMs) as planners to improve generalizability to rare events. We propose DiMA, an end-to-end autonomous driving system that maintains the efficiency of an LLM-free (or vision-based) planner while leveraging the world knowledge of an LLM. Training with DiMA results in a 37% reduction in the L2 trajectory error and an 80% reduction in the collision rate of the vision-based planner, as well as a 44% trajectory error reduction in long-tail scenarios.
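The distillation idea, reduced to a sketch: during training, align the vision-based student planner with an LLM teacher at the feature level on top of the usual imitation objective, then discard the teacher at inference. The loss composition and weighting below are assumptions, not DiMA's exact objective.

```python
# Sketch of distilling an LLM-based teacher into a vision-based student
# planner; the teacher is used only during training.
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat,
                      student_traj, expert_traj, alpha=0.5):
    # Feature-level distillation from the LLM teacher ...
    distill = F.mse_loss(student_feat, teacher_feat)
    # ... plus the usual imitation (L2 trajectory) objective.
    imitation = F.mse_loss(student_traj, expert_traj)
    return imitation + alpha * distill
```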
arXiv Detail & Related papers (2025-01-16T18:59:53Z)
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction, and an iterative motion planner. Experiments conducted on the nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
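A toy sketch of the iterative-refinement idea only: repeatedly perturb a coarse ego trajectory and keep candidates that lower a cost estimate. This is a deliberately simplified stand-in; DiFSD's actual uncertainty denoising and hierarchical interaction are far more structured.

```python
# Toy iterative trajectory refinement: accept a perturbed candidate only
# when it reduces an externally supplied cost estimate.
import random

def iterative_refine(init_traj, cost_fn, n_iters=5, noise=0.1):
    traj, best_cost = init_traj, cost_fn(init_traj)
    for _ in range(n_iters):
        candidate = [(x + random.gauss(0, noise),
                      y + random.gauss(0, noise)) for x, y in traj]
        c = cost_fn(candidate)
        if c < best_cost:                 # refinement step accepted
            traj, best_cost = candidate, c
    return traj
```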
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
- Making Large Language Models Better Planners with Reasoning-Decision Alignment [70.5381163219608]
We motivate an end-to-end decision-making model based on a multimodality-augmented LLM.
We propose a reasoning-decision alignment constraint between the paired CoTs and planning results.
We dub our proposed large language planner with reasoning-decision alignment RDA-Driver.
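One plausible form of such an alignment constraint: make the CoT-derived preference over candidate plans agree with which candidate actually has the lowest trajectory error. The cross-entropy formulation below is an assumption for illustration, not RDA-Driver's exact loss.

```python
# Sketch of a reasoning-decision alignment constraint: push the CoT
# scores to prefer the candidate plan with the lowest true error.
import torch.nn.functional as F

def alignment_loss(cot_scores, plan_errors):
    """cot_scores: (B, K) CoT-derived preference over K candidate plans.
    plan_errors: (B, K) ground-truth trajectory error of each candidate."""
    best = plan_errors.argmin(dim=1)           # index of the best plan
    return F.cross_entropy(cot_scores, best)
```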
arXiv Detail & Related papers (2024-08-25T16:43:47Z)
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving [38.28159034562901]
Reason2Drive is a benchmark dataset with over 600K video-text pairs.
We characterize the autonomous driving process as a sequential combination of perception, prediction, and reasoning steps.
We introduce a novel aggregated evaluation metric to assess chain-based reasoning performance in autonomous systems.
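To illustrate the aggregation idea only (the paper's actual metric is more sophisticated), a hypothetical chain metric could score each reasoning step against a reference and average the per-step scores:

```python
# Hypothetical aggregated chain-reasoning metric: score each step of a
# predicted reasoning chain against the reference (here with a trivial
# token-overlap proxy) and average over steps.
def step_score(pred: str, ref: str) -> float:
    p, r = set(pred.lower().split()), set(ref.lower().split())
    return len(p & r) / max(len(p | r), 1)     # Jaccard overlap proxy

def aggregated_reasoning_score(pred_chain, ref_chain) -> float:
    n = min(len(pred_chain), len(ref_chain))
    if n == 0:
        return 0.0
    return sum(step_score(p, r) for p, r in
               zip(pred_chain[:n], ref_chain[:n])) / n
```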
arXiv Detail & Related papers (2023-12-06T18:32:33Z)
- Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs serve as intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
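The shield pattern is straightforward to sketch: the LLM proposes a high-level maneuver, a verifier vetoes unsafe proposals and substitutes a conservative fallback, and only verified maneuvers reach the low-level controller (MPC in the paper). All names and the toy check below are assumptions.

```python
# Sketch of an LLM behavior planner wrapped by a safety verifier shield.
SAFE_FALLBACK = "keep_lane_and_slow_down"

def verify(maneuver: str, state: dict) -> bool:
    """Toy contextual safety check, e.g. forbid lane changes when a
    vehicle occupies the target lane."""
    if maneuver == "change_lane_left" and state.get("left_lane_occupied"):
        return False
    return True

def shielded_decision(llm_propose, state: dict) -> str:
    maneuver = llm_propose(state)          # LLM as behavioral planner
    if not verify(maneuver, state):        # safety verifier shield
        maneuver = SAFE_FALLBACK           # conservative safe fallback
    return maneuver
```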
arXiv Detail & Related papers (2023-11-28T03:13:09Z)