X-Driver: Explainable Autonomous Driving with Vision-Language Models
- URL: http://arxiv.org/abs/2505.05098v2
- Date: Tue, 03 Jun 2025 11:30:21 GMT
- Title: X-Driver: Explainable Autonomous Driving with Vision-Language Models
- Authors: Wei Liu, Jiyuan Zhang, Binxiong Zheng, Yufeng Hu, Yingzhan Lin, Zengfeng Zeng,
- Abstract summary: End-to-end autonomous driving has advanced significantly, offering benefits such as system simplicity and stronger driving performance.<n>Existing frameworks still suffer from low success rates in closed-loop evaluations, highlighting their limitations in real-world deployment.<n>We introduce X-Driver, a unified multi-modal large language models framework designed for closed-loop autonomous driving.
- Score: 6.053632514335829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end autonomous driving has advanced significantly, offering benefits such as system simplicity and stronger driving performance in both open-loop and closed-loop settings than conventional pipelines. However, existing frameworks still suffer from low success rates in closed-loop evaluations, highlighting their limitations in real-world deployment. In this paper, we introduce X-Driver, a unified multi-modal large language models(MLLMs) framework designed for closed-loop autonomous driving, leveraging Chain-of-Thought(CoT) and autoregressive modeling to enhance perception and decision-making. We validate X-Driver across multiple autonomous driving tasks using public benchmarks in CARLA simulation environment, including Bench2Drive[6]. Our experimental results demonstrate superior closed-loop performance, surpassing the current state-of-the-art(SOTA) while improving the interpretability of driving decisions. These findings underscore the importance of structured reasoning in end-to-end driving and establish X-Driver as a strong baseline for future research in closed-loop autonomous driving.
Related papers
- LeAD: The LLM Enhanced Planning System Converged with End-to-end Autonomous Driving [48.607991747956255]
We present LeAD, a dual-rate autonomous driving architecture integrating imitation learning-based end-to-end (E2E) frameworks with large language model (LLM) augmentation.<n>Our experimental evaluation in the CARLA Simulator demonstrates LeAD's superior handling of unconventional scenarios, achieving 71 points on Leaderboard V1 benchmark, with a route completion of 93%.
arXiv Detail & Related papers (2025-07-08T07:58:29Z) - DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving [22.293019898794963]
We present AutoDriveRL, a unified training framework that formulates autonomous driving as a structured reasoning process over four core tasks.<n>Within this framework, we train DriveRX, a cross-task reasoning VLM designed for real-time decision-making.<n>Our analysis highlights the impact of vision encoder design and reward-guided reasoning compression.
arXiv Detail & Related papers (2025-05-27T03:21:04Z) - Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training [64.16445087751039]
Hydra-NeXt is a novel multi-branch planning framework that unifies trajectory prediction, control prediction, and a trajectory refinement network in one model.<n> Hydra-NeXt surpasses the previous state-of-the-art by 22.98 DS and 17.49 SR, marking a significant advancement in autonomous driving.
arXiv Detail & Related papers (2025-03-15T07:42:27Z) - TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning [61.33599727106222]
TeLL-Drive is a hybrid framework that integrates a Teacher LLM to guide an attention-based Student DRL policy.<n>A self-attention mechanism then fuses these strategies with the DRL agent's exploration, accelerating policy convergence and boosting robustness.
arXiv Detail & Related papers (2025-02-03T14:22:03Z) - DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.<n>Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.<n>Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z) - Exploring the Causality of End-to-End Autonomous Driving [57.631400236930375]
We propose a comprehensive approach to explore and analyze the causality of end-to-end autonomous driving.
Our work is the first to unveil the mystery of end-to-end autonomous driving and turn the black box into a white one.
arXiv Detail & Related papers (2024-07-09T04:56:11Z) - DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving [81.04174379726251]
This paper collects a comprehensive end-to-end driving dataset named DriveCoT.
It contains sensor data, control decisions, and chain-of-thought labels to indicate the reasoning process.
We propose a baseline model called DriveCoT-Agent, trained on our dataset, to generate chain-of-thought predictions and final decisions.
arXiv Detail & Related papers (2024-03-25T17:59:01Z) - Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving [38.28159034562901]
Reason2Drive is a benchmark dataset with over 600K video-text pairs.
We characterize the autonomous driving process as a sequential combination of perception, prediction, and reasoning steps.
We introduce a novel aggregated evaluation metric to assess chain-based reasoning performance in autonomous systems.
arXiv Detail & Related papers (2023-12-06T18:32:33Z) - End-to-end Autonomous Driving: Challenges and Frontiers [45.391430626264764]
We provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.
We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others.
We discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework.
arXiv Detail & Related papers (2023-06-29T14:17:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.