ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving
- URL: http://arxiv.org/abs/2505.20024v1
- Date: Mon, 26 May 2025 14:12:38 GMT
- Title: ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving
- Authors: Xueyi Liu, Zuodong Zhong, Yuxin Guo, Yun-Fu Liu, Zhiguo Su, Qichao Zhang, Junli Wang, Yinfeng Gao, Yupeng Zheng, Qiao Lin, Huiyong Chen, Dongbin Zhao
- Abstract summary: Multimodal large language models (MLLMs) have garnered significant attention in the field of end-to-end (E2E) autonomous driving. We propose ReasonPlan, a novel MLLM fine-tuning framework designed for closed-loop driving through holistic reasoning. Our method outperforms the mainstream E2E imitation learning method by a large margin of 19% L2 and 16.1 driving score on the Bench2Drive benchmark.
- Score: 12.035324146676555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Owing to their powerful vision-language reasoning and generalization abilities, multimodal large language models (MLLMs) have garnered significant attention in the field of end-to-end (E2E) autonomous driving. However, their application to closed-loop systems remains underexplored, and current MLLM-based methods have not shown clear superiority over mainstream E2E imitation learning approaches. In this work, we propose ReasonPlan, a novel MLLM fine-tuning framework designed for closed-loop driving through holistic reasoning, combining a self-supervised Next Scene Prediction task with a supervised Decision Chain-of-Thought process. This dual mechanism encourages the model to align visual representations with actionable driving context while promoting interpretable and causally grounded decision making. We curate a planning-oriented decision reasoning dataset, namely PDR, comprising 210k diverse and high-quality samples. Our method outperforms the mainstream E2E imitation learning method by a large margin of 19% L2 and 16.1 driving score on the Bench2Drive benchmark. Furthermore, ReasonPlan demonstrates strong zero-shot generalization on the unseen DOS benchmark, highlighting its adaptability in handling unseen corner cases. Code and dataset will be released at https://github.com/Liuxueyi/ReasonPlan.
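A rough sketch of how such a dual objective might be wired during fine-tuning is shown below. The model interface, batch keys, and the 0.5 weighting are assumptions made for illustration, not the authors' released implementation:

```python
import torch
import torch.nn.functional as F

def dual_objective_loss(model, batch, nsp_weight=0.5):
    """Hypothetical sketch of ReasonPlan's dual training signal: a
    self-supervised Next Scene Prediction (NSP) term plus a supervised
    Decision Chain-of-Thought (CoT) term. The model methods and batch
    keys used here are illustrative assumptions."""
    # Self-supervised branch: predict the next frame's visual features
    # from the current frame, supervised by a frozen target encoding.
    pred_next = model.predict_next_scene(batch["frames_t"])
    with torch.no_grad():
        target_next = model.encode_scene(batch["frames_t_plus_1"])
    nsp_loss = F.mse_loss(pred_next, target_next)

    # Supervised branch: standard next-token prediction on the
    # annotated reasoning-and-decision text (the Decision CoT).
    logits = model.decode_text(batch["frames_t"], batch["cot_tokens"][:, :-1])
    cot_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        batch["cot_tokens"][:, 1:].reshape(-1),
    )
    return nsp_weight * nsp_loss + cot_loss
```

Under this reading, the NSP branch acts as a self-supervised regularizer that keeps visual features predictive of scene dynamics, while the CoT branch supplies the supervised decision signal.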
Related papers
- LeAD: The LLM Enhanced Planning System Converged with End-to-end Autonomous Driving [48.607991747956255]
We present LeAD, a dual-rate autonomous driving architecture integrating imitation learning-based end-to-end (E2E) frameworks with large language model (LLM) augmentation.
Our experimental evaluation in the CARLA Simulator demonstrates LeAD's superior handling of unconventional scenarios, achieving 71 points on the Leaderboard V1 benchmark with a route completion rate of 93%.
arXiv Detail & Related papers (2025-07-08T07:58:29Z)
- Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought [58.321044666612174]
Vad-R1 is an end-to-end MLLM-based framework for Video Anomaly Reasoning.
We design a Perception-to-Cognition Chain-of-Thought (P2C-CoT) that simulates the human process of recognizing anomalies.
We also propose an improved reinforcement learning algorithm, AVA-GRPO, which explicitly incentivizes the anomaly reasoning capability of MLLMs.
arXiv Detail & Related papers (2025-05-26T12:05:16Z)
- ReasonDrive: Efficient Visual Question Answering for Autonomous Vehicles with Reasoning-Enhanced Small Vision-Language Models [9.316712964093506]
Vision-language models (VLMs) show promise for autonomous driving but often lack transparent reasoning capabilities that are critical for safety.
We investigate whether explicitly modeling reasoning during fine-tuning enhances VLM performance on driving decision tasks.
arXiv Detail & Related papers (2025-04-14T23:16:07Z)
- ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation [44.16465715911478]
We propose ORION, a holistic E2E autonomous driving framework driven by vision-language instructed action generation.
Our method achieves an impressive closed-loop performance of 77.74 Driving Score (DS) and a 54.62% Success Rate (SR) on the challenging Bench2Drive benchmark.
arXiv Detail & Related papers (2025-03-25T15:18:43Z)
- DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding [76.3876070043663]
We propose DriveLMM-o1, a dataset and benchmark designed to advance step-wise visual reasoning for autonomous driving.
Our benchmark features over 18k VQA examples in the training set and more than 4k in the test set, covering diverse questions on perception, prediction, and planning.
Our model achieves a +7.49% gain in final answer accuracy, along with a 3.62% improvement in reasoning score over the previous best open-source model.
arXiv Detail & Related papers (2025-03-13T17:59:01Z)
- DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving [1.8434042562191815]
We present DualAD, a novel autonomous driving framework designed to imitate human reasoning during driving.
DualAD comprises two layers: a rule-based motion planner at the bottom layer that handles routine driving tasks requiring minimal reasoning, and an upper layer featuring a rule-based text encoder (a minimal sketch of this dispatch pattern follows this entry).
arXiv Detail & Related papers (2024-09-26T16:58:04Z)
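A hypothetical sketch of such a dual-layer dispatch, assuming a routine/complex split as the escalation test; the types and function names are illustrative, not DualAD's actual interface:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Scene:
    text: str                                          # rule-based text encoding of the scene
    hazards: List[str] = field(default_factory=list)   # events that may need deliberation

def plan(scene: Scene,
         rule_based_planner: Callable[[Scene], list],
         reasoning_layer: Callable[[str], list]) -> list:
    """DualAD-style dispatch (illustrative): the bottom layer handles
    routine driving with minimal reasoning; the upper layer is invoked
    only when the scene contains events that call for deliberation."""
    if not scene.hazards:
        return rule_based_planner(scene)   # fast path: routine driving
    # Slow path: hand the text-encoded scene to the reasoning layer.
    return reasoning_layer(scene.text)
```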
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.
Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction, and an iterative motion planner.
Experiments conducted on the nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
- Making Large Language Models Better Planners with Reasoning-Decision Alignment [70.5381163219608]
We motivate an end-to-end decision-making model based on a multimodality-augmented LLM.
We propose a reasoning-decision alignment constraint between the paired CoTs and planning results (an illustrative sketch of such a constraint follows this entry).
We dub our proposed large language planners with reasoning-decision alignment as RDA-Driver.
arXiv Detail & Related papers (2024-08-25T16:43:47Z)
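One plausible realization of such an alignment constraint is a contrastive objective over paired CoT and plan embeddings. This InfoNCE-style sketch is an assumption for illustration; RDA-Driver's actual objective may differ:

```python
import torch
import torch.nn.functional as F

def reasoning_decision_alignment(cot_emb: torch.Tensor,
                                 plan_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Illustrative alignment term: a CoT embedding and its paired plan
    embedding (row i of each tensor) should be more similar than any
    mismatched pair within the batch. A standard InfoNCE-style
    formulation, assumed here for illustration only."""
    cot_emb = F.normalize(cot_emb, dim=-1)
    plan_emb = F.normalize(plan_emb, dim=-1)
    logits = cot_emb @ plan_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matched CoT/plan pairs lie on the diagonal of the logits matrix.
    return F.cross_entropy(logits, targets)
```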
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark, MR-Ben, that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving [69.82743399946371]
DriveMLM is a framework that can perform closed-loop autonomous driving in realistic simulators.
We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system.
This model can plug-and-play into existing AD systems, such as Apollo, for closed-loop driving.
arXiv Detail & Related papers (2023-12-14T18:59:05Z)
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving [38.28159034562901]
Reason2Drive is a benchmark dataset with over 600K video-text pairs.
We characterize the autonomous driving process as a sequential combination of perception, prediction, and reasoning steps.
We introduce a novel aggregated evaluation metric to assess chain-based reasoning performance in autonomous systems.
arXiv Detail & Related papers (2023-12-06T18:32:33Z)