AdaThinkDrive: Adaptive Thinking via Reinforcement Learning for Autonomous Driving
- URL: http://arxiv.org/abs/2509.13769v1
- Date: Wed, 17 Sep 2025 07:35:39 GMT
- Title: AdaThinkDrive: Adaptive Thinking via Reinforcement Learning for Autonomous Driving
- Authors: Yuechen Luo, Fang Li, Shaoqing Xu, Zhiyi Lai, Lei Yang, Qimao Chen, Ziang Luo, Zixun Xie, Shengyin Jiang, Jiaxin Liu, Long Chen, Bing Wang, Zhi-xin Yang,
- Abstract summary: Chain of Thought (CoT) has been widely adopted in Vision Language Action (VLA) models.<n>We propose AdaThinkDrive, a novel VLA framework with a dual mode reasoning mechanism inspired by fast and slow thinking.
- Score: 21.10362636088305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While reasoning technology like Chain of Thought (CoT) has been widely adopted in Vision Language Action (VLA) models, it demonstrates promising capabilities in end to end autonomous driving. However, recent efforts to integrate CoT reasoning often fall short in simple scenarios, introducing unnecessary computational overhead without improving decision quality. To address this, we propose AdaThinkDrive, a novel VLA framework with a dual mode reasoning mechanism inspired by fast and slow thinking. First, our framework is pretrained on large scale autonomous driving (AD) scenarios using both question answering (QA) and trajectory datasets to acquire world knowledge and driving commonsense. During supervised fine tuning (SFT), we introduce a two mode dataset, fast answering (w/o CoT) and slow thinking (with CoT), enabling the model to distinguish between scenarios that require reasoning. Furthermore, an Adaptive Think Reward strategy is proposed in conjunction with the Group Relative Policy Optimization (GRPO), which rewards the model for selectively applying CoT by comparing trajectory quality across different reasoning modes. Extensive experiments on the Navsim benchmark show that AdaThinkDrive achieves a PDMS of 90.3, surpassing the best vision only baseline by 1.7 points. Moreover, ablations show that AdaThinkDrive surpasses both the never Think and always Think baselines, improving PDMS by 2.0 and 1.4, respectively. It also reduces inference time by 14% compared to the always Think baseline, demonstrating its ability to balance accuracy and efficiency through adaptive reasoning.
Related papers
- ThinkDrive: Chain-of-Thought Guided Progressive Reinforcement Learning Fine-Tuning for Autonomous Driving [14.981675960513606]
Existing methods suffer from unstructured reasoning, poor generalization, and misalignment with human driving.<n>We propose ThinkDrive, a CoT guided progressive RL fine-tuning framework for autonomous driving that synergizes explicit reasoning with difficulty-aware adaptive policy optimization.<n>The results show that ThinkDrive outperforms strong RL baselines by 1.45%, 1.95%, and 1.01% on exam, easy-exam intensity and accuracy, respectively.
arXiv Detail & Related papers (2026-01-08T08:30:36Z) - OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving [11.793091014248025]
We introduce OmniDrive-R1, an end-to-end VLM framework designed for autonomous driving.<n>Our core innovation is an Reinforcement-driven visual grounding capability.<n>This capability is enabled by our pure two-stage reinforcement learning training pipeline and Clip-GRPO algorithm.
arXiv Detail & Related papers (2025-12-16T03:19:28Z) - MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning [51.20229133553804]
Current Vision-Language-Action (VLA) paradigms in autonomous driving primarily rely on Imitation Learning (IL)<n>Online Reinforcement Learning offers a promising pathway to address these issues through trial-and-error learning.<n>We propose MindDrive, a VLA framework comprising a large language model (LLM) with two distinct sets of LoRA parameters.<n>By feeding trajectory-level rewards back into the reasoning space, MindDrive enables trial-and-error learning over a finite set of discrete linguistic driving decisions.
arXiv Detail & Related papers (2025-12-15T18:31:32Z) - Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning [57.96134674544638]
We propose a novel adaptive reasoning framework that dynamically adjusts the model's reasoning depth according to task difficulty.<n>Our framework comprises two stages: (1) an Adaptive Supervised Fine-Tuning stage, which endows the Omni model with fundamental reasoning capability using large-scale reasoning-augmented data, and (2) an Adaptive Reinforcement Learning stage, which optimize reasoning behaviors based on task complexity and reward feedback.
arXiv Detail & Related papers (2025-12-03T13:33:28Z) - AutoDrive-R$^2$: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving [37.260140808367716]
We propose AutoDrive-R$2$, a novel VLA framework that enhances both reasoning and self-reflection capabilities of autonomous driving systems.<n>We first propose an innovative CoT dataset named nuScenesR$2$-6K for supervised fine-tuning.<n>We then employ the Group Relative Policy Optimization (GRPO) algorithm within a physics-grounded reward framework to ensure reliable smoothness and realistic trajectory planning.
arXiv Detail & Related papers (2025-09-02T04:32:24Z) - ViLaD: A Large Vision Language Diffusion Framework for End-to-End Autonomous Driving [14.486548540613791]
We introduce ViLaD, a novel Large Vision Language Diffusion framework for end-to-end autonomous driving.<n>ViLaD enables parallel generation of entire driving decision sequences, significantly reducing computational latency.<n>We conduct comprehensive experiments on the nuScenes dataset, where ViLaD outperforms state-of-the-art autoregressive VLM baselines in both planning accuracy and inference speed.
arXiv Detail & Related papers (2025-08-18T04:01:56Z) - ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving [64.12414815634847]
Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful recipes addressing different aspects of this challenge.<n>We propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer.
arXiv Detail & Related papers (2025-08-15T12:06:55Z) - DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking [33.98300989562812]
We introduce DriveAgent-R1, the first autonomous driving agent capable of active perception for planning.<n>In complex scenarios, DriveAgent-R1 proactively invokes tools to perform visual reasoning, firmly grounding its decisions in visual evidence.<n>We propose a hybrid thinking framework, inspired by human driver cognitive patterns, allowing the agent to adaptively switch between efficient text-only reasoning and robust tool-augmented visual reasoning.
arXiv Detail & Related papers (2025-07-28T14:33:15Z) - AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning [42.409352964719204]
Vision-Language-Action (VLA) models have shown promise for end-to-end autonomous driving.<n>Current VLA models struggle with physically infeasible action outputs, complex model structures, or unnecessarily long reasoning.<n>We propose AutoVLA, a novel VLA model that unifies reasoning and action generation within a single autoregressive generation model.
arXiv Detail & Related papers (2025-06-16T17:58:50Z) - ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving [49.07731497951963]
ReCogDrive is a novel Reinforced Cognitive framework for end-to-end autonomous driving.<n>We introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers.<n>We then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner.
arXiv Detail & Related papers (2025-06-09T03:14:04Z) - DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving [22.293019898794963]
We present AutoDriveRL, a unified training framework that formulates autonomous driving as a structured reasoning process over four core tasks.<n>Within this framework, we train DriveRX, a cross-task reasoning VLM designed for real-time decision-making.<n>Our analysis highlights the impact of vision encoder design and reward-guided reasoning compression.
arXiv Detail & Related papers (2025-05-27T03:21:04Z) - SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving [51.47621083057114]
SOLVE is an innovative framework that synergizes Vision-Language Models with end-to-end (E2E) models to enhance autonomous vehicle planning.<n>Our approach emphasizes knowledge sharing at the feature level through a shared visual encoder, enabling comprehensive interaction between VLM and E2E components.
arXiv Detail & Related papers (2025-05-22T15:44:30Z) - AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving [28.378854340190973]
Vision-Language Models (VLMs) show promise for autonomous driving, yet their struggle with hallucinations, inefficient reasoning, and limited real-world validation hinders accurate perception and robust step-by-step reasoning.<n>We introduce AgentThink, a pioneering unified framework that integrates Chain-of-Thought (CoT) reasoning with dynamic, agent-style tool invocation for autonomous driving tasks.
arXiv Detail & Related papers (2025-05-21T09:27:43Z) - VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning [55.97950660659051]
We aim to enhance the slow-thinking capabilities of vision-language models using reinforcement learning (without relying on distillation)<n>We introduce Forced Rethinking, which appends a rethinking trigger token to the end of rollouts in RL training, explicitly enforcing a self-reflection reasoning step.<n>Our model, VL-Rethinker, advances state-of-the-art scores on MathVista, MathVerse to achieve 80.4%, 63.5% respectively.
arXiv Detail & Related papers (2025-04-10T17:41:56Z) - DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding [76.3876070043663]
We propose DriveLMM-o1, a dataset and benchmark designed to advance step-wise visual reasoning for autonomous driving.<n>Our benchmark features over 18k VQA examples in the training set and more than 4k in the test set, covering diverse questions on perception, prediction, and planning.<n>Our model achieves a +7.49% gain in final answer accuracy, along with a 3.62% improvement in reasoning score over the previous best open-source model.
arXiv Detail & Related papers (2025-03-13T17:59:01Z) - Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction [71.97877759413272]
Trajectory prediction is a safety-critical tool for autonomous vehicles to plan and execute actions.
Recent methods have achieved strong performances using Multi-Choice Learning objectives like winner-takes-all (WTA) or best-of-many.
Our work addresses two key challenges in trajectory prediction, learning outputs, and better predictions by imposing constraints using driving knowledge.
arXiv Detail & Related papers (2021-04-16T17:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.