LAD-Drive: Bridging Language and Trajectory with Action-Aware Diffusion Transformers
- URL: http://arxiv.org/abs/2603.02035v1
- Date: Mon, 02 Mar 2026 16:21:42 GMT
- Title: LAD-Drive: Bridging Language and Trajectory with Action-Aware Diffusion Transformers
- Authors: Fabian Schmidt, Karol Fedurko, Markus Enzweiler, Abhinav Valada
- Abstract summary: We introduce LAD-Drive, a generative framework that disentangles high-level intention from low-level spatial planning. LAD-Drive employs an action decoder to infer a probabilistic meta-action distribution, establishing an explicit belief state that preserves the nuanced intent typically lost by one-hot encodings. Extensive evaluations on the LangAuto benchmark demonstrate that LAD-Drive achieves state-of-the-art results, outperforming competitive baselines by up to 59% in Driving Score.
- Score: 15.4994260281059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While multimodal large language models (MLLMs) provide advanced reasoning for autonomous driving, translating their discrete semantic knowledge into continuous trajectories remains a fundamental challenge. Existing methods often rely on unimodal planning heads that inherently limit their ability to represent multimodal driving behavior. Furthermore, most generative approaches frequently condition on one-hot encoded actions, discarding the nuanced navigational uncertainty critical for complex scenarios. To resolve these limitations, we introduce LAD-Drive, a generative framework that structurally disentangles high-level intention from low-level spatial planning. LAD-Drive employs an action decoder to infer a probabilistic meta-action distribution, establishing an explicit belief state that preserves the nuanced intent typically lost by one-hot encodings. This distribution, fused with the vehicle's kinematic state, conditions an action-aware diffusion decoder that utilizes a truncated denoising process to refine learned motion anchors into safe, kinematically feasible trajectories. Extensive evaluations on the LangAuto benchmark demonstrate that LAD-Drive achieves state-of-the-art results, outperforming competitive baselines by up to 59% in Driving Score while significantly reducing route deviations and collisions. We will publicly release the code and models on https://github.com/iis-esslingen/lad-drive.
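The conditioning scheme the abstract describes — a full probability distribution over meta-actions fused with the vehicle's kinematic state, rather than a one-hot action — can be sketched minimally. Everything below (function names, the three meta-actions, the state layout) is a hypothetical illustration, not the paper's implementation:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def build_condition(action_logits, kinematic_state):
    """Form the conditioning vector for a diffusion decoder: a full
    probability distribution over meta-actions (the belief state)
    concatenated with the ego vehicle's kinematic state, instead of a
    one-hot action that discards navigational uncertainty."""
    belief = softmax(action_logits)  # e.g. P(keep-lane), P(turn-left), ...
    return np.concatenate([belief, kinematic_state])

# Hypothetical example: the model is torn between keep-lane and turn-left;
# a one-hot encoding would erase that ambiguity, the belief state keeps it.
logits = np.array([2.0, 1.8, -1.0])   # [keep-lane, turn-left, turn-right]
state = np.array([8.3, 0.0, 0.1])     # [speed m/s, yaw rate, steering angle]
cond = build_condition(logits, state)
```

The point of the sketch is only that the decoder sees soft probabilities (here roughly 0.53 / 0.44 / 0.03) rather than a collapsed argmax.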
Related papers
- Unifying Language-Action Understanding and Generation for Autonomous Driving [25.23561391638388]
Vision-Language-Action (VLA) models are emerging as a promising paradigm for end-to-end autonomous driving. Existing methods suffer from two critical limitations: a persistent misalignment between language instructions and action outputs, and the inherent inefficiency of typical auto-regressive action generation. We introduce LinkVLA, a novel architecture that directly addresses these challenges to enhance both alignment and efficiency.
arXiv Detail & Related papers (2026-03-02T04:41:10Z)
- SGDrive: Scene-to-Goal Hierarchical World Cognition for Autonomous Driving [52.02379432801349]
We propose SGDrive, a novel framework that structures the VLM's representation learning around driving-specific knowledge hierarchies. Built upon a pre-trained VLM backbone, SGDrive decomposes driving understanding into a scene-agent-goal hierarchy that mirrors human driving cognition.
arXiv Detail & Related papers (2026-01-09T08:55:42Z)
- MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning [51.20229133553804]
Current Vision-Language-Action (VLA) paradigms in autonomous driving primarily rely on Imitation Learning (IL). Online Reinforcement Learning offers a promising pathway to address these issues through trial-and-error learning. We propose MindDrive, a VLA framework comprising a large language model (LLM) with two distinct sets of LoRA parameters. By feeding trajectory-level rewards back into the reasoning space, MindDrive enables trial-and-error learning over a finite set of discrete linguistic driving decisions.
arXiv Detail & Related papers (2025-12-15T18:31:32Z)
- AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving [71.55254573283793]
Existing approaches either activate Large Language Models too frequently, causing excessive computational overhead, or use fixed schedules. We propose AdaDrive, an adaptively collaborative slow-fast framework that optimally determines when and how LLMs contribute to decision-making. AdaDrive provides a flexible, context-aware framework that maximizes decision accuracy without compromising real-time performance.
arXiv Detail & Related papers (2025-11-09T07:05:03Z)
- BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving [29.832781649644414]
BridgeDrive is a novel anchor-guided diffusion bridge policy for closed-loop trajectory planning. We achieve state-of-the-art performance on the Bench2Drive benchmark, improving the success rate by 5% over prior art.
arXiv Detail & Related papers (2025-09-28T02:47:12Z)
- Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion. Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradients. Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z)
- ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving [64.12414815634847]
Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful recipes addressing different aspects of this challenge. We propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer.
arXiv Detail & Related papers (2025-08-15T12:06:55Z)
- ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving [49.07731497951963]
ReCogDrive is a novel Reinforced Cognitive framework for end-to-end autonomous driving. We introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers. We then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner.
arXiv Detail & Related papers (2025-06-09T03:14:04Z)
- DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving [38.867860153968394]
Diffusion models have emerged as a powerful generative technique for robotic policy learning. We propose a novel truncated diffusion policy that incorporates prior multi-mode anchors and truncates the diffusion schedule. The proposed model, DiffusionDrive, demonstrates a 10× reduction in denoising steps compared to a vanilla diffusion policy.
arXiv Detail & Related papers (2024-11-22T18:59:47Z)
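A truncated denoising schedule of the kind the DiffusionDrive entry describes — and that LAD-Drive's abstract also invokes — starts from perturbed motion anchors rather than pure Gaussian noise, so only a short tail of the schedule is run. The toy 1-D "trajectory", the simplistic denoiser, and all constants below are illustrative assumptions, not either paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_denoise(anchor, denoise_step, t_start=10):
    """Truncated denoising: perturb a learned motion anchor with moderate
    noise and run only the last `t_start` denoising steps, instead of
    starting the full schedule from pure Gaussian noise."""
    x = anchor + 0.5 * rng.standard_normal(anchor.shape)
    for t in range(t_start, 0, -1):
        x = denoise_step(x, t)
    return x

# Hypothetical stand-in denoiser: just pulls the sample toward a target
# trajectory (a real model would predict noise with a learned network).
target = np.linspace(0.0, 10.0, 8)      # 8 future waypoints, 1-D for brevity
def denoise_step(x, t):
    return x + 0.3 * (target - x)

anchor = target + 1.0                   # anchor near, but offset from, target
traj = truncated_denoise(anchor, denoise_step)
```

Because the anchor already sits close to a plausible mode, a handful of refinement steps suffices, which is the source of the claimed reduction in denoising steps.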