Related papers: DAP: A Discrete-token Autoregressive Planner for Autonomous Driving

URL: http://arxiv.org/abs/2511.13306v1
Date: Mon, 17 Nov 2025 12:31:33 GMT
Title: DAP: A Discrete-token Autoregressive Planner for Autonomous Driving
Authors: Bowen Ye, Bin Zhang, Hang Zhao
Abstract summary: We introduce DAP, a discrete-token autoregressive planner that jointly forecasts BEV semantics and ego trajectories. We incorporate reinforcement-learning-based fine-tuning, which preserves supervised behavior cloning priors while injecting reward-guided improvements. DAP achieves state-of-the-art performance on open-loop metrics and delivers competitive closed-loop results on the NAVSIM benchmark.
Score: 34.32497598431514
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Gaining sustainable performance improvement with scaling data and model budget remains a pivotal yet unresolved challenge in autonomous driving. While autoregressive models have exhibited promising data-scaling efficiency in planning tasks, predicting ego trajectories alone suffers from sparse supervision and only weakly constrains how scene evolution should shape ego motion. Therefore, we introduce DAP, a discrete-token autoregressive planner that jointly forecasts BEV semantics and ego trajectories, thereby enforcing comprehensive representation learning and allowing predicted dynamics to directly condition ego motion. In addition, we incorporate reinforcement-learning-based fine-tuning, which preserves supervised behavior cloning priors while injecting reward-guided improvements. Despite a compact 160M-parameter budget, DAP achieves state-of-the-art performance on open-loop metrics and delivers competitive closed-loop results on the NAVSIM benchmark. Overall, the fully discrete-token autoregressive formulation operating on both rasterized BEV and ego actions provides a compact yet scalable planning paradigm for autonomous driving.

Related papers

Self-Correcting VLA: Online Action Refinement via Sparse World Imagination [55.982504915794514]
We propose Self-Correcting VLA (SC-VLA), which achieves self-improvement by intrinsically guiding action refinement through sparse imagination. SC-VLA achieves state-of-the-art performance, yielding the highest task throughput with 16% fewer steps and a 9% higher success rate than the best-performing baselines.
arXiv Detail & Related papers (2026-02-25T06:58:06Z)
Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation [12.450883696383878]
Imitation learning (IL) has emerged as a central paradigm in autonomous driving. IL excels in matching expert behavior in open-loop settings by minimizing per-step prediction errors. Over successive planning cycles, small, often imperceptible errors compound, potentially resulting in severe failures. We propose Sequence of Experts (SoE) to enhance closed-loop performance without increasing model size or data requirements.
arXiv Detail & Related papers (2025-12-15T08:50:23Z)
AutoDrive-R$^2$: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving [37.260140808367716]
We propose AutoDrive-R$^2$, a novel VLA framework that enhances both reasoning and self-reflection capabilities of autonomous driving systems. We first propose an innovative CoT dataset named nuScenesR$^2$-6K for supervised fine-tuning. We then employ the Group Relative Policy Optimization (GRPO) algorithm within a physics-grounded reward framework to ensure reliable smoothness and realistic trajectory planning.
arXiv Detail & Related papers (2025-09-02T04:32:24Z)
ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving [64.12414815634847]
Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful recipes addressing different aspects of this challenge. We propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer.
arXiv Detail & Related papers (2025-08-15T12:06:55Z)
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning [37.176428069948535]
Vision-Language-Action (VLA) models have shown promise for end-to-end autonomous driving. Current VLA models struggle with physically infeasible action outputs, complex model structures, or unnecessarily long reasoning. We propose AutoVLA, a novel VLA model that unifies reasoning and action generation within a single autoregressive generation model.
arXiv Detail & Related papers (2025-06-16T17:58:50Z)
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving [49.07731497951963]
ReCogDrive is a novel Reinforced Cognitive framework for end-to-end autonomous driving. We introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers. We then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner.
arXiv Detail & Related papers (2025-06-09T03:14:04Z)
Predictive Planner for Autonomous Driving with Consistency Models [5.966385886363771]
Trajectory prediction and planning are essential for autonomous vehicles to navigate safely and efficiently in dynamic environments. Recent diffusion-based generative models have shown promise in multi-agent trajectory generation, but their slow sampling is less suitable for high-frequency planning tasks. We leverage the consistency model to build a predictive planner that samples from a joint distribution of ego and surrounding agents, conditioned on the ego vehicle's navigational goal.
arXiv Detail & Related papers (2025-02-12T00:26:01Z)
DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction, and an iterative motion planner. Experiments conducted on the nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
Planning with Adaptive World Models for Autonomous Driving [50.4439896514353]
We present nuPlan, a real-world motion planning benchmark that captures multi-agent interactions. We learn to model such unique behaviors with BehaviorNet, a graph convolutional neural network (GCNN). We also present AdaptiveDriver, a model-predictive control (MPC) based planner that unrolls different world models conditioned on BehaviorNet's predictions.
arXiv Detail & Related papers (2024-06-15T18:53:45Z)
PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving [57.89801036693292]
PPAD (Iterative Interaction of Prediction and Planning Autonomous Driving) considers the timestep-wise interaction to better integrate prediction and planning. We design ego-to-agent, ego-to-map, and ego-to-BEV interaction mechanisms with hierarchical dynamic key objects attention to better model the interactions.
arXiv Detail & Related papers (2023-11-14T11:53:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.