Perception in Plan: Coupled Perception and Planning for End-to-End Autonomous Driving
- URL: http://arxiv.org/abs/2508.11488v1
- Date: Fri, 15 Aug 2025 14:05:57 GMT
- Title: Perception in Plan: Coupled Perception and Planning for End-to-End Autonomous Driving
- Authors: Bozhou Zhang, Jingyu Li, Nan Song, Li Zhang
- Abstract summary: VeteranAD is a coupled perception and planning framework for end-to-end autonomous driving. We introduce a perception-in-plan framework design, which integrates perception into the planning process. VeteranAD fully unleashes the potential of planning-oriented end-to-end methods, leading to more accurate and reliable driving behavior.
- Score: 13.367058484125787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end autonomous driving has achieved remarkable advancements in recent years. Existing methods primarily follow a perception-planning paradigm, where perception and planning are executed sequentially within a fully differentiable framework for planning-oriented optimization. We further advance this paradigm through a perception-in-plan framework design, which integrates perception into the planning process. This design facilitates targeted perception guided by evolving planning objectives over time, ultimately enhancing planning performance. Building on this insight, we introduce VeteranAD, a coupled perception and planning framework for end-to-end autonomous driving. By incorporating multi-mode anchored trajectories as planning priors, the perception module is specifically designed to gather traffic elements along these trajectories, enabling comprehensive and targeted perception. Planning trajectories are then generated based on both the perception results and the planning priors. To make perception fully serve planning, we adopt an autoregressive strategy that progressively predicts future trajectories while focusing on relevant regions for targeted perception at each step. With this simple yet effective design, VeteranAD fully unleashes the potential of planning-oriented end-to-end methods, leading to more accurate and reliable driving behavior. Extensive experiments on the NAVSIM and Bench2Drive datasets demonstrate that our VeteranAD achieves state-of-the-art performance.
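The abstract's autoregressive strategy, in which each planning step gathers perception only along the current candidate trajectories, can be sketched as a simple loop. This is a toy illustration only: all names, data structures, and the 1-D scene below are invented for clarity and are not the paper's published interface.

```python
def autoregressive_plan(anchors, scene, num_steps, perceive, refine):
    """Hypothetical perception-in-plan loop: start from multi-mode anchored
    trajectories (planning priors), then alternate targeted perception and
    trajectory refinement for num_steps autoregressive steps."""
    trajectories = [list(a) for a in anchors]
    for step in range(num_steps):
        # Targeted perception: look only at regions along the current plans.
        elements = perceive(scene, trajectories, step)
        # Refine each candidate using perception results and the priors.
        trajectories = refine(trajectories, elements, anchors)
    return trajectories

# Toy stand-ins: each trajectory is a list of 1-D positions; "perception"
# reads the scene cell just ahead of each trajectory's endpoint, and
# "refinement" extends the trajectory by the perceived free-space amount.
scene = {0: 1.0, 1: 1.0, 2: 0.0, 3: 1.0}  # 0.0 marks a blocked cell

def perceive(scene, trajs, step):
    return [scene.get(int(t[-1]) + 1, 1.0) for t in trajs]

def refine(trajs, elements, anchors):
    return [t + [t[-1] + e] for t, e in zip(trajs, elements)]

plans = autoregressive_plan([[0.0], [1.0]], scene, num_steps=2,
                            perceive=perceive, refine=refine)
```

In this toy run the second candidate stalls when it perceives the blocked cell ahead, which mirrors the idea that perception at each step serves the evolving plan rather than being computed once up front.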
Related papers
- Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling [75.83583076519311]
Plan-R1 is a novel two-stage trajectory planning framework that formulates trajectory planning as a sequential prediction task. In the first stage, we train an autoregressive trajectory predictor via next motion token prediction on expert data. In the second stage, we design rule-based rewards (e.g., collision avoidance, speed limits) and fine-tune the model using Group Relative Policy Optimization.
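The rule-based rewards mentioned in Plan-R1's second stage can be illustrated with a small scoring function. The rules and thresholds below (`speed_limit`, `min_clearance`, the per-violation penalty) are invented for illustration; the paper's actual reward design may differ.

```python
def rule_reward(trajectory, obstacles, speed_limit=15.0, min_clearance=2.0):
    """Score a trajectory of (x, y, speed) points against two hypothetical
    rules: stay clear of obstacles and respect the speed limit.
    Each violated point costs 1.0 reward per rule."""
    reward = 0.0
    for (x, y, v) in trajectory:
        # Collision-avoidance rule: penalize points too close to an obstacle.
        if any(((x - ox) ** 2 + (y - oy) ** 2) ** 0.5 < min_clearance
               for (ox, oy) in obstacles):
            reward -= 1.0
        # Speed-limit rule: penalize points exceeding the limit.
        if v > speed_limit:
            reward -= 1.0
    return reward

# A trajectory that skirts an obstacle at (1.5, 0) and briefly speeds:
score = rule_reward([(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (5.0, 5.0, 10.0)],
                    obstacles=[(1.5, 0.0)])
```

A scalar reward like this is what a policy-gradient method such as GRPO would maximize during fine-tuning, steering the pretrained predictor toward safe and feasible trajectories.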
arXiv Detail & Related papers (2025-05-23T09:22:19Z) - HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder [3.0989923815412204]
We propose a novel end-to-end autonomous driving framework, termed HiP-AD. HiP-AD simultaneously performs perception, prediction, and planning within a unified decoder. Experiments demonstrate that HiP-AD outperforms all existing end-to-end autonomous driving methods on the closed-loop benchmark Bench2Drive.
arXiv Detail & Related papers (2025-03-11T16:52:45Z) - DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and an iterative motion planner. Experiments conducted on the nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z) - SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation [11.011219709863875]
We propose a new end-to-end autonomous driving paradigm named SparseDrive.
SparseDrive consists of a symmetric sparse perception module and a parallel motion planner.
For motion prediction and planning, we revisit the great similarity between the two tasks, which motivates a parallel design for the motion planner.
arXiv Detail & Related papers (2024-05-30T02:13:56Z) - Path Planning based on 2D Object Bounding-box [8.082514573754954]
We present a path planning method that utilizes 2D bounding boxes of objects, developed through imitation learning in urban driving scenarios.
This is achieved by integrating high-definition (HD) map data with images captured by surrounding cameras.
We evaluate our model on the nuPlan planning task and observe that it performs competitively compared to existing vision-centric methods.
arXiv Detail & Related papers (2024-02-22T19:34:56Z) - PlanT: Explainable Planning Transformers via Object-Level Representations [64.93938686101309]
PlanT is a novel approach for planning in the context of self-driving.
PlanT is based on imitation learning with a compact object-level input representation.
Our results indicate that PlanT can focus on the most relevant object in the scene, even when this object is geometrically distant.
arXiv Detail & Related papers (2022-10-25T17:59:46Z) - Differentiable Spatial Planning using Transformers [87.90709874369192]
We propose Spatial Planning Transformers (SPT), which given an obstacle map learns to generate actions by planning over long-range spatial dependencies.
In the setting where the ground truth map is not known to the agent, we leverage pre-trained SPTs in an end-to-end framework.
SPTs outperform prior state-of-the-art differentiable planners across all the setups for both manipulation and navigation tasks.
arXiv Detail & Related papers (2021-12-02T06:48:16Z) - Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations [81.05412704590707]
We propose a novel end-to-end learnable network that performs joint perception, prediction and motion planning for self-driving vehicles.
Our network is learned end-to-end from human demonstrations.
arXiv Detail & Related papers (2020-08-13T14:40:46Z) - Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors [124.30562402952319]
The ability to predict and plan into the future is fundamental for agents acting in the world.
Current learning approaches for visual prediction and planning fail on long-horizon tasks.
We propose a framework for visual prediction and planning that is able to overcome both of these limitations.
arXiv Detail & Related papers (2020-06-23T17:58:56Z) - The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well defined geometries, topologies, and traffic rules.
In this paper we propose to incorporate structured priors as a loss function.
We demonstrate the effectiveness of our approach on real-world self-driving datasets.
arXiv Detail & Related papers (2020-06-04T03:56:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.