HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder
- URL: http://arxiv.org/abs/2503.08612v1
- Date: Tue, 11 Mar 2025 16:52:45 GMT
- Title: HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder
- Authors: Yingqi Tang, Zhuoran Xu, Zhaotie Meng, Erkang Cheng
- Abstract summary: We propose a novel end-to-end autonomous driving framework, termed HiP-AD. HiP-AD simultaneously performs perception, prediction, and planning within a unified decoder. Experiments demonstrate that HiP-AD outperforms all existing end-to-end autonomous driving methods on the closed-loop benchmark Bench2Drive.
- Score: 3.0989923815412204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although end-to-end autonomous driving (E2E-AD) technologies have made significant progress in recent years, performance on closed-loop evaluation remains unsatisfactory. The potential of leveraging planning in query design and interaction has not yet been fully explored. In this paper, we introduce a multi-granularity planning query representation that integrates heterogeneous waypoints, including spatial, temporal, and driving-style waypoints across various sampling patterns. This provides additional supervision for trajectory prediction, enhancing precise closed-loop control of the ego vehicle. Additionally, we explicitly exploit the geometric properties of planning trajectories to retrieve relevant image features at their physical locations via deformable attention. By combining these strategies, we propose a novel end-to-end autonomous driving framework, termed HiP-AD, which simultaneously performs perception, prediction, and planning within a unified decoder. HiP-AD enables comprehensive interaction by allowing planning queries to iteratively interact with perception queries in BEV space while dynamically extracting image features from perspective views. Experiments demonstrate that HiP-AD outperforms all existing end-to-end autonomous driving methods on the closed-loop benchmark Bench2Drive and achieves competitive performance on the real-world nuScenes dataset.
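As a rough illustration of the deformable-attention retrieval described above, the sketch below projects planned waypoints into a camera view and bilinearly samples image features at the projected locations. This is a minimal single-camera reconstruction from the abstract alone; all function names, shapes, and conventions are assumptions, not HiP-AD's actual implementation.

```python
import torch
import torch.nn.functional as F

def sample_waypoint_features(feat_map, waypoints_ego, cam_K, cam_T):
    """Gather one feature vector per planned waypoint by projecting the
    waypoints into the camera and sampling the feature map there.

    feat_map:      (C, H, W) camera feature map (intrinsics assumed scaled
                   to this resolution).
    waypoints_ego: (N, 3) planned waypoints in ego coordinates.
    cam_K:         (3, 3) camera intrinsics.
    cam_T:         (4, 4) ego-to-camera extrinsic transform.
    Returns:       (N, C) features at the projected waypoint locations.
    """
    N = waypoints_ego.shape[0]
    ones = torch.ones(N, 1, dtype=waypoints_ego.dtype)
    pts_h = torch.cat([waypoints_ego, ones], dim=1)           # homogeneous coords
    pts_cam = (cam_T @ pts_h.T).T[:, :3]                      # ego -> camera frame
    uv = (cam_K @ pts_cam.T).T                                # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-5)               # perspective divide
    C, H, W = feat_map.shape
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,           # x in [-1, 1]
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1)  # y in [-1, 1]
    out = F.grid_sample(feat_map.unsqueeze(0),                # (1, C, H, W)
                        grid.view(1, N, 1, 2),                # (1, N, 1, 2)
                        align_corners=True)                   # -> (1, C, N, 1)
    return out.squeeze(0).squeeze(-1).T                       # (N, C)
```

In the paper's terms, the multi-granularity aspect would correspond to running such lookups over waypoint sets drawn with different spatial/temporal sampling patterns, with learned offsets around each reference point rather than a single sample.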
Related papers
- DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving [62.62464518137153]
DriveTransformer is a simplified E2E-AD framework for the ease of scaling up.
It is composed of three unified operations: task self-attention, sensor cross-attention, and temporal cross-attention.
It achieves state-of-the-art performance on both the simulated closed-loop benchmark Bench2Drive and the real-world open-loop benchmark nuScenes, at high FPS.
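Taken literally, one decoder layer built from those three operations might look like the sketch below. This is a generic reconstruction from the one-sentence summary, not DriveTransformer's code; the module names and residual/norm placement are assumptions.

```python
import torch
import torch.nn as nn

class UnifiedDecoderLayer(nn.Module):
    """Hypothetical layer stacking the three operations named above."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.task_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.sensor_cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.temporal_cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, task_q, sensor_tokens, history_q):
        # task self-attention: detection/prediction/planning queries exchange info
        task_q = self.norms[0](task_q + self.task_self(task_q, task_q, task_q)[0])
        # sensor cross-attention: queries pull features from raw sensor tokens
        task_q = self.norms[1](task_q + self.sensor_cross(task_q, sensor_tokens, sensor_tokens)[0])
        # temporal cross-attention: queries attend to queries from past frames
        task_q = self.norms[2](task_q + self.temporal_cross(task_q, history_q, history_q)[0])
        return task_q
```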
arXiv Detail & Related papers (2025-03-07T11:41:18Z)
- End-to-End Predictive Planner for Autonomous Driving with Consistency Models [5.966385886363771]
Trajectory prediction and planning are fundamental components for autonomous vehicles to navigate safely and efficiently in dynamic environments.
Traditionally, these components have often been treated as separate modules, limiting the ability to perform interactive planning.
We present a novel unified and data-driven framework that integrates prediction and planning with a single consistency model.
arXiv Detail & Related papers (2025-02-12T00:26:01Z)
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction, and an iterative motion planner. Experiments conducted on the nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
- Planning with Adaptive World Models for Autonomous Driving [50.4439896514353]
We present nuPlan, a real-world motion planning benchmark that captures multi-agent interactions.
We learn to model such unique behaviors with BehaviorNet, a graph convolutional neural network (GCNN).
We also present AdaptiveDriver, a model-predictive control (MPC) based planner that unrolls different world models conditioned on BehaviorNet's predictions.
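The summary suggests a standard model-predictive pattern: unroll the predicted world under each candidate ego plan and keep the cheapest rollout. Below is a minimal, self-contained version of that loop with a toy proximity cost; everything here is assumed for illustration, not AdaptiveDriver's actual costs or rollout.

```python
import numpy as np

def rollout_cost(plan, agent_traj, safety_radius=2.0):
    """Toy cost: proximity penalty between the ego plan and predicted agents.
    plan: (T, 2) ego waypoints; agent_traj: (A, T, 2) predicted agent positions."""
    dists = np.linalg.norm(agent_traj - plan[None], axis=-1)   # (A, T)
    return float(np.sum(np.maximum(0.0, safety_radius - dists)))

def mpc_select(candidate_plans, agent_traj):
    """Score every candidate plan against the unrolled world-model output and
    return the cheapest one (generic MPC selection, not the paper's)."""
    costs = [rollout_cost(p, agent_traj) for p in candidate_plans]
    return candidate_plans[int(np.argmin(costs))]
```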
arXiv Detail & Related papers (2024-06-15T18:53:45Z)
- SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation [11.011219709863875]
We propose a new end-to-end autonomous driving paradigm named SparseDrive.
SparseDrive consists of a symmetric sparse perception module and a parallel motion planner.
For motion prediction and planning, we revisit the great similarity between these two tasks, which leads to a parallel design for the motion planner.
arXiv Detail & Related papers (2024-05-30T02:13:56Z)
- Path Planning based on 2D Object Bounding-box [8.082514573754954]
We present a path planning method that utilizes 2D bounding boxes of objects, developed through imitation learning in urban driving scenarios.
This is achieved by integrating high-definition (HD) map data with images captured by surrounding cameras.
We evaluate our model on the nuPlan planning task and observe that it performs competitively in comparison to existing vision-centric methods.
arXiv Detail & Related papers (2024-02-22T19:34:56Z)
- PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving [57.89801036693292]
PPAD (Iterative Interaction of Prediction and Planning Autonomous Driving) considers timestep-wise interaction to better integrate prediction and planning.
We design ego-to-agent, ego-to-map, and ego-to-BEV interaction mechanisms with hierarchical dynamic key objects attention to better model the interactions.
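A timestep-wise loop of the kind PPAD describes could be sketched as follows: at each future step, agent queries react to the ego query, then the ego query is refined against the updated agents. This is a hedged reconstruction covering only the ego-agent direction; the ego-to-map and ego-to-BEV interactions mentioned above are omitted.

```python
import torch
import torch.nn as nn

class TimestepInteraction(nn.Module):
    """Hypothetical interleaved prediction/planning refinement."""
    def __init__(self, d_model=256, n_heads=8, horizon=6):
        super().__init__()
        self.horizon = horizon
        self.agent_from_ego = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ego_from_agent = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, ego_q, agent_q):
        # ego_q: (B, 1, D) planning query; agent_q: (B, A, D) agent queries
        for _ in range(self.horizon):
            # agents react to the current ego intention ...
            agent_q = agent_q + self.agent_from_ego(agent_q, ego_q, ego_q)[0]
            # ... then the ego plan is refined against the updated agents
            ego_q = ego_q + self.ego_from_agent(ego_q, agent_q, agent_q)[0]
        return ego_q, agent_q
```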
arXiv Detail & Related papers (2023-11-14T11:53:24Z)
- ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning [132.20119288212376]
We propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously.
To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system.
arXiv Detail & Related papers (2022-07-15T16:57:43Z)
- End-to-End Interactive Prediction and Planning with Optical Flow Distillation for Autonomous Driving [16.340715765227475]
We propose an end-to-end interactive neural motion planner (INMP) for autonomous driving in this paper.
Our INMP first generates a feature map in bird's-eye-view space, which is then processed to detect other agents and perform interactive prediction and planning jointly.
Also, we adopt an optical flow distillation paradigm, which can effectively improve the network performance while still maintaining its real-time inference speed.
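Distillation here presumably means training a fast camera-only student to imitate a slower teacher that also consumes optical flow. A generic loss of that shape, with assumed tensors and weighting rather than the paper's exact formulation:

```python
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat,
                      student_plan, teacher_plan, alpha=0.5):
    """Feature-level plus output-level imitation of a flow-using teacher."""
    feat_term = F.mse_loss(student_feat, teacher_feat)   # match intermediate features
    plan_term = F.l1_loss(student_plan, teacher_plan)    # match planned outputs
    return alpha * feat_term + (1.0 - alpha) * plan_term
```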
arXiv Detail & Related papers (2021-04-18T14:05:18Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)