SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation
- URL: http://arxiv.org/abs/2405.19620v2
- Date: Fri, 31 May 2024 07:40:55 GMT
- Title: SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation
- Authors: Wenchao Sun, Xuewu Lin, Yining Shi, Chuang Zhang, Haoran Wu, Sifa Zheng,
- Abstract summary: We propose a new end-to-end autonomous driving paradigm named SparseDrive.
SparseDrive consists of a symmetric sparse perception module and a parallel motion planner.
For motion prediction and planning, we review the great similarity between these two tasks, leading to a parallel design for the motion planner.
- Score: 11.011219709863875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The well-established modular autonomous driving system is decoupled into different standalone tasks, e.g. perception, prediction and planning, suffering from information loss and error accumulation across modules. In contrast, end-to-end paradigms unify multi-tasks into a fully differentiable framework, allowing for optimization in a planning-oriented spirit. Despite the great potential of end-to-end paradigms, both the performance and efficiency of existing methods are not satisfactory, particularly in terms of planning safety. We attribute this to the computationally expensive BEV (bird's eye view) features and the straightforward design for prediction and planning. To this end, we explore the sparse representation and review the task design for end-to-end autonomous driving, proposing a new paradigm named SparseDrive. Concretely, SparseDrive consists of a symmetric sparse perception module and a parallel motion planner. The sparse perception module unifies detection, tracking and online mapping with a symmetric model architecture, learning a fully sparse representation of the driving scene. For motion prediction and planning, we review the great similarity between these two tasks, leading to a parallel design for the motion planner. Based on this parallel design, which models planning as a multi-modal problem, we propose a hierarchical planning selection strategy, which incorporates a collision-aware rescore module, to select a rational and safe trajectory as the final planning output. With such effective designs, SparseDrive surpasses previous state-of-the-art methods by a large margin in performance of all tasks, while achieving much higher training and inference efficiency. Code will be available at https://github.com/swc-17/SparseDrive for facilitating future research.
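As a rough illustration of the collision-aware rescoring idea mentioned in the abstract, the following minimal Python sketch zeroes the score of any candidate ego trajectory that passes too close to a predicted agent position, then picks the highest-scoring remaining candidate. The function names, the point-distance collision check, and the `safe_radius` threshold are assumptions made for illustration; they are not taken from the SparseDrive code, and the paper's actual hierarchical selection strategy may differ.

```python
import math

def collision_aware_rescore(plan_trajs, plan_scores, agent_trajs, safe_radius=1.0):
    """Return a copy of plan_scores with colliding candidates set to 0.0.

    plan_trajs:  list of candidate ego trajectories, each a list of (x, y) per timestep
    plan_scores: list of confidence scores, one per candidate
    agent_trajs: list of predicted agent trajectories, same timestep layout
    safe_radius: hypothetical distance threshold (metres) treated as a collision
    """
    rescored = []
    for traj, score in zip(plan_trajs, plan_scores):
        # A candidate "collides" if, at any shared timestep, it is closer
        # than safe_radius to any predicted agent position.
        collides = any(
            math.hypot(px - ax, py - ay) < safe_radius
            for agent in agent_trajs
            for (px, py), (ax, ay) in zip(traj, agent)
        )
        rescored.append(0.0 if collides else score)
    return rescored

def select_plan(plan_trajs, plan_scores, agent_trajs, safe_radius=1.0):
    """Pick the index of the highest-scoring candidate after rescoring.

    If every candidate collides, all scores are 0.0 and index 0 is returned.
    """
    rescored = collision_aware_rescore(plan_trajs, plan_scores, agent_trajs, safe_radius)
    best = max(range(len(rescored)), key=rescored.__getitem__)
    return best, rescored

# Toy scene: three candidate plans and one stationary agent at (2, 0).
plans = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],    # straight ahead, ends on the agent
    [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)],    # veers left
    [(0.0, 0.0), (1.0, -2.0), (2.0, -4.0)],  # veers right
]
scores = [0.6, 0.3, 0.1]
agents = [[(2.0, 0.0), (2.0, 0.0), (2.0, 0.0)]]

best, rescored = select_plan(plans, scores, agents, safe_radius=1.0)
print(best, rescored)
```

In this toy scene the straight-ahead candidate has the highest raw score but is suppressed because it ends on the agent's position, so the left-veering candidate is selected instead.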
Related papers
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.
Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.
Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
- SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving [13.404790614427924]
We propose a Sparse query-centric paradigm for end-to-end Autonomous Driving.
We design a unified sparse architecture for perception tasks including detection, tracking, and online mapping.
On the challenging nuScenes dataset, SparseAD achieves SOTA full-task performance among end-to-end methods.
arXiv Detail & Related papers (2024-04-10T10:34:34Z)
- Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performances in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z)
- Pixel State Value Network for Combined Prediction and Planning in Interactive Environments [9.117828575880303]
This work proposes a deep learning methodology to combine prediction and planning.
A conditional GAN with the U-Net architecture is trained to predict two high-resolution image sequences.
Results demonstrate intuitive behavior in complex situations, such as lane changes amidst conflicting objectives.
arXiv Detail & Related papers (2023-10-11T17:57:13Z)
- Planning-oriented Autonomous Driving [60.93767791255728]
We argue that a favorable framework should be devised and optimized in pursuit of the ultimate goal, i.e., planning of the self-driving car.
We introduce Unified Autonomous Driving (UniAD), a comprehensive framework that incorporates full-stack driving tasks in one network.
arXiv Detail & Related papers (2022-12-20T10:47:53Z)
- PlanT: Explainable Planning Transformers via Object-Level Representations [64.93938686101309]
PlanT is a novel approach for planning in the context of self-driving.
PlanT is based on imitation learning with a compact object-level input representation.
Our results indicate that PlanT can focus on the most relevant object in the scene, even when this object is geometrically distant.
arXiv Detail & Related papers (2022-10-25T17:59:46Z)
- Multi-agent Soft Actor-Critic Based Hybrid Motion Planner for Mobile Robots [16.402201426448002]
The planner is model-free and can realize the end-to-end mapping of multi-robot state and observation information to final smooth and continuous trajectories.
The design of the back-end trajectory optimization module is based on the minimal snap method with safety zone constraints.
arXiv Detail & Related papers (2021-12-13T12:23:30Z)
- Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations [81.05412704590707]
We propose a novel end-to-end learnable network that performs joint perception, prediction and motion planning for self-driving vehicles.
Our network is learned end-to-end from human demonstrations.
arXiv Detail & Related papers (2020-08-13T14:40:46Z)
- The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well defined geometries, topologies, and traffic rules.
In this paper we propose to incorporate structured priors as a loss function.
We demonstrate the effectiveness of our approach on real-world self-driving datasets.
arXiv Detail & Related papers (2020-06-04T03:56:11Z)