DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model
- URL: http://arxiv.org/abs/2508.05402v1
- Date: Thu, 07 Aug 2025 13:54:35 GMT
- Title: DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model
- Authors: Rui Yu, Xianghang Zhang, Runkai Zhao, Huaicheng Yan, Meng Wang
- Abstract summary: We introduce DistillDrive, an end-to-end knowledge distillation-based autonomous driving model. We employ a planning model based on structured scene representations as the teacher model. We validate our model on the nuScenes and NAVSIM datasets, achieving a 50% reduction in collision rate.
- Score: 23.573720107353868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end autonomous driving has recently seen rapid development, exerting a profound influence on both industry and academia. However, existing work places excessive focus on ego-vehicle status as its sole learning objective and lacks planning-oriented understanding, which limits the robustness of the overall decision-making process. In this work, we introduce DistillDrive, an end-to-end knowledge distillation-based autonomous driving model that leverages diversified instance imitation to enhance multi-mode motion feature learning. Specifically, we employ a planning model based on structured scene representations as the teacher model, leveraging its diversified planning instances as multi-objective learning targets for the end-to-end model. Moreover, we incorporate reinforcement learning to enhance the optimization of state-to-decision mappings, while utilizing generative modeling to construct planning-oriented instances, fostering intricate interactions within the latent space. We validate our model on the nuScenes and NAVSIM datasets, achieving a 50% reduction in collision rate and a 3-point improvement in closed-loop performance compared to the baseline model. Code and model are publicly available at https://github.com/YuruiAI/DistillDrive
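The abstract describes supervising the end-to-end student's multi-mode motion features with the teacher planner's diversified planning instances. Below is a minimal, hypothetical PyTorch sketch of such a multi-mode feature distillation objective; the function and tensor names (multi_mode_distillation_loss, student_modes, teacher_instances) are illustrative assumptions and are not taken from the released DistillDrive code.

```python
# Hypothetical sketch: each of the student's motion modes imitates its
# best-matching teacher planning instance. Shapes and names are assumptions.
import torch
import torch.nn.functional as F


def multi_mode_distillation_loss(student_modes: torch.Tensor,
                                 teacher_instances: torch.Tensor) -> torch.Tensor:
    """student_modes:     (B, M, D) multi-mode motion features from the end-to-end student
    teacher_instances: (B, K, D) diversified planning instances from the teacher
    Each student mode is matched to its nearest teacher instance (hard assignment)."""
    # Pairwise L2 distances between student modes and teacher instances: (B, M, K)
    dist = torch.cdist(student_modes, teacher_instances)
    # For every student mode, pick the index of the closest teacher instance.
    nearest = dist.argmin(dim=-1)  # (B, M)
    matched = torch.gather(
        teacher_instances, 1,
        nearest.unsqueeze(-1).expand(-1, -1, teacher_instances.size(-1)))  # (B, M, D)
    # Feature-level imitation loss; the teacher is frozen (no gradient through it).
    return F.mse_loss(student_modes, matched.detach())


if __name__ == "__main__":
    B, M, K, D = 2, 6, 3, 256  # batch, student modes, teacher instances, feature dim
    student = torch.randn(B, M, D, requires_grad=True)
    teacher = torch.randn(B, K, D)
    loss = multi_mode_distillation_loss(student, teacher)
    loss.backward()
    print(float(loss))
```

This sketch uses a hard, winner-take-all matching between student modes and teacher instances; the paper's actual matching strategy, loss form, and weighting relative to the reinforcement-learning and generative components may differ.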
Related papers
- AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning [42.409352964719204]
Vision-Language-Action (VLA) models have shown promise for end-to-end autonomous driving. Current VLA models struggle with physically infeasible action outputs, complex model structures, or unnecessarily long reasoning. We propose AutoVLA, a novel VLA model that unifies reasoning and action generation within a single autoregressive generation model.
arXiv Detail & Related papers (2025-06-16T17:58:50Z) - DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving [20.197094443215963]
We present DriveX, a self-supervised world model that learns general scene dynamics and holistic representations from driving videos. DriveX introduces Omni Scene Modeling (OSM), a module that unifies multimodal supervision: 3D point cloud forecasting, 2D semantic representation, and image generation. For downstream adaptation, we design Future Spatial Attention (FSA), a unified paradigm that dynamically aggregates features from DriveX's predictions to enhance task-specific inference.
arXiv Detail & Related papers (2025-05-25T17:27:59Z) - Latent Diffusion Planning for Imitation Learning [78.56207566743154]
Latent Diffusion Planning (LDP) is a modular approach consisting of a planner and inverse dynamics model. By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data. On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches.
arXiv Detail & Related papers (2025-04-23T17:53:34Z) - A Survey of World Models for Autonomous Driving [63.33363128964687]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling. World models offer high-fidelity representations of the driving environment that integrate multi-sensor data, semantic cues, and temporal dynamics. This paper systematically reviews recent advances in world models for autonomous driving.
arXiv Detail & Related papers (2025-01-20T04:00:02Z) - DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers [61.92571851411509]
We introduce a multimodal driving language based on interleaved image and action tokens, and develop DrivingGPT to learn joint world modeling and planning. Our DrivingGPT demonstrates strong performance in both action-conditioned video generation and end-to-end planning, outperforming strong baselines on large-scale nuPlan and NAVSIM benchmarks.
arXiv Detail & Related papers (2024-12-24T18:59:37Z) - VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision [20.43366384946928]
VLM-AD leverages vision-language models (VLMs) as teachers to enhance training. VLM-AD achieves significant improvements in planning accuracy and reduced collision rates on the nuScenes dataset.
arXiv Detail & Related papers (2024-12-19T01:53:36Z) - DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and an iterative motion planner. Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z) - Model-Based Reinforcement Learning with Isolated Imaginations [61.67183143982074]
We propose Iso-Dream++, a model-based reinforcement learning approach.
We perform policy optimization based on the decoupled latent imaginations.
This enables long-horizon visuomotor control tasks to benefit from isolating mixed dynamics sources in the wild.
arXiv Detail & Related papers (2023-03-27T02:55:56Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)