Related papers: Mimir: Hierarchical Goal-Driven Diffusion with Uncertainty Propagation for End-to-End Autonomous Driving

URL: http://arxiv.org/abs/2512.07130v1
Date: Mon, 08 Dec 2025 03:31:25 GMT
Title: Mimir: Hierarchical Goal-Driven Diffusion with Uncertainty Propagation for End-to-End Autonomous Driving
Authors: Zebin Xing, Yupeng Zheng, Qichao Zhang, Zhixing Ding, Pengxuan Yang, Songen Gu, Zhongpu Xia, Dongbin Zhao
Abstract summary: We propose Mimir, a novel hierarchical dual-system framework capable of generating robust trajectories relying on goal points with uncertainty estimation. Mimir surpasses previous state-of-the-art methods with a 20% improvement in the driving score EPDMS, while achieving a 1.6x improvement in high-level module inference speed.
Score: 17.533465904228844
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: End-to-end autonomous driving has emerged as a pivotal direction in the field of autonomous systems. Recent works have demonstrated impressive performance by incorporating high-level guidance signals to steer low-level trajectory planners. However, their potential is often constrained by inaccurate high-level guidance and the computational overhead of complex guidance modules. To address these limitations, we propose Mimir, a novel hierarchical dual-system framework capable of generating robust trajectories relying on goal points with uncertainty estimation: (1) Unlike previous approaches that model goal points deterministically, we estimate goal point uncertainty with a Laplace distribution to enhance robustness; (2) To overcome the slow inference speed of the guidance system, we introduce a multi-rate guidance mechanism that predicts extended goal points in advance. Validated on the challenging Navhard and Navtest benchmarks, Mimir surpasses previous state-of-the-art methods with a 20% improvement in the driving score EPDMS, while achieving a 1.6x improvement in high-level module inference speed without compromising accuracy. The code and models will be released soon to promote reproducibility and further development. The code is available at https://github.com/ZebinX/Mimir-Uncertainty-Driving
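
The abstract says goal point uncertainty is estimated with a Laplace distribution but does not give the training objective. As a rough illustration only, the sketch below computes the standard Laplace negative log-likelihood for a predicted goal point with a per-coordinate scale. The function name `laplace_nll` and its arguments are hypothetical and are not taken from the Mimir code; the paper's actual loss may differ.

```python
import math


def laplace_nll(pred, target, scale):
    """Mean Laplace negative log-likelihood over coordinates.

    Hypothetical helper, not from the Mimir repository.
    pred, target, scale: equal-length sequences of floats, e.g. (x, y).
    Each coordinate is treated as an independent Laplace(pred_i, scale_i).
    """
    if not (len(pred) == len(target) == len(scale)):
        raise ValueError("pred, target and scale must have the same length")
    if len(pred) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for mu, y, b in zip(pred, target, scale):
        if b <= 0:
            raise ValueError("scale must be positive")
        # -log p(y | mu, b) = log(2b) + |y - mu| / b
        total += math.log(2.0 * b) + abs(y - mu) / b
    return total / len(pred)


if __name__ == "__main__":
    goal_pred = [10.0, 2.0]   # predicted goal point (x, y), illustrative values
    goal_true = [10.0, 4.0]   # ground-truth goal point, illustrative values
    # A small scale on a wrong coordinate is penalized more than a large one.
    print(laplace_nll(goal_pred, goal_true, [0.5, 0.5]))
    print(laplace_nll(goal_pred, goal_true, [0.5, 2.0]))
```

Under this objective, a model that predicts a small scale while being far from the target pays a large `|y - mu| / b` penalty, while a large scale is discouraged by the `log(2b)` term, so the scale can act as a learned uncertainty estimate.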

Related papers

Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation [12.450883696383878]
Imitation learning (IL) has emerged as a central paradigm in autonomous driving. IL excels in matching expert behavior in open-loop settings by minimizing per-step prediction errors. Over successive planning cycles, small, often imperceptible errors compound, potentially resulting in severe failures. We propose Sequence of Experts (SoE) to enhance closed-loop performance without increasing model size or data requirements.
arXiv Detail & Related papers (2025-12-15T08:50:23Z)
From Human Intention to Action Prediction: A Comprehensive Benchmark for Intention-driven End-to-End Autonomous Driving [67.23302649816466]
Current autonomous driving systems operate at a level of intelligence akin to following simple steering commands. We introduce Intention-Drive, the first comprehensive benchmark designed to evaluate the ability to translate high-level human intent into safe and precise driving actions.
arXiv Detail & Related papers (2025-12-13T11:59:51Z)
dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning [69.36145467833498]
We introduce dVLM-AD, a diffusion-based vision-language model that unifies perception, structured reasoning, and low-level planning for end-to-end driving. Evaluated on nuScenes and WOD-E2E, dVLM-AD yields more consistent reasoning-action pairs and achieves planning performance comparable to existing driving VLM/VLA systems.
arXiv Detail & Related papers (2025-12-04T05:05:41Z)
MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving [13.786046699744476]
We propose MindDrive, a framework that integrates high-quality trajectory generation with comprehensive decision reasoning. In particular, the proposed Future-aware Trajectory Generator (FaTG) performs ego-conditioned "what-if" simulations to predict potential future scenes. Building upon this, the VLM-oriented Evaluator (VLoE) leverages the reasoning capability of a large vision-language model to conduct multi-objective evaluations.
arXiv Detail & Related papers (2025-12-04T04:16:10Z)
ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task to predict the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z)
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion. Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient computation. Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z)
ViLaD: A Large Vision Language Diffusion Framework for End-to-End Autonomous Driving [14.486548540613791]
We introduce ViLaD, a novel Large Vision Language Diffusion framework for end-to-end autonomous driving. ViLaD enables parallel generation of entire driving decision sequences, significantly reducing computational latency. We conduct comprehensive experiments on the nuScenes dataset, where ViLaD outperforms state-of-the-art autoregressive VLM baselines in both planning accuracy and inference speed.
arXiv Detail & Related papers (2025-08-18T04:01:56Z)
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving [49.07731497951963]
ReCogDrive is a novel Reinforced Cognitive framework for end-to-end autonomous driving. We introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers. We then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner.
arXiv Detail & Related papers (2025-06-09T03:14:04Z)
DriveMind: A Dual-VLM based Reinforcement Learning Framework for Autonomous Driving [14.988477212106018]
DriveMind is a semantic reward framework for autonomous driving. We show it can achieve 19.4 +/- 2.3 km/h average speed, 0.98 +/- 0.03 route completion, and near-zero collisions. Its semantic reward generalizes zero-shot to real dash-cam data with minimal distributional shift.
arXiv Detail & Related papers (2025-06-01T03:51:09Z)
DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction, and an iterative motion planner. Experiments conducted on the nuScenes and Bench2Drive datasets demonstrate the superior planning performance and high efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
Bootstrap Motion Forecasting With Self-Consistent Constraints [52.88100002373369]
We present a novel framework to bootstrap Motion forecasting with Self-consistent Constraints. The motion forecasting task aims at predicting future trajectories of vehicles by incorporating spatial and temporal information from the past. We show that our proposed scheme consistently improves the prediction performance of several existing methods.
arXiv Detail & Related papers (2022-04-12T14:59:48Z)
IntentNet: Learning to Predict Intention from Raw Sensor Data [86.74403297781039]
In this paper, we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor as well as dynamic maps of the environment. Our multi-task model achieves better accuracy than the respective separate modules while saving computation, which is critical to reducing reaction time in self-driving applications.
arXiv Detail & Related papers (2021-01-20T00:31:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.