From Human Intention to Action Prediction: A Comprehensive Benchmark for Intention-driven End-to-End Autonomous Driving
- URL: http://arxiv.org/abs/2512.12302v1
- Date: Sat, 13 Dec 2025 11:59:51 GMT
- Title: From Human Intention to Action Prediction: A Comprehensive Benchmark for Intention-driven End-to-End Autonomous Driving
- Authors: Huan Zheng, Yucheng Zhou, Tianyi Yan, Jiayi Su, Hongjun Chen, Dubing Chen, Wencheng Han, Runzhou Tao, Zhongying Qiu, Jianfei Yang, Jianbing Shen,
- Abstract summary: Current autonomous driving systems operate at a level of intelligence akin to following simple steering commands.<n>We introduce Intention-Drive, the first comprehensive benchmark designed to evaluate the ability to translate high-level human intent into safe and precise driving actions.
- Score: 67.23302649816466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current end-to-end autonomous driving systems operate at a level of intelligence akin to following simple steering commands. However, achieving genuinely intelligent autonomy requires a paradigm shift: moving from merely executing low-level instructions to understanding and fulfilling high-level, abstract human intentions. This leap from a command-follower to an intention-fulfiller, as illustrated in our conceptual framework, is hindered by a fundamental challenge: the absence of a standardized benchmark to measure and drive progress on this complex task. To address this critical gap, we introduce Intention-Drive, the first comprehensive benchmark designed to evaluate the ability to translate high-level human intent into safe and precise driving actions. Intention-Drive features two core contributions: (1) a new dataset of complex scenarios paired with corresponding natural language intentions, and (2) a novel evaluation protocol centered on the Intent Success Rate (ISR), which assesses the semantic fulfillment of the human's goal beyond simple geometric accuracy. Through an extensive evaluation of a spectrum of baseline models on Intention-Drive, we reveal a significant performance deficit, showing that the baseline model struggle to achieve the comprehensive scene and intention understanding required for this advanced task.
Related papers
- ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation [55.467742403416175]
We introduce a physics-driven neural algorithm that translates large-scale motion capture to humanoid embodiments.<n>We learn a unified multimodal controller that supports both dense references and sparse task specifications.<n>Results show that ULTRA generalizes to autonomous, goal-conditioned whole-body loco-manipulation from egocentric perception.
arXiv Detail & Related papers (2026-03-03T18:59:29Z) - Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems [75.78934957242403]
Self-driving vehicles and drones require true Spatial Intelligence from multi-modal onboard sensor data.<n>This paper presents a framework for multi-modal pre-training, identifying the core set of techniques driving progress toward this goal.
arXiv Detail & Related papers (2025-12-30T17:58:01Z) - Mimir: Hierarchical Goal-Driven Diffusion with Uncertainty Propagation for End-to-End Autonomous Driving [17.533465904228844]
We propose Mimir, a novel hierarchical dual-system framework capable of generating robust trajectories relying on goal points with uncertainty estimation.<n>Mimir surpasses previous state-of-the-art methods with a 20% improvement in the driving scoreS, while achieving 1.6 times improvement in high-level module inference speed.
arXiv Detail & Related papers (2025-12-08T03:31:25Z) - DriveCritic: Towards Context-Aware, Human-Aligned Evaluation for Autonomous Driving with Vision-Language Models [24.168614747778538]
We introduce DriveCritic, a novel framework featuring two key contributions.<n>The dataset is a curated collection of challenging scenarios where context is critical for correct judgment.<n>The DriveCritic model learns to adjudicate between trajectory pairs by integrating visual and symbolic context.
arXiv Detail & Related papers (2025-10-15T03:00:38Z) - IntentionVLA: Generalizable and Efficient Embodied Intention Reasoning for Human-Robot Interaction [51.130510883952546]
Vision-Language-Action (VLA) models leverage pretrained vision-language models (VLMs) to couple perception with robotic control.<n>We propose textbfIntentionVLA, a VLA framework with a curriculum training paradigm and an efficient inference mechanism.<n>Our proposed method first leverages carefully designed reasoning data that combine intention inference, spatial grounding, and compact embodied reasoning.
arXiv Detail & Related papers (2025-10-09T04:49:46Z) - ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving [27.75047397292818]
End-to-end autonomous driving has emerged as a promising approach to unify perception, prediction, and planning within a single framework.<n>We propose ReAL-AD, a Reasoning-Augmented Learning framework that structures decision-making in autonomous driving based on the three-tier human cognitive model.<n>We show that integrating our framework improves planning accuracy and safety by over 30%, making end-to-end autonomous driving more interpretable and aligned with human-like hierarchical reasoning.
arXiv Detail & Related papers (2025-07-16T02:23:24Z) - ThinkBot: Embodied Instruction Following with Thought Chain Reasoning [66.09880459084901]
Embodied Instruction Following (EIF) requires agents to complete human instruction by interacting objects in complicated surrounding environments.
We propose ThinkBot that reasons the thought chain in human instruction to recover the missing action descriptions.
Our ThinkBot outperforms the state-of-the-art EIF methods by a sizable margin in both success rate and execution efficiency.
arXiv Detail & Related papers (2023-12-12T08:30:09Z) - Planning-oriented Autonomous Driving [60.93767791255728]
We argue that a favorable framework should be devised and optimized in pursuit of the ultimate goal, i.e., planning of the self-driving car.
We introduce Unified Autonomous Driving (UniAD), a comprehensive framework that incorporates full-stack driving tasks in one network.
arXiv Detail & Related papers (2022-12-20T10:47:53Z) - Important Object Identification with Semi-Supervised Learning for
Autonomous Driving [37.654878298744855]
We propose a novel approach for important object identification in egocentric driving scenarios.
We present a semi-supervised learning pipeline to enable the model to learn from unlimited unlabeled data.
Our approach also outperforms rule-based baselines by a large margin.
arXiv Detail & Related papers (2022-03-05T01:23:13Z) - An Information Bottleneck Approach for Controlling Conciseness in
Rationale Extraction [84.49035467829819]
We show that it is possible to better manage this trade-off by optimizing a bound on the Information Bottleneck (IB) objective.
Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale.
arXiv Detail & Related papers (2020-05-01T23:26:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.