FutureX: Enhance End-to-End Autonomous Driving via Latent Chain-of-Thought World Model
- URL: http://arxiv.org/abs/2512.11226v1
- Date: Fri, 12 Dec 2025 02:12:49 GMT
- Title: FutureX: Enhance End-to-End Autonomous Driving via Latent Chain-of-Thought World Model
- Authors: Hongbin Lin, Yiming Yang, Yifan Zhang, Chaoda Zheng, Jie Feng, Sheng Wang, Zhennan Wang, Shijia Chen, Boyang Wang, Yu Zhang, Xianming Liu, Shuguang Cui, Zhen Li,
- Abstract summary: FutureX is a pipeline that enhances end-to-end planners to perform complex motion planning via future scene latent reasoning and trajectory refinement.<n>FutureX enhances existing methods by producing more rational motion plans and fewer collisions without compromising efficiency.
- Score: 103.2513470454204
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In autonomous driving, end-to-end planners learn scene representations from raw sensor data and utilize them to generate a motion plan or control actions. However, exclusive reliance on the current scene for motion planning may result in suboptimal responses in highly dynamic traffic environments where ego actions further alter the future scene. To model the evolution of future scenes, we leverage the World Model to represent how the ego vehicle and its environment interact and change over time, which entails complex reasoning. The Chain of Thought (CoT) offers a promising solution by forecasting a sequence of future thoughts that subsequently guide trajectory refinement. In this paper, we propose FutureX, a CoT-driven pipeline that enhances end-to-end planners to perform complex motion planning via future scene latent reasoning and trajectory refinement. Specifically, the Auto-think Switch examines the current scene and decides whether additional reasoning is required to yield a higher-quality motion plan. Once FutureX enters the Thinking mode, the Latent World Model conducts a CoT-guided rollout to predict future scene representation, enabling the Summarizer Module to further refine the motion plan. Otherwise, FutureX operates in an Instant mode to generate motion plans in a forward pass for relatively simple scenes. Extensive experiments demonstrate that FutureX enhances existing methods by producing more rational motion plans and fewer collisions without compromising efficiency, thereby achieving substantial overall performance gains, e.g., 6.2 PDMS improvement for TransFuser on NAVSIM. Code will be released.
Related papers
- Future-Aware End-to-End Driving: Bidirectional Modeling of Trajectory Planning and Scene Evolution [96.25314747309811]
We introduce SeerDrive, a novel end-to-end framework that jointly models future scene evolution and trajectory planning.<n>Our method first predicts future bird's-eye view (BEV) representations to anticipate the dynamics of the surrounding scene.<n>Two key components enable this: (1) future-aware planning, which injects predicted BEV features into the trajectory planner, and (2) iterative scene modeling and vehicle planning.
arXiv Detail & Related papers (2025-10-13T07:41:47Z) - Autoregressive End-to-End Planning with Time-Invariant Spatial Alignment and Multi-Objective Policy Refinement [15.002921311530374]
Autoregressive models are a formidable baseline for end-to-end planning in autonomous driving.<n>Their performance is constrained by atemporal misalignment, as the planner must condition future actions on past sensory data.<n>We propose a Time-Invariant Alignment (TISA) module that learns to project initial environmental features into a consistent ego-centric frame.<n>We also introduce a multi-objective post-training stage using Direct Preference Optimization (DPO) to move beyond pure imitation.
arXiv Detail & Related papers (2025-09-25T09:24:45Z) - ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving [64.12414815634847]
Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful recipes addressing different aspects of this challenge.<n>We propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer.
arXiv Detail & Related papers (2025-08-15T12:06:55Z) - DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving [20.197094443215963]
We present DriveX, a self-supervised world model that learns general scene dynamics and holistic representations from driving videos.<n>DriveX introduces Omni Scene Modeling (OSM), a module that unifies multimodal supervision-3D point cloud forecasting, 2D semantic representation, and image generation.<n>For downstream adaptation, we design Future Spatial Attention (FSA), a unified paradigm that dynamically aggregates features from DriveX's predictions to enhance task-specific inference.
arXiv Detail & Related papers (2025-05-25T17:27:59Z) - FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving [19.81442567260658]
We propose a visual-temporalT framework that enables VLAs to think in images.<n>On nuScenes and NAVSIM, FSDrive improves accuracy and reduces collisions.
arXiv Detail & Related papers (2025-05-23T09:55:32Z) - DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.<n>Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.<n>Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z) - GenAD: Generative End-to-End Autonomous Driving [13.332272121018285]
GenAD is a generative framework that casts autonomous driving into a generative modeling problem.
We propose an instance-centric scene tokenizer that first transforms the surrounding scenes into map-aware instance tokens.
We then employ a variational autoencoder to learn the future trajectory distribution in a structural latent space for trajectory prior modeling.
arXiv Detail & Related papers (2024-02-18T08:21:05Z) - LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving [139.33800431159446]
LookOut is an approach to jointly perceive the environment and predict a diverse set of futures from sensor data.
We show that our model demonstrates significantly more diverse and sample-efficient motion forecasting in a large-scale self-driving dataset.
arXiv Detail & Related papers (2021-01-16T23:19:22Z) - The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well defined geometries, topologies, and traffic rules.
In this paper we propose to incorporate structured priors as a loss function.
We demonstrate the effectiveness of our approach on real-world self-driving datasets.
arXiv Detail & Related papers (2020-06-04T03:56:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.