Large Foundation Models for Trajectory Prediction in Autonomous Driving: A Comprehensive Survey
- URL: http://arxiv.org/abs/2509.10570v1
- Date: Thu, 11 Sep 2025 10:30:06 GMT
- Title: Large Foundation Models for Trajectory Prediction in Autonomous Driving: A Comprehensive Survey
- Authors: Wei Dai, Shengen Wu, Wei Wu, Zhenhao Wang, Sisuo Lyu, Haicheng Liao, Limin Yu, Weiping Ding, Runwei Guan, Yutao Yue,
- Abstract summary: Trayjectory prediction serves as a critical functionality in autonomous driving.<n>The rise of Large Foundation Models (LFMs) is transforming the research paradigm of trajectory prediction.<n>This article highlights three core methodologies: trajectory-language mapping, multimodal fusion, and constraint-based reasoning.
- Score: 26.44322475984292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Trajectory prediction serves as a critical functionality in autonomous driving, enabling the anticipation of future motion paths for traffic participants such as vehicles and pedestrians, which is essential for driving safety. Although conventional deep learning methods have improved accuracy, they remain hindered by inherent limitations, including lack of interpretability, heavy reliance on large-scale annotated data, and weak generalization in long-tail scenarios. The rise of Large Foundation Models (LFMs) is transforming the research paradigm of trajectory prediction. This survey offers a systematic review of recent advances in LFMs, particularly Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) for trajectory prediction. By integrating linguistic and scene semantics, LFMs facilitate interpretable contextual reasoning, significantly enhancing prediction safety and generalization in complex environments. The article highlights three core methodologies: trajectory-language mapping, multimodal fusion, and constraint-based reasoning. It covers prediction tasks for both vehicles and pedestrians, evaluation metrics, and dataset analyses. Key challenges such as computational latency, data scarcity, and real-world robustness are discussed, along with future research directions including low-latency inference, causality-aware modeling, and motion foundation models.
Related papers
- Wireless Traffic Prediction with Large Language Model [54.07581399989292]
TIDES is a novel framework that captures spatial-temporal correlations for wireless traffic prediction.<n> TIDES achieves efficient adaptation to domain-specific patterns without incurring excessive training overhead.<n>Our results indicate that integrating spatial awareness into LLM-based predictors is the key to unlocking scalable and intelligent network management in future 6G systems.
arXiv Detail & Related papers (2025-12-19T04:47:40Z) - Robustness of LLM-enabled vehicle trajectory prediction under data security threats [6.812902306426757]
Large language models (LLMs) can accurately predict vehicle trajectories and lane-change intentions by gathering and transforming data from surrounding vehicles.<n>This study addresses this gap by conducting a systematic vulnerability analysis of LLM-enabled vehicle trajectory prediction.<n>Experiments on the highD dataset reveal that even minor, physically plausible perturbations can significantly disrupt model outputs.
arXiv Detail & Related papers (2025-11-14T03:26:51Z) - ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework.<n>It reframes the learning task to predict the residual deviation from an inertial reference.<n>On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z) - ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving [64.12414815634847]
Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful recipes addressing different aspects of this challenge.<n>We propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer.
arXiv Detail & Related papers (2025-08-15T12:06:55Z) - ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving [49.07731497951963]
ReCogDrive is a novel Reinforced Cognitive framework for end-to-end autonomous driving.<n>We introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers.<n>We then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner.
arXiv Detail & Related papers (2025-06-09T03:14:04Z) - A Survey of World Models for Autonomous Driving [55.520179689933904]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling.<n>World models offer high-fidelity representations of the driving environment that integrate multi-sensor data, semantic cues, and temporal dynamics.<n>Future research must address key challenges in self-supervised representation learning, multimodal fusion, and advanced simulation.
arXiv Detail & Related papers (2025-01-20T04:00:02Z) - Scene-Aware Explainable Multimodal Trajectory Prediction [15.58042746234974]
We introduce the Explainable Conditional Diffusion-based Multimodal Trajectory Prediction (DMTP) model.<n>Our model integrates a modified conditional diffusion approach to capture multimodal trajectory patterns and employs a revised Shapley Value model to assess the significance of global and scenario-specific features.<n> Experiments demonstrate that our explainable model excels in identifying critical inputs and significantly outperforms baseline models in accuracy.
arXiv Detail & Related papers (2024-10-22T08:17:33Z) - Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models [12.687494201105066]
This paper proposes Traj-LLM, the first to investigate the potential of using Large Language Models (LLMs) to generate future motion from agents' past/observed trajectories and scene semantics.
LLMs' powerful comprehension abilities capture a spectrum of high-level scene knowledge and interactive information.
Emulating the human-like lane focus cognitive function, we introduce lane-aware probabilistic learning powered by the pioneering Mamba module.
arXiv Detail & Related papers (2024-05-08T09:28:04Z) - Towards Explainable Traffic Flow Prediction with Large Language Models [36.86937188565623]
We propose a Traffic flow Prediction model based on Large Language Models (LLMs) to generate explainable traffic predictions.
By transferring multi-modal traffic data into natural language descriptions, xTP-LLM captures complex time-series patterns and external factors from comprehensive traffic data.
Empirically, xTP-LLM shows competitive accuracy compared with deep learning baselines, while providing an intuitive and reliable explanation for predictions.
arXiv Detail & Related papers (2024-04-03T07:14:15Z) - LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models [8.624969693477448]
Existing motion prediction approaches have ample room for improvement, particularly in terms of long-term prediction accuracy and interpretability.
We propose LC-LLM, an explainable lane change prediction model that leverages the strong reasoning capabilities and self-explanation abilities of Large Language Models.
arXiv Detail & Related papers (2024-03-27T08:34:55Z) - Certified Human Trajectory Prediction [66.1736456453465]
We propose a certification approach tailored for trajectory prediction that provides guaranteed robustness.<n>To mitigate the inherent performance drop through certification, we propose a diffusion-based trajectory denoiser and integrate it into our method.<n>We demonstrate the accuracy and robustness of the certified predictors and highlight their advantages over the non-certified ones.
arXiv Detail & Related papers (2024-03-20T17:41:35Z) - JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds [79.00975648564483]
Trajectory forecasting models, employed in fields such as robotics, autonomous vehicles, and navigation, face challenges in real-world scenarios.
This dataset provides comprehensive data, including the locations of all agents, scene images, and point clouds, all from the robot's perspective.
The objective is to predict the future positions of agents relative to the robot using raw sensory input data.
arXiv Detail & Related papers (2023-11-05T18:59:31Z) - The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well defined geometries, topologies, and traffic rules.
In this paper we propose to incorporate structured priors as a loss function.
We demonstrate the effectiveness of our approach on real-world self-driving datasets.
arXiv Detail & Related papers (2020-06-04T03:56:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.