Foundation Models for Autonomous Driving Perception: A Survey Through Core Capabilities
- URL: http://arxiv.org/abs/2509.08302v1
- Date: Wed, 10 Sep 2025 05:45:49 GMT
- Title: Foundation Models for Autonomous Driving Perception: A Survey Through Core Capabilities
- Authors: Rajendramayavan Sathyam, Yueqi Li,
- Abstract summary: Foundation models are revolutionizing autonomous driving perception, transitioning the field from narrow, task-specific deep learning models to versatile, general-purpose architectures trained on vast, diverse datasets.<n>This survey examines how these models address critical challenges in autonomous perception, including limitations in generalization, scalability, and robustness to distributional shifts.
- Score: 0.6445605125467574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models are revolutionizing autonomous driving perception, transitioning the field from narrow, task-specific deep learning models to versatile, general-purpose architectures trained on vast, diverse datasets. This survey examines how these models address critical challenges in autonomous perception, including limitations in generalization, scalability, and robustness to distributional shifts. The survey introduces a novel taxonomy structured around four essential capabilities for robust performance in dynamic driving environments: generalized knowledge, spatial understanding, multi-sensor robustness, and temporal reasoning. For each capability, the survey elucidates its significance and comprehensively reviews cutting-edge approaches. Diverging from traditional method-centric surveys, our unique framework prioritizes conceptual design principles, providing a capability-driven guide for model development and clearer insights into foundational aspects. We conclude by discussing key challenges, particularly those associated with the integration of these capabilities into real-time, scalable systems, and broader deployment challenges related to computational demands and ensuring model reliability against issues like hallucinations and out-of-distribution failures. The survey also outlines crucial future research directions to enable the safe and effective deployment of foundation models in autonomous driving systems.
Related papers
- CogRail: Benchmarking VLMs in Cognitive Intrusion Perception for Intelligent Railway Transportation Systems [29.385460126069386]
We introduce a novel benchmark, CogRail, which integrates curated datasets with cognitively driven question-answer annotations.<n>Building upon this benchmark, we conduct a systematic evaluation of state-of-the-art visual-language models.<n>We propose a joint fine-tuning framework that integrates three core tasks, position perception, movement prediction, and threat analysis.
arXiv Detail & Related papers (2026-01-14T16:36:26Z) - Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems [75.78934957242403]
Self-driving vehicles and drones require true Spatial Intelligence from multi-modal onboard sensor data.<n>This paper presents a framework for multi-modal pre-training, identifying the core set of techniques driving progress toward this goal.
arXiv Detail & Related papers (2025-12-30T17:58:01Z) - Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges [53.47232506143113]
Multi-modal foundation models have transformed the technology for autonomous driving.<n>We provide a comprehensive examination of such methods through a unifying taxonomy.<n>We assess these approaches with respect to the openness of their source code and datasets.
arXiv Detail & Related papers (2025-10-31T18:05:02Z) - Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation [75.30238170051291]
Depth estimation is a fundamental task in 3D computer vision, crucial for applications such as 3D reconstruction, free-viewpoint rendering, robotics, autonomous driving, and AR/VR technologies.<n>Traditional methods relying on hardware sensors like LiDAR are often limited by high costs, low resolution, and environmental sensitivity, limiting their applicability in real-world scenarios.<n>Recent advances in vision-based methods offer a promising alternative, yet they face challenges in generalization and stability due to either the low-capacity model architectures or the reliance on domain-specific and small-scale datasets.
arXiv Detail & Related papers (2025-07-15T17:59:59Z) - Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.<n>Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.<n>This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z) - A Survey of World Models for Autonomous Driving [55.520179689933904]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling.<n>World models offer high-fidelity representations of the driving environment that integrate multi-sensor data, semantic cues, and temporal dynamics.<n>Future research must address key challenges in self-supervised representation learning, multimodal fusion, and advanced simulation.
arXiv Detail & Related papers (2025-01-20T04:00:02Z) - Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks [50.75902473813379]
This work introduces a comprehensive evaluation framework that systematically examines the role of instructions and inputs in the generalisation abilities of such models.
The proposed framework uncovers the resilience of multimodal models to extreme instruction perturbations and their vulnerability to observational changes.
arXiv Detail & Related papers (2024-07-04T14:36:49Z) - World Models for Autonomous Driving: An Initial Survey [16.448614804069674]
The capability to accurately predict future events and assess their implications is paramount for both safety and efficiency.
World models have emerged as a transformative approach, enabling autonomous driving systems to synthesize and interpret vast amounts of sensor data.
This paper provides an initial review of the current state and prospective advancements of world models in autonomous driving.
arXiv Detail & Related papers (2024-03-05T03:23:55Z) - Beyond One Model Fits All: Ensemble Deep Learning for Autonomous
Vehicles [16.398646583844286]
This study introduces three distinct neural network models corresponding to Mediated Perception, Behavior Reflex, and Direct Perception approaches.
Our architecture fuses information from the base, future latent vector prediction, and auxiliary task networks, using global routing commands to select appropriate action sub-networks.
arXiv Detail & Related papers (2023-12-10T04:40:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.