Related papers: A Survey for Foundation Models in Autonomous Driving

URL: http://arxiv.org/abs/2402.01105v1
Date: Fri, 2 Feb 2024 02:44:59 GMT
Title: A Survey for Foundation Models in Autonomous Driving
Authors: Haoxiang Gao and Yaqian Li and Kaiwen Long and Ming Yang and Yiqing Shen
Abstract summary: Large language models contribute to planning and simulation in autonomous driving. Vision foundation models are increasingly adapted for critical tasks such as 3D object detection and tracking. Multi-modal foundation models, integrating diverse inputs, exhibit exceptional visual understanding and spatial reasoning.
Score: 11.726604658478152
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The advent of foundation models has revolutionized the fields of natural language processing and computer vision, paving the way for their application in autonomous driving (AD). This survey presents a comprehensive review of more than 40 research papers, demonstrating the role of foundation models in enhancing AD. Large language models contribute to planning and simulation in AD, particularly through their proficiency in reasoning, code generation and translation. In parallel, vision foundation models are increasingly adapted for critical tasks such as 3D object detection and tracking, as well as creating realistic driving scenarios for simulation and testing. Multi-modal foundation models, integrating diverse inputs, exhibit exceptional visual understanding and spatial reasoning, crucial for end-to-end AD. This survey not only provides a structured taxonomy, categorizing foundation models based on their modalities and functionalities within the AD domain but also delves into the methods employed in current research. It identifies the gaps between existing foundation models and cutting-edge AD approaches, thereby charting future research directions and proposing a roadmap for bridging these gaps.

Related papers

Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges [53.47232506143113]
Multi-modal foundation models have transformed the technology for autonomous driving. We provide a comprehensive examination of such methods through a unifying taxonomy. We assess these approaches with respect to the openness of their source code and datasets.
arXiv Detail & Related papers (2025-10-31T18:05:02Z)
Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis [19.212494396144404]
Simulation- and scenario-based testing have emerged as key approaches to development and validation of autonomous driving systems. Foundation models represent a new generation of pre-trained, general-purpose AI models. Our survey presents a unified taxonomy that includes large language models, vision-language models, multimodal large language models, diffusion models, and world models for the generation and analysis of autonomous driving scenarios.
arXiv Detail & Related papers (2025-06-13T07:25:59Z)
Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing. Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest. This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
A Genealogy of Foundation Models in Remote Sensing [0.4468952886990849]
Foundation models have garnered increasing attention for representation learning in remote sensing. This paper examines these approaches, along with their roots in the computer vision field. We discuss the quality of the learned representations and methods to alleviate the need for massive compute resources.
arXiv Detail & Related papers (2025-04-24T01:23:00Z)
Multi-Modal Foundation Models for Computational Pathology: A Survey [32.25958653387204]
Foundation models have emerged as a powerful paradigm in computational pathology (CPath). We categorize 32 state-of-the-art multi-modal foundation models into three major paradigms: vision-language, vision-knowledge graph, and vision-gene expression. We analyze 28 available multi-modal datasets tailored for pathology, grouped into image-text pairs, instruction datasets, and image-other modality pairs.
arXiv Detail & Related papers (2025-03-12T06:03:33Z)
A Survey of Model Architectures in Information Retrieval [59.61734783818073]
The period from 2019 to the present has represented one of the biggest paradigm shifts in information retrieval (IR) and natural language processing (NLP). We trace the development from traditional term-based methods to modern neural approaches, particularly highlighting the impact of transformer-based models and subsequent large language models (LLMs). We conclude with a forward-looking discussion of emerging challenges and future directions.
arXiv Detail & Related papers (2025-02-20T18:42:58Z)
Low-Rank Adaptation for Foundation Models: A Comprehensive Review [56.341827242332194]
Low-Rank Adaptation (LoRA) has emerged as a highly promising approach for mitigating these challenges. This survey provides the first comprehensive review of LoRA techniques beyond large language models to general foundation models.
arXiv Detail & Related papers (2024-12-31T09:38:55Z)
AI Foundation Models in Remote Sensing: A Survey [6.036426846159163]
This paper provides a comprehensive survey of foundation models in the remote sensing domain. We categorize these models based on their applications in computer vision and domain-specific tasks. We highlight emerging trends and the significant advancements achieved by these foundation models.
arXiv Detail & Related papers (2024-08-06T22:39:34Z)
A Survey of Resource-efficient LLM and Multimodal Foundation Models [22.23967603206849]
Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and multimodal models, are revolutionizing the entire machine learning lifecycle. However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources. This survey delves into the critical importance of such research, examining both algorithmic and systemic aspects.
arXiv Detail & Related papers (2024-01-16T03:35:26Z)
A Survey of Reasoning with Foundation Models [235.7288855108172]
Reasoning plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. We introduce seminal foundation models proposed or adaptable for reasoning. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models.
arXiv Detail & Related papers (2023-12-17T15:16:13Z)
Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey [30.528346074194925]
Visual foundation models (VFMs) have become a catalyst for groundbreaking developments in computer vision. This review paper delineates the pivotal trajectories of VFMs, emphasizing their scalability and proficiency in generative tasks. A crucial direction for forthcoming innovation is the amalgamation of generative and discriminative paradigms.
arXiv Detail & Related papers (2023-12-15T19:17:15Z)
Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision [6.2847894163744105]
Foundation models are large-scale, pre-trained deep-learning models adapted to a wide range of downstream tasks. These models facilitate contextual reasoning, generalization, and prompt capabilities at test time. Capitalizing on the advances in computer vision, medical imaging has also marked a growing interest in these models.
arXiv Detail & Related papers (2023-10-28T12:08:12Z)
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook [95.32949323258251]
Temporal data, notably time series and spatio-temporal data, are prevalent in real-world applications. Recent advances in large language and other foundational models have spurred increased use in time series and spatio-temporal data mining.
arXiv Detail & Related papers (2023-10-16T09:06:00Z)
Geometric Deep Learning for Structure-Based Drug Design: A Survey [83.87489798671155]
Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates. Recent advancements in geometric deep learning, which effectively integrate and process 3D geometric data, have significantly propelled the field forward.
arXiv Detail & Related papers (2023-06-20T14:21:58Z)
CHORUS: Foundation Models for Unified Data Discovery and Exploration [6.85448651843431]
We show that foundation models are highly applicable to the data discovery and data exploration domain. We show that a foundation-model-based approach outperforms task-specific models and thus the state of the art. This suggests a future direction in which disparate data management tasks can be unified under foundation models.
arXiv Detail & Related papers (2023-06-16T03:58:42Z)
Foundation Models for Decision Making: Problems, Methods, and Opportunities [124.79381732197649]
Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks. New paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning. Research at the intersection of foundation models and decision making holds tremendous promise for creating powerful new systems.
arXiv Detail & Related papers (2023-03-07T18:44:07Z)
Quantitatively Assessing the Benefits of Model-driven Development in Agent-based Modeling and Simulation [80.49040344355431]
This paper compares the use of MDD and ABMS platforms in terms of effort and developer mistakes. The obtained results show that MDD4ABMS requires less effort to develop simulations with similar (sometimes better) design quality than NetLogo.
arXiv Detail & Related papers (2020-06-15T23:29:04Z)
A Comprehensive Study on Temporal Modeling for Online Action Detection [50.558313106389335]
Online action detection (OAD) is a practical yet challenging task, which has attracted increasing attention in recent years. This paper aims to provide a comprehensive study on temporal modeling for OAD including four meta types of temporal modeling methods. We present several hybrid temporal modeling methods, which outperform the recent state-of-the-art methods with sizable margins on THUMOS-14 and TVSeries.
arXiv Detail & Related papers (2020-01-21T13:12:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.