Prospective Role of Foundation Models in Advancing Autonomous Vehicles
- URL: http://arxiv.org/abs/2405.02288v2
- Date: Fri, 17 May 2024 10:47:50 GMT
- Title: Prospective Role of Foundation Models in Advancing Autonomous Vehicles
- Authors: Jianhua Wu, Bingzhao Gao, Jincheng Gao, Jianhao Yu, Hongqing Chu, Qiankun Yu, Xun Gong, Yi Chang, H. Eric Tseng, Hong Chen, Jie Chen,
- Abstract summary: Large-scale Foundation Models (FMs) have achieved remarkable results in many fields including natural language processing and computer vision.
This paper synthesizes the applications and future trends of FMs in autonomous driving.
- Score: 19.606191410333363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT, Sora, etc., have achieved remarkable results in many fields including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can contribute to enhancing scene understanding and reasoning. By pre-training on rich linguistic and visual data, FMs can understand and interpret various elements in a driving scene, and provide cognitive reasoning to give linguistic and action instructions for driving decisions and planning. Furthermore, FMs can augment data based on the understanding of driving scenarios to provide feasible scenes of those rare occurrences in the long tail distribution that are unlikely to be encountered during routine driving and data collection. The enhancement can subsequently lead to improvement in the accuracy and reliability of autonomous driving systems. Another testament to the potential of FMs' applications lies in World Models, exemplified by the DREAMER series, which showcases the ability to comprehend physical laws and dynamics. Learning from massive data under the paradigm of self-supervised learning, World Model can generate unseen yet plausible driving environments, facilitating the enhancement in the prediction of road users' behaviors and the off-line training of driving strategies. In this paper, we synthesize the applications and future trends of FMs in autonomous driving. By utilizing the powerful capabilities of FMs, we strive to tackle the potential issues stemming from the long-tail distribution in autonomous driving, consequently advancing overall safety in this domain.
Related papers
- CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving [1.727597257312416]
CoVLA (Comprehensive Vision-Language-Action) dataset comprises real-world driving videos spanning more than 80 hours.
This dataset establishes a framework for robust, interpretable, and data-driven autonomous driving systems.
arXiv Detail & Related papers (2024-08-19T09:53:49Z) - RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model [22.25903116720301]
explainability plays a critical role in trustworthy autonomous decision-making.
Recent advancements in Multi-Modal Large Language models (MLLMs) have shown promising potential in enhancing the explainability as a driving agent.
We present RAG-Driver, a novel retrieval-augmented multi-modal large language model that leverages in-context learning for high-performance, explainable, and generalisable autonomous driving.
arXiv Detail & Related papers (2024-02-16T16:57:18Z) - Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives [56.2139730920855]
We present a systematic analysis of MM-VUFMs specifically designed for road scenes.
Our objective is to provide a comprehensive overview of common practices, referring to task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques.
We provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models.
arXiv Detail & Related papers (2024-02-05T12:47:09Z) - Forging Vision Foundation Models for Autonomous Driving: Challenges,
Methodologies, and Opportunities [59.02391344178202]
Vision foundation models (VFMs) serve as potent building blocks for a wide range of AI applications.
The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs.
This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions.
arXiv Detail & Related papers (2024-01-16T01:57:24Z) - DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral
Planning States for Autonomous Driving [69.82743399946371]
DriveMLM is a framework that can perform close-loop autonomous driving in realistic simulators.
We employ a multi-modal LLM (MLLM) to model the behavior planning module of a module AD system.
This model can plug-and-play in existing AD systems such as Apollo for close-loop driving.
arXiv Detail & Related papers (2023-12-14T18:59:05Z) - LLM4Drive: A Survey of Large Language Models for Autonomous Driving [62.10344445241105]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review a research line about textitLarge Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z) - Drive Anywhere: Generalizable End-to-end Autonomous Driving with
Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z) - DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large
Language Models [30.23228092898916]
We propose the DiLu framework, which combines a Reasoning and a Reflection module to enable the system to perform decision-making based on common-sense knowledge.
Extensive experiments prove DiLu's capability to accumulate experience and demonstrate a significant advantage in generalization ability.
To the best of our knowledge, we are the first to leverage knowledge-driven capability in decision-making for autonomous vehicles.
arXiv Detail & Related papers (2023-09-28T09:41:35Z) - KARNet: Kalman Filter Augmented Recurrent Neural Network for Learning
World Models in Autonomous Driving Tasks [11.489187712465325]
We present a Kalman filter augmented recurrent neural network architecture to learn the latent representation of the traffic flow using front camera images only.
Results show that incorporating an explicit model of the vehicle (states estimated using Kalman filtering) in the end-to-end learning significantly increases performance.
arXiv Detail & Related papers (2023-05-24T02:27:34Z) - Tackling Real-World Autonomous Driving using Deep Reinforcement Learning [63.3756530844707]
In this work, we propose a model-free Deep Reinforcement Learning Planner training a neural network that predicts acceleration and steering angle.
In order to deploy the system on board the real self-driving car, we also develop a module represented by a tiny neural network.
arXiv Detail & Related papers (2022-07-05T16:33:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.