A Survey on Multimodal Large Language Models for Autonomous Driving
- URL: http://arxiv.org/abs/2311.12320v1
- Date: Tue, 21 Nov 2023 03:32:01 GMT
- Title: A Survey on Multimodal Large Language Models for Autonomous Driving
- Authors: Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, Kaizhao Liang,
Jintai Chen, Juanwu Lu, Zichong Yang, Kuei-Da Liao, Tianren Gao, Erlong Li,
Kun Tang, Zhipeng Cao, Tong Zhou, Ao Liu, Xinrui Yan, Shuqi Mei, Jianguo Cao,
Ziran Wang, Chao Zheng
- Abstract summary: Multimodal AI systems built on large models have the potential to perceive the real world, make decisions, and control tools as humans do.
Despite this immense potential, there is still no comprehensive understanding of the key challenges, opportunities, and future directions for applying Multimodal Large Language Models to driving systems.
- Score: 31.614730391949657
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the emergence of Large Language Models (LLMs) and Vision Foundation
Models (VFMs), multimodal AI systems benefiting from large models have the
potential to perceive the real world, make decisions, and control tools as
humans do. In recent months, LLMs have attracted widespread attention in
autonomous driving and map systems. Despite their immense potential, there is
still a lack of comprehensive understanding of the key challenges,
opportunities, and future endeavors involved in applying LLMs to driving
systems. In this paper, we present a systematic investigation of this field. We
first introduce the background of Multimodal Large Language Models (MLLMs), the
development of multimodal models based on LLMs, and the history of autonomous
driving. Then, we review existing MLLM tools for driving, transportation, and
map systems together with existing datasets and benchmarks. Moreover, we
summarize the works presented at The 1st WACV Workshop on Large Language and
Vision Models for Autonomous Driving (LLVM-AD), the first workshop of its kind
on LLMs in autonomous driving. To further promote the development of this
field, we also discuss several important problems regarding the use of MLLMs in
autonomous driving systems that need to be solved by both academia and
industry.
Related papers
- Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment [15.52530518623987]
Large Language Models (LLMs) have the potential to enhance various aspects of autonomous driving systems.
This paper introduces novel concepts and approaches to designing LLMs for autonomous driving (LLM4AD).
arXiv Detail & Related papers (2024-10-20T04:36:19Z)
- A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks [74.52259252807191]
Multimodal Large Language Models (MLLMs) address the complexities of real-world applications far beyond the capabilities of single-modality systems.
This paper systematically surveys the applications of MLLMs across multimodal tasks such as natural language, vision, and audio.
arXiv Detail & Related papers (2024-08-02T15:14:53Z)
- LLMs Meet Multimodal Generation and Editing: A Survey [89.76691959033323]
This survey elaborates on multimodal generation and editing across various domains, including image, video, 3D, and audio.
We summarize the notable advancements with milestone works in these fields and categorize these studies into LLM-based and CLIP/T5-based methods.
We also investigate tool-augmented multimodal agents that can leverage existing generative models for human-computer interaction.
arXiv Detail & Related papers (2024-05-29T17:59:20Z)
- Probing Multimodal LLMs as World Models for Driving [72.18727651074563]
We look at the application of Multimodal Large Language Models (MLLMs) in autonomous driving.
Despite advances in models like GPT-4o, their performance in complex driving environments remains largely unexplored.
arXiv Detail & Related papers (2024-05-09T17:52:42Z)
- Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving [0.0]
We develop EM-VLM4AD, an efficient, lightweight, multi-frame vision-language model that performs Visual Question Answering for autonomous driving.
Compared to previous approaches, EM-VLM4AD requires at least 10 times less memory and fewer floating-point operations.
arXiv Detail & Related papers (2024-03-28T21:18:33Z)
- Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models [76.99140362751787]
We present NuInstruct, a novel dataset with 91K multi-view video-QA pairs across 17 subtasks.
We also present BEV-InMLLM, an end-to-end method for efficiently deriving instruction-aware Bird's-Eye-View features.
arXiv Detail & Related papers (2024-01-02T01:54:22Z)
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving [69.82743399946371]
DriveMLM is a framework that can perform closed-loop autonomous driving in realistic simulators.
We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system.
This model can be plugged into existing AD systems such as Apollo for closed-loop driving.
arXiv Detail & Related papers (2023-12-14T18:59:05Z)
- LLM4Drive: A Survey of Large Language Models for Autonomous Driving [62.10344445241105]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review the research line of Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z)
- Vision Language Models in Autonomous Driving: A Survey and Outlook [26.70381732289961]
Vision-Language Models (VLMs) have attracted widespread attention due to their outstanding performance and their ability to leverage Large Language Models (LLMs).
We present a comprehensive and systematic survey of the advances in vision language models in this domain, encompassing perception and understanding, navigation and planning, decision-making and control, end-to-end autonomous driving, and data generation.
arXiv Detail & Related papers (2023-10-22T21:06:10Z)