DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral
Planning States for Autonomous Driving
- URL: http://arxiv.org/abs/2312.09245v2
- Date: Mon, 25 Dec 2023 15:50:52 GMT
- Title: DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral
Planning States for Autonomous Driving
- Authors: Wenhai Wang, Jiangwei Xie, ChuanYang Hu, Haoming Zou, Jianan Fan,
Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, Hao Tian, Lewei Lu,
Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai
- Abstract summary: DriveMLM is a framework that can perform closed-loop autonomous driving in realistic simulators.
We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system.
This model can be plugged into existing AD systems such as Apollo for closed-loop driving.
- Score: 69.82743399946371
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have opened up new possibilities for intelligent
agents, endowing them with human-like thinking and cognitive abilities. In this
work, we delve into the potential of LLMs in autonomous driving (AD). We
introduce DriveMLM, an LLM-based AD framework that can perform closed-loop
autonomous driving in realistic simulators. To this end, (1) we
bridge the gap between language decisions and vehicle control commands by
standardizing the decision states according to the off-the-shelf motion
planning module. (2) We employ a multi-modal LLM (MLLM) to model the behavior
planning module of a modular AD system, which takes driving rules, user
commands, and inputs from various sensors (e.g., camera, lidar) and makes
driving decisions while providing explanations. This model can be plugged into
existing AD systems such as Apollo for closed-loop driving. (3) We design an
effective data engine to collect a dataset that includes decision states and
corresponding explanation annotations for model training and evaluation. We
conduct extensive experiments and show that our model achieves a driving score
of 76.1 on the CARLA Town05 Long benchmark and surpasses the Apollo baseline by
4.7 points
under the same settings, demonstrating the effectiveness of our model. We hope
this work can serve as a baseline for autonomous driving with LLMs. Code and
models shall be released at https://github.com/OpenGVLab/DriveMLM.
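The abstract outlines, but does not detail, how language-level decisions are standardized into behavioral planning states and handed to an off-the-shelf motion planner. The sketch below is a minimal, hypothetical reading of that pipeline: the specific state names, the decide/step helpers, and the mllm.generate and planner.plan_trajectory calls are illustrative assumptions, not DriveMLM's or Apollo's actual API.

```python
# Hypothetical sketch of DriveMLM-style decision-state alignment.
# State names, classes, and call signatures are assumptions for illustration;
# they are NOT the actual DriveMLM or Apollo interfaces.
from dataclasses import dataclass
from enum import Enum, auto


class SpeedDecision(Enum):
    """Discrete speed states a downstream motion planner can consume."""
    KEEP = auto()
    ACCELERATE = auto()
    DECELERATE = auto()
    STOP = auto()


class PathDecision(Enum):
    """Discrete path states a downstream motion planner can consume."""
    FOLLOW_LANE = auto()
    CHANGE_LEFT = auto()
    CHANGE_RIGHT = auto()


@dataclass
class BehaviorDecision:
    """Output of the MLLM behavior planner: standardized states plus an explanation."""
    speed: SpeedDecision
    path: PathDecision
    explanation: str  # natural-language rationale, useful for logging and annotation


def decide(mllm, camera_images, lidar_points, user_command, driving_rules) -> BehaviorDecision:
    """Query a multi-modal LLM and map its answer to standardized decision states.

    `mllm.generate` is a placeholder for whatever multi-modal inference call is used.
    """
    answer = mllm.generate(
        images=camera_images,
        point_cloud=lidar_points,
        prompt=f"Rules: {driving_rules}\nUser: {user_command}\n"
               "Reply with a speed decision, a path decision, and a short explanation.",
    )
    # Parsing is simplified: assume the model replies with state names we can look up.
    speed = SpeedDecision[answer["speed"].upper()]
    path = PathDecision[answer["path"].upper()]
    return BehaviorDecision(speed=speed, path=path, explanation=answer["explanation"])


def step(planner, decision: BehaviorDecision, ego_state, route):
    """Hand the standardized decision to an off-the-shelf motion planner
    (e.g., the planning stack inside a system like Apollo), which turns it
    into a trajectory and low-level control commands."""
    return planner.plan_trajectory(ego_state, route, decision.speed, decision.path)
```

Pairing each standardized decision with a short explanation, as in BehaviorDecision above, also mirrors the kind of (decision state, explanation) annotation the abstract says the data engine collects for training and evaluation.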
Related papers
- How to Build a Pre-trained Multimodal model for Simultaneously Chatting and Decision-making? [14.599617146656335]
We develop a new model architecture termed Visual Language Action model for Chatting and Decision Making (VLA4CD).
We leverage LoRA to fine-tune a pre-trained LLM with data of multiple modalities covering language, visual, and action.
These designs enable VLA4CD to provide continuous-valued action decisions while outputting text responses.
arXiv Detail & Related papers (2024-10-21T11:02:42Z)
- Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment [15.52530518623987]
Large Language Models (LLMs) have the potential to enhance various aspects of autonomous driving systems.
This paper introduces novel concepts and approaches to designing LLMs for autonomous driving (LLM4AD).
arXiv Detail & Related papers (2024-10-20T04:36:19Z)
- Probing Multimodal LLMs as World Models for Driving [72.18727651074563]
We look at the application of Multimodal Large Language Models (MLLMs) in autonomous driving.
Despite advances in models like GPT-4o, their performance in complex driving environments remains largely unexplored.
arXiv Detail & Related papers (2024-05-09T17:52:42Z)
- Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving [0.0]
We develop an efficient, lightweight, multi-frame vision language model which performs Visual Question Answering for autonomous driving.
In comparison to previous approaches, EM-VLM4AD requires at least 10 times less memory and floating point operations.
arXiv Detail & Related papers (2024-03-28T21:18:33Z)
- DriveLM: Driving with Graph Visual Question Answering [57.51930417790141]
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems.
We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
arXiv Detail & Related papers (2023-12-21T18:59:12Z)
- LMDrive: Closed-Loop End-to-End Driving with Large Language Models [37.910449013471656]
Large language models (LLMs) have shown impressive reasoning capabilities that approach "Artificial General Intelligence".
This paper introduces LMDrive, a novel language-guided, end-to-end, closed-loop autonomous driving framework.
arXiv Detail & Related papers (2023-12-12T18:24:15Z)
- Evaluation of Large Language Models for Decision Making in Autonomous Driving [4.271294502084542]
One strategy of using Large Language Models (LLMs) for autonomous driving involves inputting surrounding objects as text prompts to the LLMs.
When using LLMs for such purposes, capabilities such as spatial recognition and planning are essential.
This study quantitatively evaluated these two abilities of LLMs in the context of autonomous driving.
arXiv Detail & Related papers (2023-12-11T12:56:40Z)
- LLM4Drive: A Survey of Large Language Models for Autonomous Driving [62.10344445241105]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review a research line on Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z)
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [87.1164964709168]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.
Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors, including multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model [84.29836263441136]
This study introduces DriveGPT4, a novel interpretable end-to-end autonomous driving system based on multimodal large language models (MLLMs).
DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users.
arXiv Detail & Related papers (2023-10-02T17:59:52Z)