DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral
Planning States for Autonomous Driving
- URL: http://arxiv.org/abs/2312.09245v2
- Date: Mon, 25 Dec 2023 15:50:52 GMT
- Title: DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral
Planning States for Autonomous Driving
- Authors: Wenhai Wang, Jiangwei Xie, ChuanYang Hu, Haoming Zou, Jianan Fan,
Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, Hao Tian, Lewei Lu,
Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai
- Abstract summary: DriveMLM is a framework that can perform closed-loop autonomous driving in realistic simulators.
We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system.
This model can be plugged into existing AD systems such as Apollo for closed-loop driving.
- Score: 69.82743399946371
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have opened up new possibilities for intelligent
agents, endowing them with human-like thinking and cognitive abilities. In this
work, we delve into the potential of LLMs in autonomous driving (AD). We
introduce DriveMLM, an LLM-based AD framework that can perform closed-loop
autonomous driving in realistic simulators. To this end, (1) we
bridge the gap between language decisions and vehicle control commands by
standardizing the decision states according to the off-the-shelf motion
planning module. (2) We employ a multi-modal LLM (MLLM) to model the behavior
planning module of a modular AD system, which takes driving rules, user
commands, and inputs from various sensors (e.g., camera, lidar) and makes
driving decisions while providing explanations. This model can be plugged into
existing AD systems such as Apollo for closed-loop driving. (3) We design an
effective data engine to collect a dataset that includes decision states and
corresponding explanation annotations for model training and evaluation. We
conduct extensive experiments and show that our model achieves a driving score
of 76.1 on the CARLA Town05 Long benchmark and surpasses the Apollo baseline by
4.7 points
under the same settings, demonstrating the effectiveness of our model. We hope
this work can serve as a baseline for autonomous driving with LLMs. Code and
models shall be released at https://github.com/OpenGVLab/DriveMLM.
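The abstract outlines, but does not detail, how language-level decisions are standardized into behavioral planning states and handed to an off-the-shelf motion planner. The sketch below is a minimal, hypothetical reading of that pipeline: the specific state names, the decide/step helpers, and the mllm.generate and planner.plan_trajectory calls are illustrative assumptions, not DriveMLM's or Apollo's actual API.

```python
# Hypothetical sketch of DriveMLM-style decision-state alignment.
# State names, classes, and call signatures are assumptions for illustration;
# they are NOT the actual DriveMLM or Apollo interfaces.
from dataclasses import dataclass
from enum import Enum, auto


class SpeedDecision(Enum):
    """Discrete speed states a downstream motion planner can consume."""
    KEEP = auto()
    ACCELERATE = auto()
    DECELERATE = auto()
    STOP = auto()


class PathDecision(Enum):
    """Discrete path states a downstream motion planner can consume."""
    FOLLOW_LANE = auto()
    CHANGE_LEFT = auto()
    CHANGE_RIGHT = auto()


@dataclass
class BehaviorDecision:
    """Output of the MLLM behavior planner: standardized states plus an explanation."""
    speed: SpeedDecision
    path: PathDecision
    explanation: str  # natural-language rationale, useful for logging and annotation


def decide(mllm, camera_images, lidar_points, user_command, driving_rules) -> BehaviorDecision:
    """Query a multi-modal LLM and map its answer to standardized decision states.

    `mllm.generate` is a placeholder for whatever multi-modal inference call is used.
    """
    answer = mllm.generate(
        images=camera_images,
        point_cloud=lidar_points,
        prompt=f"Rules: {driving_rules}\nUser: {user_command}\n"
               "Reply with a speed decision, a path decision, and a short explanation.",
    )
    # Parsing is simplified: assume the model replies with state names we can look up.
    speed = SpeedDecision[answer["speed"].upper()]
    path = PathDecision[answer["path"].upper()]
    return BehaviorDecision(speed=speed, path=path, explanation=answer["explanation"])


def step(planner, decision: BehaviorDecision, ego_state, route):
    """Hand the standardized decision to an off-the-shelf motion planner
    (e.g., the planning stack inside a system like Apollo), which turns it
    into a trajectory and low-level control commands."""
    return planner.plan_trajectory(ego_state, route, decision.speed, decision.path)
```

Pairing each standardized decision with a short explanation, as in BehaviorDecision above, also mirrors the kind of (decision state, explanation) annotation the abstract says the data engine collects for training and evaluation.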
Related papers
- How to Build a Pre-trained Multimodal model for Simultaneously Chatting and Decision-making? [14.599617146656335]
We develop a new model architecture termed Visual Language Action model for Chatting and Decision Making (VLA4CD).
We leverage LoRA to fine-tune a pre-trained LLM with data of multiple modalities covering language, visual, and action.
These designs enable VLA4CD to provide continuous-valued action decisions while outputting text responses.
arXiv Detail & Related papers (2024-10-21T11:02:42Z)
- Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment [15.52530518623987]
Large Language Models (LLMs) have the potential to enhance various aspects of autonomous driving systems.
This paper introduces novel concepts and approaches to designing LLMs for autonomous driving (LLM4AD).
arXiv Detail & Related papers (2024-10-20T04:36:19Z)
- Probing Multimodal LLMs as World Models for Driving [72.18727651074563]
We look at the application of Multimodal Large Language Models (MLLMs) in autonomous driving.
Despite advances in models like GPT-4o, their performance in complex driving environments remains largely unexplored.
arXiv Detail & Related papers (2024-05-09T17:52:42Z)
- Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving [0.0]
We develop an efficient, lightweight, multi-frame vision language model which performs Visual Question Answering for autonomous driving.
In comparison to previous approaches, EM-VLM4AD requires at least 10 times less memory and floating point operations.
arXiv Detail & Related papers (2024-03-28T21:18:33Z)
- DriveLM: Driving with Graph Visual Question Answering [57.51930417790141]
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems.
We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
arXiv Detail & Related papers (2023-12-21T18:59:12Z)
- LMDrive: Closed-Loop End-to-End Driving with Large Language Models [37.910449013471656]
Large language models (LLMs) have shown impressive reasoning capabilities that approach "Artificial General Intelligence".
This paper introduces LMDrive, a novel language-guided, end-to-end, closed-loop autonomous driving framework.
arXiv Detail & Related papers (2023-12-12T18:24:15Z)
- Evaluation of Large Language Models for Decision Making in Autonomous Driving [4.271294502084542]
One strategy of using Large Language Models (LLMs) for autonomous driving involves inputting surrounding objects as text prompts to the LLMs.
When using LLMs for such purposes, capabilities such as spatial recognition and planning are essential.
This study quantitatively evaluated these two abilities of LLMs in the context of autonomous driving.
arXiv Detail & Related papers (2023-12-11T12:56:40Z)
- LLM4Drive: A Survey of Large Language Models for Autonomous Driving [62.10344445241105]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review a research line on Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z)
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [87.1164964709168]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.
Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors, including multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model [84.29836263441136]
This study introduces DriveGPT4, a novel interpretable end-to-end autonomous driving system based on multimodal large language models (MLLMs).
DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users.
arXiv Detail & Related papers (2023-10-02T17:59:52Z)