Application of Multimodal Large Language Models in Autonomous Driving
- URL: http://arxiv.org/abs/2412.16410v1
- Date: Sat, 21 Dec 2024 00:09:52 GMT
- Title: Application of Multimodal Large Language Models in Autonomous Driving
- Authors: Md Robiul Islam
- Abstract summary: We conduct an in-depth study on implementing a Multi-modal Large Language Model (MLLM).
We address the poor performance of MLLMs on Autonomous Driving (AD).
We then break down the AD decision-making process into scene understanding, prediction, and decision-making.
- Score: 1.8181868280594944
- Abstract: In this era of technological advancement, several cutting-edge techniques are being applied to enhance Autonomous Driving (AD) systems, with a focus on improving safety, efficiency, and adaptability in complex driving environments. However, AD still faces problems, including performance limitations. To address them, we conducted an in-depth study on implementing a Multi-modal Large Language Model (MLLM). We constructed a Visual Question Answering (VQA) dataset to fine-tune the model and address the poor performance of MLLMs on AD. We then break down the AD decision-making process into scene understanding, prediction, and decision-making. Chain-of-Thought reasoning is used to improve the decisions. Our experiments and detailed analysis of Autonomous Driving indicate how important MLLMs are for AD.
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- A Survey on Large Language Model-empowered Autonomous Driving [25.963195890376646]
Development of autonomous driving (AD) technology follows two main technical paths: modularization and end-to-end.
This paper conducts a thorough analysis of the potential applications of large language models (LLMs) in AD systems.
We discuss an important question: can LLM-based artificial general intelligence (AGI) be a key to achieving high-level AD?
arXiv Detail & Related papers (2024-09-21T15:07:37Z)
- Large Language Models for Human-like Autonomous Driving: A Survey [7.125039718268125]
Large Language Models (LLMs) are AI models trained on massive text corpora with remarkable language understanding and generation capabilities.
This survey provides a review of progress in leveraging LLMs for Autonomous Driving.
It focuses on their applications in modular AD pipelines and end-to-end AD systems.
arXiv Detail & Related papers (2024-07-27T15:24:11Z)
- RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model [22.25903116720301]
Explainability plays a critical role in trustworthy autonomous decision-making.
Recent advancements in Multi-Modal Large Language Models (MLLMs) have shown promising potential for enhancing the explainability of driving agents.
We present RAG-Driver, a novel retrieval-augmented multi-modal large language model that leverages in-context learning for high-performance, explainable, and generalisable autonomous driving.
arXiv Detail & Related papers (2024-02-16T16:57:18Z)
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving [69.82743399946371]
DriveMLM is a framework that can perform closed-loop autonomous driving in realistic simulators.
We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system.
This model can be plugged into existing AD systems such as Apollo for closed-loop driving.
arXiv Detail & Related papers (2023-12-14T18:59:05Z)
- Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs serve as intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
arXiv Detail & Related papers (2023-11-28T03:13:09Z)
- LLM4Drive: A Survey of Large Language Models for Autonomous Driving [62.10344445241105]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review a research line about Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [87.1164964709168]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.
Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors, including multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)
- Drive Like a Human: Rethinking Autonomous Driving with Large Language Models [28.957124302293966]
We explore the potential of using a large language model (LLM) to understand the driving environment in a human-like manner.
Our experiments show that the LLM exhibits the impressive ability to reason and solve long-tailed cases.
arXiv Detail & Related papers (2023-07-14T05:18:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.