A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving
- URL: http://arxiv.org/abs/2503.11400v1
- Date: Fri, 14 Mar 2025 13:43:26 GMT
- Title: A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving
- Authors: Tin Stribor Sohn, Philipp Reis, Maximilian Dillitzer, Johannes Bach, Jason J. Corso, Eric Sax
- Abstract summary: Multimodal large language models (MLLMs) hold the potential to enhance autonomous driving. Their integration into autonomous driving systems shows promising results in isolated proof-of-concept applications. This paper proposes a holistic framework for a capability-driven evaluation of MLLMs in autonomous driving.
- Score: 15.24721920935653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal large language models (MLLMs) hold the potential to enhance autonomous driving by combining domain-independent world knowledge with context-specific language guidance. Their integration into autonomous driving systems shows promising results in isolated proof-of-concept applications, yet their performance is typically evaluated only on selected, singular aspects of perception, reasoning, or planning. To leverage their full potential, a systematic framework for evaluating MLLMs in the context of autonomous driving is required. This paper proposes a holistic framework for a capability-driven evaluation of MLLMs in autonomous driving. The framework structures scenario understanding along four core capability dimensions: semantic, spatial, temporal, and physical. These are derived from the general requirements of autonomous driving systems, human driver cognition, and language-based reasoning. It further organises the domain into context layers, processing modalities, and downstream tasks such as language-based interaction and decision-making. To illustrate the framework's applicability, two exemplary traffic scenarios are analysed, grounding the proposed dimensions in realistic driving situations. The framework provides a foundation for the structured evaluation of MLLMs' potential for scenario understanding in autonomous driving.
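To make the framework's axes concrete, the sketch below encodes the four capability dimensions and the abstract's organising concepts (context layers, processing modalities, downstream tasks) as a tagging structure for evaluation items. The concrete value sets and field names beyond what the abstract names are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum

class Capability(Enum):
    """The four core capability dimensions named in the abstract."""
    SEMANTIC = "semantic"   # what scene elements mean in context
    SPATIAL = "spatial"     # where things are relative to the ego vehicle
    TEMPORAL = "temporal"   # how the scene evolves over time
    PHYSICAL = "physical"   # dynamics and feasibility of manoeuvres

@dataclass
class EvaluationItem:
    """One scenario-understanding probe, tagged along the framework's axes."""
    question: str
    capabilities: set[Capability]
    context_layer: str      # hypothetical values, e.g. "road", "traffic participants"
    modality: str           # e.g. "camera", "lidar", "text"
    downstream_task: str    # e.g. "language-based interaction", "decision-making"

item = EvaluationItem(
    question="Will the cyclist ahead cross the ego lane before the junction?",
    capabilities={Capability.SPATIAL, Capability.TEMPORAL},
    context_layer="traffic participants",
    modality="camera",
    downstream_task="decision-making",
)
```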
Related papers
- A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving [10.685706490545956]
We propose a unified Perception-Language-Action (PLA) framework that integrates multi-sensor fusion (cameras, LiDAR, radar) with a large language model (LLM)-augmented Vision-Language-Action (VLA) architecture. This framework unifies low-level sensory processing with high-level contextual reasoning to enable context-aware, explainable, and safety-bounded autonomous driving.
arXiv Detail & Related papers (2025-07-31T13:30:47Z)
- STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving [16.602141801221364]
STSBench is a framework to benchmark holistic understanding of vision-language models (VLMs) for autonomous driving. The benchmark features 43 diverse scenarios spanning multiple views, resulting in 971 human-verified multiple-choice questions. A thorough evaluation uncovers shortcomings in existing models' ability to reason about fundamental traffic dynamics in complex environments.
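A minimal evaluation loop for a benchmark of this shape might look as follows; `model` stands in for whatever inference API the evaluated VLM exposes, and the record fields are assumptions rather than STSBench's actual schema.

```python
def evaluate_mcq(model, items):
    """Accuracy of a VLM on multiple-choice scenario questions.

    `model(prompt, views)` is a placeholder inference call; each item is
    assumed to hold a question, lettered options, multi-view sensor data,
    and a ground-truth answer key such as "B".
    """
    correct = 0
    for item in items:
        prompt = item["question"] + "\n" + "\n".join(
            f"({key}) {text}" for key, text in sorted(item["options"].items())
        )
        answer = model(prompt, item["views"]).strip().upper()
        correct += answer.startswith(item["answer"])
    return correct / len(items)
```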
arXiv Detail & Related papers (2025-06-06T16:25:22Z)
- SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
Multimodal Large Language Models (MLLMs) can process both visual and textual data.
We propose SafeAuto, a novel framework that enhances MLLM-based autonomous driving systems by incorporating both unstructured and structured knowledge.
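As a hedged illustration of combining structured and unstructured knowledge in one prompt, consider the sketch below; the retrieval helpers (`rules_db.match`, `case_db.nearest`) are hypothetical placeholders, not SafeAuto's actual components.

```python
def build_prompt(scene, rules_db, case_db):
    """Combine structured knowledge (matching traffic rules) with
    unstructured knowledge (similar past cases) in one MLLM prompt."""
    rules = rules_db.match(scene)        # structured: rule texts for this scene
    cases = case_db.nearest(scene, k=3)  # unstructured: retrieved experiences
    return (
        f"Scene: {scene}\n"
        "Applicable traffic rules:\n"
        + "\n".join(f"- {r}" for r in rules)
        + "\nSimilar past cases:\n"
        + "\n".join(f"- {c}" for c in cases)
        + "\nRecommend the next driving action and cite the rule that justifies it."
    )
```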
arXiv Detail & Related papers (2025-02-28T21:53:47Z)
- PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving [8.971981009717284]
This study explores a seamless integration of MLLMs into autonomous driving systems by proposing a Zero-Shot Chain-of-Thought prompt design named PKRD-CoT.
Our design enables MLLMs to tackle problems without prior experience, thereby increasing their utility within unstructured autonomous driving environments.
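A zero-shot chain-of-thought prompt for driving could be structured as below; the stage names are illustrative and do not reproduce the PKRD-CoT template itself.

```python
ZERO_SHOT_COT_TEMPLATE = """You are the reasoning module of an autonomous vehicle.
Scene description: {scene}

Think step by step before answering:
1. Perception: list the relevant objects and their states.
2. Prediction: describe how each object is likely to move.
3. Risk assessment: identify conflicts with the ego vehicle's path.
4. Decision: choose one manoeuvre (keep lane / change lane / brake / stop)
   and justify it in one sentence."""

prompt = ZERO_SHOT_COT_TEMPLATE.format(
    scene="Two-lane road, pedestrian waiting at a crossing 20 m ahead, "
          "vehicle following 15 m behind the ego car."
)
```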
arXiv Detail & Related papers (2024-12-02T23:08:38Z)
- Generating Out-Of-Distribution Scenarios Using Language Models [58.47597351184034]
Large Language Models (LLMs) have shown promise in autonomous driving.
This paper introduces a framework for generating diverse Out-Of-Distribution (OOD) driving scenarios.
We evaluate our framework through extensive simulations and introduce a new "OOD-ness" metric.
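One plausible way to operationalise an "OOD-ness" score is distance in an embedding space, as sketched below; the paper defines its own metric, so this is only an illustrative stand-in.

```python
import numpy as np

def ood_score(candidate_vec, reference_vecs, k=10):
    """Toy OOD-ness proxy: one minus the mean cosine similarity between a
    candidate scenario embedding and its k most similar in-distribution
    reference embeddings. Higher means farther from the training data."""
    ref = reference_vecs / np.linalg.norm(reference_vecs, axis=1, keepdims=True)
    cand = candidate_vec / np.linalg.norm(candidate_vec)
    nearest = np.sort(ref @ cand)[-k:]
    return float(1.0 - nearest.mean())
```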
arXiv Detail & Related papers (2024-11-25T16:38:17Z)
- Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Experiments, and Challenges [15.52530518623987]
Large Language Models (LLMs) have the potential to enhance various aspects of autonomous driving systems. In this paper, we first introduce the novel concept of designing LLMs for autonomous driving (LLM4AD). We conduct a series of experiments on real-world vehicle platforms, thoroughly evaluating the performance and potential of our LLM4AD systems.
arXiv Detail & Related papers (2024-10-20T04:36:19Z)
- Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases [102.05741859030951]
We propose CODA-LM, the first benchmark for the automatic evaluation of LVLMs on self-driving corner cases. We show that using text-only large language models as judges yields even better alignment with human preferences than LVLM judges. Our CODA-VLM performs comparably with GPT-4V, even surpassing GPT-4V by +21.42% on the regional perception task.
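An LLM-as-judge setup of the kind described can be sketched in a few lines; the rubric wording and the `llm` callable are assumptions, not CODA-LM's actual prompt or interface.

```python
JUDGE_TEMPLATE = """You are grading an autonomous-driving assistant.
Question: {question}
Reference answer: {reference}
Model answer: {candidate}

Rate the model answer from 1 (wrong or unsafe) to 10 (matches the reference).
Output only the integer score on the last line."""

def judge_score(llm, question, reference, candidate):
    """Text-only LLM-as-judge scoring; `llm` is a placeholder callable that
    returns the judge model's text completion."""
    reply = llm(JUDGE_TEMPLATE.format(
        question=question, reference=reference, candidate=candidate))
    return int(reply.strip().splitlines()[-1])
```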
arXiv Detail & Related papers (2024-04-16T14:20:55Z)
- Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving [14.64475022650084]
Large Language Models (LLMs) have garnered significant attention for their ability to understand text and images, generate human-like text, and perform complex reasoning tasks.
We investigate how well LLMs can adapt and apply a combination of arithmetic and common-sense reasoning, particularly in autonomous driving scenarios.
arXiv Detail & Related papers (2024-02-21T08:09:05Z)
- Evaluation of Large Language Models for Decision Making in Autonomous Driving [4.271294502084542]
One strategy for using Large Language Models (LLMs) in autonomous driving involves inputting surrounding objects as text prompts to the LLM.
When using LLMs for such purposes, capabilities such as spatial recognition and planning are essential.
This study quantitatively evaluated these two abilities of LLMs in the context of autonomous driving.
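The object-as-text input strategy amounts to serialising perception output into a prompt, roughly as below; the coordinate convention and the closing question are illustrative choices, not the study's exact format.

```python
def objects_to_prompt(ego_speed_mps, objects):
    """Serialise perception output into a text prompt. Coordinates are
    assumed ego-relative in metres (x forward, y left); the closing question
    is an illustrative planning probe."""
    lines = [f"Ego vehicle speed: {ego_speed_mps:.1f} m/s.", "Surrounding objects:"]
    for obj in objects:
        lines.append(
            f"- {obj['type']} at x={obj['x']:+.1f} m, y={obj['y']:+.1f} m, "
            f"moving at {obj['speed']:.1f} m/s"
        )
    lines.append("Is it safe to change into the left lane? Answer yes or no, with a reason.")
    return "\n".join(lines)
```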
arXiv Detail & Related papers (2023-12-11T12:56:40Z)
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving [38.28159034562901]
Reason2Drive is a benchmark dataset with over 600K video-text pairs.
We characterize the autonomous driving process as a sequential combination of perception, prediction, and reasoning steps.
We introduce a novel aggregated evaluation metric to assess chain-based reasoning performance in autonomous systems.
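A generic aggregated score over reasoning chains might average per-step similarity against a reference chain, as in this simplified stand-in (the paper's actual metric differs in its details):

```python
def chain_score(similarity, predicted_steps, reference_steps):
    """Average per-step similarity between a predicted reasoning chain and a
    reference chain, scaled by how much of the reference chain was covered.
    `similarity(a, b)` is assumed to return a value in [0, 1]."""
    if not reference_steps or not predicted_steps:
        return 0.0
    pairs = list(zip(predicted_steps, reference_steps))
    per_step = sum(similarity(p, r) for p, r in pairs) / len(pairs)
    coverage = len(pairs) / len(reference_steps)
    return coverage * per_step
```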
arXiv Detail & Related papers (2023-12-06T18:32:33Z)
- Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs serve as intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
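The LLM-conditioned MPC idea can be miniaturised as an LLM proposing solver parameters that a safety shield clamps to verified bounds before the solver runs; parameter names and bounds below are illustrative assumptions, not the paper's interface.

```python
from dataclasses import dataclass

@dataclass
class MPCParams:
    target_speed: float    # m/s
    min_headway: float     # s
    comfort_weight: float  # penalises jerk in the cost function

def condition_mpc(directive: dict, bounds: dict) -> MPCParams:
    """The LLM proposes parameters; a safety shield clamps each one to
    verified bounds before the MPC solver runs."""
    def clamp(value, lo, hi):
        return max(lo, min(hi, value))
    return MPCParams(
        target_speed=clamp(directive.get("target_speed", 13.0), *bounds["target_speed"]),
        min_headway=clamp(directive.get("min_headway", 2.0), *bounds["min_headway"]),
        comfort_weight=clamp(directive.get("comfort_weight", 1.0), *bounds["comfort_weight"]),
    )

params = condition_mpc(
    {"target_speed": 20.0, "min_headway": 1.0},
    {"target_speed": (0.0, 16.7), "min_headway": (1.5, 4.0), "comfort_weight": (0.1, 5.0)},
)
```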
arXiv Detail & Related papers (2023-11-28T03:13:09Z)
- LLM4Drive: A Survey of Large Language Models for Autonomous Driving [62.10344445241105]
Large language models (LLMs) have demonstrated abilities such as understanding context, performing logical reasoning, and generating answers.
In this paper, we systematically review the line of research on Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to end-to-end, open-set (any environment/scene) autonomous driving that is capable of producing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles [13.102404404559428]
We propose a novel framework that leverages Large Language Models (LLMs) to enhance the decision-making process in autonomous vehicles.
Our research includes experiments in HighwayEnv, a collection of environments for autonomous driving and tactical decision-making tasks.
We also examine real-time personalization, demonstrating how LLMs can influence driving behaviors based on verbal commands.
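Real-time personalization of this kind reduces to parsing a verbal command into driving-style parameters; the JSON schema below is an illustrative assumption, not the paper's interface.

```python
import json

def personalize(llm, command: str) -> dict:
    """Parse a passenger's verbal command into driving-style parameters via
    the LLM; `llm` is a placeholder completion callable."""
    reply = llm(
        "Map this passenger request to JSON with keys 'target_speed_factor' "
        "(0.5-1.2) and 'lane_change_aggressiveness' (0-1). "
        f"Request: {command!r}. Output JSON only."
    )
    return json.loads(reply)

# e.g. personalize(llm, "please drive more cautiously in the rain")
```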
arXiv Detail & Related papers (2023-10-12T04:56:01Z)
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [84.31119464141631]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.
Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors, including multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model [84.29836263441136]
This study introduces DriveGPT4, a novel interpretable end-to-end autonomous driving system based on multimodal large language models (MLLMs).
DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users.
arXiv Detail & Related papers (2023-10-02T17:59:52Z)