Vision-Integrated LLMs for Autonomous Driving Assistance: Human Performance Comparison and Trust Evaluation
- URL: http://arxiv.org/abs/2502.06843v1
- Date: Thu, 06 Feb 2025 19:19:28 GMT
- Title: Vision-Integrated LLMs for Autonomous Driving Assistance: Human Performance Comparison and Trust Evaluation
- Authors: Namhee Kim, Woojin Park
- Abstract summary: This study introduces a Large Language Model (LLM)-based Autonomous Driving (AD) assistance system.
The vision adapter, combining YOLOv4 and Vision Transformer (ViT), extracts comprehensive visual features.
The system closely mirrors human performance in describing situations and moderately aligns with human decisions in generating appropriate responses.
- Score: 2.322929119892535
- Abstract: Traditional autonomous driving systems often struggle with reasoning in complex, unexpected scenarios due to limited comprehension of spatial relationships. In response, this study introduces a Large Language Model (LLM)-based Autonomous Driving (AD) assistance system that integrates a vision adapter and an LLM reasoning module to enhance visual understanding and decision-making. The vision adapter, combining YOLOv4 and Vision Transformer (ViT), extracts comprehensive visual features, while GPT-4 enables human-like spatial reasoning and response generation. Experimental evaluations with 45 experienced drivers revealed that the system closely mirrors human performance in describing situations and moderately aligns with human decisions in generating appropriate responses.
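To make the described pipeline concrete, below is a minimal sketch of how a detector-plus-ViT vision adapter might feed GPT-4 with spatial context. It is a sketch under assumptions, not the authors' implementation: `run_yolov4` is a hypothetical detector stub, torchvision's pretrained ViT stands in for the paper's feature extractor, and the actual fusion scheme is not specified in the abstract.

```python
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

def run_yolov4(image: torch.Tensor) -> list[dict]:
    """Hypothetical stand-in for the YOLOv4 detector: labeled boxes."""
    return [
        {"label": "pedestrian", "box": (412, 230, 455, 318), "conf": 0.91},
        {"label": "car", "box": (100, 260, 320, 400), "conf": 0.88},
    ]

def extract_vit_features(image: torch.Tensor) -> torch.Tensor:
    """Global scene features from a pretrained ViT (stand-in extractor)."""
    weights = ViT_B_16_Weights.DEFAULT
    model = vit_b_16(weights=weights).eval()
    with torch.no_grad():
        return model(weights.transforms()(image).unsqueeze(0))

def build_prompt(detections: list[dict]) -> str:
    """Serialize detections into text the LLM can reason over spatially."""
    objects = "\n".join(
        f"- {d['label']} at box {d['box']} (confidence {d['conf']:.2f})"
        for d in detections
    )
    return (
        "You are a driving assistant. Objects detected in the scene:\n"
        f"{objects}\n"
        "Describe the situation and recommend an appropriate response."
    )

frame = torch.rand(3, 480, 640)               # placeholder camera frame
scene_features = extract_vit_features(frame)  # global ViT scene features
prompt = build_prompt(run_yolov4(frame))      # object-level spatial cues
print(prompt)  # this text would be sent to GPT-4 via the OpenAI API
```

Serializing detections into text is one plausible way to give a text-only LLM spatial grounding; the paper may instead inject visual features directly through the adapter.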
Related papers
- Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives [56.528835143531694]
We introduce DriveBench, a benchmark dataset designed to evaluate Vision-Language Models (VLMs).
Our findings reveal that VLMs often generate plausible responses derived from general knowledge or textual cues rather than true visual grounding.
We propose refined evaluation metrics that prioritize robust visual grounding and multi-modal understanding (a toy grounding check follows this entry).
arXiv Detail & Related papers (2025-01-07T18:59:55Z)
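The "plausible but ungrounded" failure mode suggests a simple diagnostic: ask the same question with and without the image and flag answers that do not change. A toy sketch, assuming a hypothetical `vlm_answer` wrapper around the VLM under test; DriveBench's actual metrics are more refined.

```python
import numpy as np

def vlm_answer(image: np.ndarray, question: str) -> str:
    """Hypothetical stub: would call the evaluated VLM."""
    return "The traffic light ahead is green, so proceeding is safe."

def is_visually_grounded(image: np.ndarray, question: str) -> bool:
    """If blanking the image leaves the answer unchanged, the response
    likely came from textual priors rather than visual evidence."""
    with_image = vlm_answer(image, question)
    without_image = vlm_answer(np.zeros_like(image), question)
    return with_image != without_image

frame = np.random.rand(480, 640, 3)
print(is_visually_grounded(frame, "Is it safe to proceed?"))
```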
- Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Experiments, and Challenges [15.52530518623987]
Large Language Models (LLMs) have the potential to enhance various aspects of autonomous driving systems.
In this paper, we first introduce the novel concept of designing LLMs for autonomous driving (LLM4AD).
We conduct a series of experiments on real-world vehicle platforms, thoroughly evaluating the performance and potential of our LLM4AD systems.
arXiv Detail & Related papers (2024-10-20T04:36:19Z)
- Probing Multimodal LLMs as World Models for Driving [72.18727651074563]
We look at the application of Multimodal Large Language Models (MLLMs) in autonomous driving.
Despite advances in models like GPT-4o, their performance in complex driving environments remains largely unexplored.
arXiv Detail & Related papers (2024-05-09T17:52:42Z)
- VLP: Vision Language Planning for Autonomous Driving [52.640371249017335]
This paper presents a novel Vision-Language-Planning framework that exploits language models to bridge the gap between linguistic understanding and autonomous driving.
It achieves state-of-the-art end-to-end planning performance on the NuScenes dataset, reducing average L2 error and collision rate by 35.9% and 60.5%, respectively (a sketch of the L2 metric follows this entry).
arXiv Detail & Related papers (2024-01-10T23:00:40Z)
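Average L2 error and collision rate are standard open-loop planning metrics on NuScenes. A minimal sketch of the L2 computation, assuming (T, 2) arrays of planned and logged waypoints (the paper's exact evaluation protocol is not given in the summary):

```python
import numpy as np

def average_l2_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Euclidean distance between planned and logged waypoints;
    pred and gt are (T, 2) arrays of future (x, y) positions."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

pred = np.array([[1.0, 0.1], [2.1, 0.3], [3.0, 0.2]])  # planned trajectory
gt   = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])  # ground truth
print(f"avg L2: {average_l2_error(pred, gt):.3f} m")   # avg L2: 0.205 m
```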
- Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs serve as intelligent decision-makers in behavioral planning, augmented with a safety-verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine (a toy MPC sketch follows this entry).
arXiv Detail & Related papers (2023-11-28T03:13:09Z)
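A toy sketch of the "LLM-conditioned MPC" idea: a hypothetical `llm_to_weights` stub maps a verbal instruction to cost weights, and a brute-force 1-D MPC minimizes the resulting cost. This illustrates the conditioning mechanism only; the paper's actual formulation is not reproduced here.

```python
import itertools

def llm_to_weights(instruction: str) -> dict:
    """Hypothetical stand-in for the LLM: maps language to cost weights."""
    if "cautious" in instruction:
        return {"speed": 0.2, "comfort": 2.0}  # tolerate speed error, penalize jerk
    return {"speed": 1.0, "comfort": 0.5}

def mpc(v0: float, v_ref: float, w: dict, horizon: int = 3) -> list[float]:
    """Brute-force MPC over a small discrete acceleration set."""
    actions = (-1.0, 0.0, 1.0)  # candidate accelerations in m/s^2
    best_seq, best_cost = None, float("inf")
    for seq in itertools.product(actions, repeat=horizon):
        v, cost = v0, 0.0
        for a in seq:
            v += a  # 1 s integration step
            cost += w["speed"] * (v - v_ref) ** 2 + w["comfort"] * a ** 2
        if cost < best_cost:
            best_seq, best_cost = list(seq), cost
    return best_seq

print(mpc(v0=10.0, v_ref=15.0, w=llm_to_weights("drive cautiously")))
```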
- On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving [37.617793990547625]
This report provides an exhaustive evaluation of the latest state-of-the-art VLM, GPT-4V.
We explore the model's abilities to understand and reason about driving scenes, make decisions, and ultimately act in the capacity of a driver.
Our findings reveal that GPT-4V demonstrates superior performance in scene understanding and causal reasoning compared to existing autonomous systems.
arXiv Detail & Related papers (2023-11-09T12:58:37Z)
- Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles [13.102404404559428]
We propose a novel framework that leverages Large Language Models (LLMs) to enhance the decision-making process in autonomous vehicles.
Our research includes experiments in HighwayEnv, a collection of environments for autonomous driving and tactical decision-making tasks (a minimal usage sketch follows this entry).
We also examine real-time personalization, demonstrating how LLMs can influence driving behaviors based on verbal commands.
arXiv Detail & Related papers (2023-10-12T04:56:01Z)
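HighwayEnv is the open-source `highway-env` package. A minimal random-policy rollout under the Gymnasium API (version-dependent; in the framework above, an LLM-backed policy would replace the random action):

```python
import gymnasium as gym
import highway_env  # noqa: F401 -- registers the highway-v0 environments

env = gym.make("highway-fast-v0")
obs, info = env.reset(seed=0)
done = truncated = False
while not (done or truncated):
    action = env.action_space.sample()  # an LLM policy would decide here
    obs, reward, done, truncated, info = env.step(action)
env.close()
```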
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [87.1164964709168]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.
Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors, including multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model [84.29836263441136]
This study introduces DriveGPT4, a novel interpretable end-to-end autonomous driving system based on multimodal large language models (MLLMs).
DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users.
arXiv Detail & Related papers (2023-10-02T17:59:52Z)
- Drive Like a Human: Rethinking Autonomous Driving with Large Language Models [28.957124302293966]
We explore the potential of using a large language model (LLM) to understand the driving environment in a human-like manner.
Our experiments show that the LLM exhibits an impressive ability to reason about and solve long-tailed cases.
arXiv Detail & Related papers (2023-07-14T05:18:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and accepts no responsibility for any consequences of its use.