SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
- URL: http://arxiv.org/abs/2503.00211v1
- Date: Fri, 28 Feb 2025 21:53:47 GMT
- Title: SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
- Authors: Jiawei Zhang, Xuan Yang, Taiqi Wang, Yu Yao, Aleksandr Petiushko, Bo Li
- Abstract summary: Multimodal Large Language Models (MLLMs) can process both visual and textual data. We propose SafeAuto, a novel framework that enhances MLLM-based autonomous driving systems by incorporating both unstructured and structured knowledge.
- Score: 63.71984266104757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional autonomous driving systems often struggle to integrate high-level reasoning with low-level control, resulting in suboptimal and sometimes unsafe driving behaviors. The emergence of Multimodal Large Language Models (MLLMs), which can process both visual and textual data, presents an opportunity to unify perception and reasoning tasks within a single framework. However, effectively embedding precise safety knowledge into MLLMs for autonomous driving remains a significant challenge. To address this, we propose SafeAuto, a novel framework that enhances MLLM-based autonomous driving systems by incorporating both unstructured and structured knowledge. Specifically, we first introduce the Position-Dependent Cross-Entropy (PDCE) loss function, designed to improve the accuracy of low-level control signal predictions when numerical values are represented as text. Second, to ensure safe autonomous driving by explicitly integrating precise safety knowledge into the MLLM, we develop a reasoning component for SafeAuto. This component translates driving safety regulations into first-order logic rules (e.g., "red light => stop") and incorporates these rules into a probabilistic graphical model, such as a Markov Logic Network (MLN). The MLN is trained to verify the predicted next actions using environmental attributes identified by attribute recognition models (e.g., detecting a red light) to form the predicates. Additionally, we construct a Multimodal RAG model that leverages video data, control signals, and environmental attributes to learn more effectively from past similar driving experiences. By integrating PDCE, MLN, and Multimodal RAG, SafeAuto significantly outperforms existing baselines across multiple datasets. This advancement enables more accurate, reliable, and safer autonomous driving systems that learn from experience, obey traffic laws, and perform precise control actions.
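The abstract's three components lend themselves to short illustrative sketches. First, the PDCE loss: the abstract gives no formula, but one plausible reading is a per-token cross-entropy whose weight decays with digit significance, so that mispredicting the leading digit of a control value costs more than mispredicting the trailing one. In the sketch below, the `digit_rank` input and the geometric weighting are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def pdce_loss(logits, targets, digit_rank, decay=0.5):
    """One plausible reading of the PDCE loss: per-token cross-entropy
    weighted by digit significance, so mispredicting the leading digit
    of a value like "23.5" costs more than mispredicting the last.

    logits:     (seq_len, vocab_size) next-token logits from the MLLM
    targets:    (seq_len,) ground-truth token ids
    digit_rank: (seq_len,) significance rank of each target token
                (0 = most significant digit); -1 for non-digit tokens
    """
    per_token = F.cross_entropy(logits, targets, reduction="none")
    # Geometric decay over digit positions; non-digit tokens keep weight 1.
    weights = torch.where(
        digit_rank >= 0,
        decay ** digit_rank.clamp(min=0).float(),
        torch.ones_like(per_token),
    )
    return (weights * per_token).sum() / weights.sum()
```

Second, the reasoning component compiles first-order rules such as "red light => stop" into a Markov Logic Network that verifies the MLLM's predicted action against predicates produced by attribute recognizers. A full MLN involves grounding and weight learning; the following is a simplified propositional stand-in with hand-set rule weights (in SafeAuto the weights would be learned).

```python
import math

# Illustrative safety rules as (premise predicates, forbidden action, weight).
# In SafeAuto the rules come from traffic regulations and the weights are
# learned by MLN training; these values are made up for the sketch.
RULES = [
    ({"red_light"}, "accelerate", 5.0),
    ({"red_light"}, "keep_speed", 4.0),
    ({"stop_sign"}, "accelerate", 5.0),
    ({"pedestrian_ahead"}, "accelerate", 6.0),
]

def action_log_potential(action, observed):
    """Negative sum of the weights of violated ground rules: a
    propositional stand-in for MLN inference over first-order rules."""
    return -sum(
        w for premise, forbidden, w in RULES
        if premise <= observed and action == forbidden
    )

def verify_actions(candidates, observed):
    """Turn rule potentials into a distribution over candidate actions."""
    scores = {a: action_log_potential(a, observed) for a in candidates}
    z = math.log(sum(math.exp(s) for s in scores.values()))
    return {a: math.exp(s - z) for a, s in scores.items()}

# Attribute recognizers detect a red light; "stop" dominates:
print(verify_actions({"stop", "accelerate", "keep_speed"}, {"red_light"}))
```

Third, the Multimodal RAG component retrieves similar past driving experiences using video, control signals, and environmental attributes. A minimal nearest-neighbor sketch, assuming the three modalities are fused into one vector per experience (the fusion scheme and embedding models are not specified in the abstract):

```python
import numpy as np

def retrieve_experiences(query_vec, memory_vecs, memory_meta, k=3):
    """Cosine-similarity retrieval over a memory bank of past drives.
    Assumes video, control-signal, and attribute embeddings are already
    fused into one vector per experience (an assumption of this sketch)."""
    q = query_vec / np.linalg.norm(query_vec)
    m = memory_vecs / np.linalg.norm(memory_vecs, axis=1, keepdims=True)
    sims = m @ q
    top = np.argsort(-sims)[:k]
    return [(memory_meta[i], float(sims[i])) for i in top]
```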
Related papers
- LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving [9.447298958886265]
Vision-Language Models (VLMs) have demonstrated significant potential for end-to-end autonomous driving.
We introduce LightEMMA, a Lightweight End-to-End Multimodal Model for Autonomous driving.
We construct twelve autonomous driving agents using various VLMs and evaluate their performance on the nuScenes prediction task.
arXiv Detail & Related papers (2025-05-01T04:12:41Z) - SafeDrive: Knowledge- and Data-Driven Risk-Sensitive Decision-Making for Autonomous Vehicles with Large Language Models [14.790308656087316]
SafeDrive is a knowledge- and data-driven risk-sensitive decision-making framework to enhance autonomous driving safety and adaptability. By integrating knowledge-driven insights with adaptive learning mechanisms, the framework ensures robust decision-making under uncertain conditions.
arXiv Detail & Related papers (2024-12-17T16:45:27Z) - Generating Out-Of-Distribution Scenarios Using Language Models [58.47597351184034]
Large Language Models (LLMs) have shown promise in autonomous driving.
This paper introduces a framework for generating diverse Out-Of-Distribution (OOD) driving scenarios.
We evaluate our framework through extensive simulations and introduce a new "OOD-ness" metric.
arXiv Detail & Related papers (2024-11-25T16:38:17Z) - Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events [5.233512464561313]
Multimodal Large Language Models (MLLMs) offer a novel approach by integrating textual, visual, and audio modalities.
Our framework leverages the reasoning power of MLLMs, directing their output through context-specific prompts.
Preliminary results demonstrate the framework's potential in zero-shot learning and accurate scenario analysis.
arXiv Detail & Related papers (2024-06-19T23:50:41Z) - A Superalignment Framework in Autonomous Driving with Large Language Models [2.650382010271]
Large language models (LLMs) and multi-modal large language models (MLLMs) are extensively used in autonomous driving.
Despite their importance, the security aspect of LLMs in autonomous driving remains underexplored.
This research introduces a novel security framework for autonomous vehicles, utilizing a multi-agent LLM approach.
arXiv Detail & Related papers (2024-06-09T05:26:38Z) - DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving [65.04871316921327]
This paper introduces a new system that enhances the performance and reliability of autonomous driving.
DME-Driver utilizes a powerful vision language model as the decision-maker and a planning-oriented perception model as the control signal generator.
By leveraging a purpose-built dataset, the model achieves high-precision planning accuracy through a logical thinking process.
arXiv Detail & Related papers (2024-01-08T03:06:02Z) - Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs serve as intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
arXiv Detail & Related papers (2023-11-28T03:13:09Z) - LLM4Drive: A Survey of Large Language Models for Autonomous Driving [62.10344445241105]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review the research line of Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z) - End-to-End Intersection Handling using Multi-Agent Deep Reinforcement Learning [63.56464608571663]
Navigating intersections is one of the most challenging tasks for an autonomous vehicle.
In this work, we focus on a system able to navigate intersections where only traffic signs are provided.
We propose a multi-agent system that uses a continuous, model-free Deep Reinforcement Learning algorithm to train a neural network predicting both the acceleration and the steering angle at each time step (a sketch of such a policy head appears after this list).
arXiv Detail & Related papers (2021-04-28T07:54:40Z)
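The continuous-action setup in the intersection entry above maps each observation to an acceleration and a steering command. A minimal sketch of such a policy head, assuming a small MLP with tanh-bounded outputs (the layer sizes and squashing are assumptions of this sketch, not the paper's architecture):

```python
import torch
import torch.nn as nn

class ContinuousDrivingPolicy(nn.Module):
    """Hypothetical policy head: observation -> [acceleration, steering]."""

    def __init__(self, obs_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # two continuous commands
        )

    def forward(self, obs):
        # tanh bounds both commands to [-1, 1]; scale to vehicle limits downstream.
        return torch.tanh(self.net(obs))
```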