Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
- URL: http://arxiv.org/abs/2406.13894v1
- Date: Wed, 19 Jun 2024 23:50:41 GMT
- Title: Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
- Authors: Mohammad Abu Tami, Huthaifa I. Ashqar, Mohammed Elhenawy,
- Abstract summary: Multimodal Large Language Models (MLLMs) offer a novel approach by integrating textual, visual, and audio modalities.
Our framework leverages the reasoning power of MLLMs, directing their output through context-specific prompts.
Preliminary results demonstrate the framework's potential in zero-shot learning and accurate scenario analysis.
- Score: 5.233512464561313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional approaches to safety event analysis in autonomous systems have relied on complex machine learning models and extensive datasets for high accuracy and reliability. However, the advent of Multimodal Large Language Models (MLLMs) offers a novel approach by integrating textual, visual, and audio modalities, thereby providing automated analyses of driving videos. Our framework leverages the reasoning power of MLLMs, directing their output through context-specific prompts to ensure accurate, reliable, and actionable insights for hazard detection. By incorporating models like Gemini-Pro-Vision 1.5 and Llava, our methodology aims to automate the safety critical events and mitigate common issues such as hallucinations in MLLM outputs. Preliminary results demonstrate the framework's potential in zero-shot learning and accurate scenario analysis, though further validation on larger datasets is necessary. Furthermore, more investigations are required to explore the performance enhancements of the proposed framework through few-shot learning and fine-tuned models. This research underscores the significance of MLLMs in advancing the analysis of the naturalistic driving videos by improving safety-critical event detecting and understanding the interaction with complex environments.
Related papers
- A Superalignment Framework in Autonomous Driving with Large Language Models [2.650382010271]
Large language models (LLMs) and multi-modal large language models (MLLMs) are extensively used in autonomous driving.
Despite their importance, the security aspect of LLMs in autonomous driving remains underexplored.
This research introduces a novel security framework for autonomous vehicles, utilizing a multi-agent LLM approach.
arXiv Detail & Related papers (2024-06-09T05:26:38Z) - Probing Multimodal LLMs as World Models for Driving [72.18727651074563]
This study focuses on the application of Multimodal Large Language Models (MLLMs) within the domain of autonomous driving.
We evaluate the capability of various MLLMs as world models for driving from the perspective of a fixed in-car camera.
Our results highlight a critical gap in the current capabilities of state-of-the-art MLLMs.
arXiv Detail & Related papers (2024-05-09T17:52:42Z) - Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models [12.687494201105066]
This paper proposes Traj-LLM, the first to investigate the potential of using Large Language Models (LLMs) to generate future motion from agents' past/observed trajectories and scene semantics.
LLMs' powerful comprehension abilities capture a spectrum of high-level scene knowledge and interactive information.
Emulating the human-like lane focus cognitive function, we introduce lane-aware probabilistic learning powered by the pioneering Mamba module.
arXiv Detail & Related papers (2024-05-08T09:28:04Z) - Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security [5.077261736366414]
The pursuit of reliable AI systems like powerful MLLMs has emerged as a pivotal area of contemporary research.
In this paper, we endeavor to demostrate the multifaceted risks associated with the incorporation of image modalities into MLLMs.
arXiv Detail & Related papers (2024-04-08T07:54:18Z) - AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z) - Multi-modal Auto-regressive Modeling via Visual Words [96.25078866446053]
We propose the concept of visual words, which maps the visual features to probability distributions over Large Multi-modal Models' vocabulary.
We further explore the distribution of visual features in the semantic space within LMM and the possibility of using text embeddings to represent visual information.
arXiv Detail & Related papers (2024-03-12T14:58:52Z) - Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs are intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
arXiv Detail & Related papers (2023-11-28T03:13:09Z) - Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the textttLlama2-7b-chat model on nine different languages from the MUST-C dataset.
The results show that LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z) - Large Language Models Are Latent Variable Models: Explaining and Finding
Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.