Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
- URL: http://arxiv.org/abs/2406.13894v1
- Date: Wed, 19 Jun 2024 23:50:41 GMT
- Title: Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
- Authors: Mohammad Abu Tami, Huthaifa I. Ashqar, Mohammed Elhenawy,
- Abstract summary: Multimodal Large Language Models (MLLMs) offer a novel approach by integrating textual, visual, and audio modalities.
Our framework leverages the reasoning power of MLLMs, directing their output through context-specific prompts.
Preliminary results demonstrate the framework's potential in zero-shot learning and accurate scenario analysis.
- Score: 5.233512464561313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional approaches to safety event analysis in autonomous systems have relied on complex machine learning models and extensive datasets for high accuracy and reliability. However, the advent of Multimodal Large Language Models (MLLMs) offers a novel approach by integrating textual, visual, and audio modalities, thereby providing automated analyses of driving videos. Our framework leverages the reasoning power of MLLMs, directing their output through context-specific prompts to ensure accurate, reliable, and actionable insights for hazard detection. By incorporating models like Gemini-Pro-Vision 1.5 and Llava, our methodology aims to automate the safety critical events and mitigate common issues such as hallucinations in MLLM outputs. Preliminary results demonstrate the framework's potential in zero-shot learning and accurate scenario analysis, though further validation on larger datasets is necessary. Furthermore, more investigations are required to explore the performance enhancements of the proposed framework through few-shot learning and fine-tuned models. This research underscores the significance of MLLMs in advancing the analysis of the naturalistic driving videos by improving safety-critical event detecting and understanding the interaction with complex environments.
Related papers
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z) - A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks [74.52259252807191]
Multimodal Large Language Models (MLLMs) address the complexities of real-world applications far beyond the capabilities of single-modality systems.
This paper systematically sorts out the applications of MLLM in multimodal tasks such as natural language, vision, and audio.
arXiv Detail & Related papers (2024-08-02T15:14:53Z) - A Superalignment Framework in Autonomous Driving with Large Language Models [2.650382010271]
Large language models (LLMs) and multi-modal large language models (MLLMs) are extensively used in autonomous driving.
Despite their importance, the security aspect of LLMs in autonomous driving remains underexplored.
This research introduces a novel security framework for autonomous vehicles, utilizing a multi-agent LLM approach.
arXiv Detail & Related papers (2024-06-09T05:26:38Z) - Probing Multimodal LLMs as World Models for Driving [72.18727651074563]
We look at the application of Multimodal Large Language Models (MLLMs) in autonomous driving.
Despite advances in models like GPT-4o, their performance in complex driving environments remains largely unexplored.
arXiv Detail & Related papers (2024-05-09T17:52:42Z) - Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models [12.687494201105066]
This paper proposes Traj-LLM, the first to investigate the potential of using Large Language Models (LLMs) to generate future motion from agents' past/observed trajectories and scene semantics.
LLMs' powerful comprehension abilities capture a spectrum of high-level scene knowledge and interactive information.
Emulating the human-like lane focus cognitive function, we introduce lane-aware probabilistic learning powered by the pioneering Mamba module.
arXiv Detail & Related papers (2024-05-08T09:28:04Z) - AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z) - Multi-modal Auto-regressive Modeling via Visual Words [96.25078866446053]
We propose the concept of visual tokens, which maps the visual features to probability distributions over Large Multi-modal Models' vocabulary.
We further explore the distribution of visual features in the semantic space within LMM and the possibility of using text embeddings to represent visual information.
arXiv Detail & Related papers (2024-03-12T14:58:52Z) - Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs are intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
arXiv Detail & Related papers (2023-11-28T03:13:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.