Related papers: Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events

Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events

URL: http://arxiv.org/abs/2406.13894v1
Date: Wed, 19 Jun 2024 23:50:41 GMT
Title: Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
Authors: Mohammad Abu Tami, Huthaifa I. Ashqar, Mohammed Elhenawy,
Abstract summary: Multimodal Large Language Models (MLLMs) offer a novel approach by integrating textual, visual, and audio modalities. Our framework leverages the reasoning power of MLLMs, directing their output through context-specific prompts. Preliminary results demonstrate the framework's potential in zero-shot learning and accurate scenario analysis.
Score: 5.233512464561313
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Traditional approaches to safety event analysis in autonomous systems have relied on complex machine learning models and extensive datasets for high accuracy and reliability. However, the advent of Multimodal Large Language Models (MLLMs) offers a novel approach by integrating textual, visual, and audio modalities, thereby providing automated analyses of driving videos. Our framework leverages the reasoning power of MLLMs, directing their output through context-specific prompts to ensure accurate, reliable, and actionable insights for hazard detection. By incorporating models like Gemini-Pro-Vision 1.5 and Llava, our methodology aims to automate the safety critical events and mitigate common issues such as hallucinations in MLLM outputs. Preliminary results demonstrate the framework's potential in zero-shot learning and accurate scenario analysis, though further validation on larger datasets is necessary. Furthermore, more investigations are required to explore the performance enhancements of the proposed framework through few-shot learning and fine-tuned models. This research underscores the significance of MLLMs in advancing the analysis of the naturalistic driving videos by improving safety-critical event detecting and understanding the interaction with complex environments.

Related papers

Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends [5.233512464561313]
Traffic safety remains a critical global challenge, with traditional Advanced Driver-Assistance Systems often struggling in dynamic real-world scenarios. This paper reviews the transformative potential of Multimodal Large Language Models (MLLMs) in addressing these limitations. By positioning MLLMs as a cornerstone for next-generation traffic safety systems, this review underscores their potential to revolutionize the field.
arXiv Detail & Related papers (2025-04-21T18:48:35Z)
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge [0.0]
Large Language Models (LLMs) have revolutionized artificial intelligence, driving advancements in machine translation, summarization, and conversational agents. Recent studies indicate that LLMs remain vulnerable to adversarial attacks designed to elicit biased responses. This work proposes a scalable benchmarking framework to evaluate LLM robustness against adversarial bias elicitation.
arXiv Detail & Related papers (2025-04-10T16:00:59Z)
VACT: A Video Automatic Causal Testing System and a Benchmark [55.53300306960048]
VACT is an **automated** framework for modeling, evaluating, and measuring the causal understanding of VGMs in real-world scenarios. We introduce multi-level causal evaluation metrics to provide a detailed analysis of the causal performance of VGMs.
arXiv Detail & Related papers (2025-03-08T10:54:42Z)
SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
Multimodal Large Language Models (MLLMs) can process both visual and textual data. We propose SafeAuto, a novel framework that enhances MLLM-based autonomous driving systems by incorporating both unstructured and structured knowledge.
arXiv Detail & Related papers (2025-02-28T21:53:47Z)
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis [6.213279061986497]
SeeUnsafe is a framework that transforms video-based traffic accident analysis into a more interactive, conversational approach. Our framework employs a multimodal-based aggregation strategy to handle videos of various lengths and generate structured responses for review and evaluation. We conduct extensive experiments on the Toyota Woven Traffic Safety dataset, demonstrating that SeeUnsafe effectively performs accident-aware video classification and visual grounding.
arXiv Detail & Related papers (2025-01-17T23:35:34Z)
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks. Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs. In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks [74.52259252807191]
Multimodal Large Language Models (MLLMs) address the complexities of real-world applications far beyond the capabilities of single-modality systems. This paper systematically sorts out the applications of MLLM in multimodal tasks such as natural language, vision, and audio.
arXiv Detail & Related papers (2024-08-02T15:14:53Z)
A Superalignment Framework in Autonomous Driving with Large Language Models [2.650382010271]
Large language models (LLMs) and multi-modal large language models (MLLMs) are extensively used in autonomous driving. Despite their importance, the security aspect of LLMs in autonomous driving remains underexplored. This research introduces a novel security framework for autonomous vehicles, utilizing a multi-agent LLM approach.
arXiv Detail & Related papers (2024-06-09T05:26:38Z)
Probing Multimodal LLMs as World Models for Driving [72.18727651074563]
We look at the application of Multimodal Large Language Models (MLLMs) in autonomous driving. Despite advances in models like GPT-4o, their performance in complex driving environments remains largely unexplored.
arXiv Detail & Related papers (2024-05-09T17:52:42Z)
Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models [12.687494201105066]
This paper proposes Traj-LLM, the first to investigate the potential of using Large Language Models (LLMs) to generate future motion from agents' past/observed trajectories and scene semantics. LLMs' powerful comprehension abilities capture a spectrum of high-level scene knowledge and interactive information. Emulating the human-like lane focus cognitive function, we introduce lane-aware probabilistic learning powered by the pioneering Mamba module.
arXiv Detail & Related papers (2024-05-08T09:28:04Z)
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios. We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
Multi-modal Auto-regressive Modeling via Visual Words [96.25078866446053]
We propose the concept of visual tokens, which maps the visual features to probability distributions over Large Multi-modal Models' vocabulary. We further explore the distribution of visual features in the semantic space within LMM and the possibility of using text embeddings to represent visual information.
arXiv Detail & Related papers (2024-03-12T14:58:52Z)
Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems. LLMs are intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning. We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
arXiv Detail & Related papers (2023-11-28T03:13:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.