Related papers: Is it safe to cross? Interpretable Risk Assessment with GPT-4V for Safety-Aware Street Crossing

URL: http://arxiv.org/abs/2402.06794v2
Date: Sat, 6 Jul 2024 15:36:23 GMT
Title: Is it safe to cross? Interpretable Risk Assessment with GPT-4V for Safety-Aware Street Crossing
Authors: Hochul Hwang, Sunjae Kwon, Yekyung Kim, Donghyun Kim
Abstract summary: This paper introduces an innovative approach that leverages large multimodal models (LMMs) to interpret complex street crossing scenes. By generating a safety score and scene description in natural language, our method supports safe decision-making for blind and low-vision individuals.
Score: 8.468153670795443
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Safely navigating street intersections is a complex challenge for blind and low-vision individuals, as it requires a nuanced understanding of the surrounding context, a task heavily reliant on visual cues. Traditional methods for assisting in this decision-making process often fall short, lacking the ability to provide a comprehensive scene analysis and safety level. This paper introduces an innovative approach that leverages large multimodal models (LMMs) to interpret complex street crossing scenes, offering a potential advancement over conventional traffic signal recognition techniques. By generating a safety score and scene description in natural language, our method supports safe decision-making for blind and low-vision individuals. We collected crosswalk intersection data that contains multiview egocentric images captured by a quadruped robot and annotated the images with corresponding safety scores based on our predefined safety score categorization. Using the visual knowledge extracted from the images together with text prompts, we evaluate a large multimodal model for safety score prediction and scene description. Our findings highlight the reasoning and safety score prediction capabilities of an LMM, activated by various prompts, as a pathway to developing a trustworthy system, crucial for applications requiring reliable decision-making support.

Related papers

Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends [5.233512464561313]
Traffic safety remains a critical global challenge, with traditional Advanced Driver-Assistance Systems often struggling in dynamic real-world scenarios. This paper reviews the transformative potential of Multimodal Large Language Models (MLLMs) in addressing these limitations. By positioning MLLMs as a cornerstone for next-generation traffic safety systems, this review underscores their potential to revolutionize the field.
arXiv Detail & Related papers (2025-04-21T18:48:35Z)
Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs [56.440345471966666]
Multimodal Large Language Models (MLLMs) have expanded the capabilities of traditional language models by enabling interaction through both text and images. This paper introduces MMSafeAware, the first comprehensive multimodal safety awareness benchmark designed to evaluate MLLMs across 29 safety scenarios. MMSafeAware includes both unsafe and over-safety subsets to assess models' abilities to correctly identify unsafe content and avoid over-sensitivity that can hinder helpfulness.
arXiv Detail & Related papers (2025-02-16T16:12:40Z)
MLLM-as-a-Judge for Image Safety without Human Labeling [81.24707039432292]
In the age of AI-generated content (AIGC), many image generation models are capable of producing harmful content. It is crucial to identify such unsafe images based on established safety rules. Existing approaches typically fine-tune MLLMs with human-labeled datasets.
arXiv Detail & Related papers (2024-12-31T00:06:04Z)
Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety. For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context. We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z)
FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant [59.2438504610849]
We introduce FFAA: Face Forgery Analysis Assistant, consisting of a fine-tuned Multimodal Large Language Model (MLLM) and a Multi-answer Intelligent Decision System (MIDS). Our method not only provides user-friendly and explainable results but also significantly boosts accuracy and robustness compared to previous methods.
arXiv Detail & Related papers (2024-08-19T15:15:20Z)
Revolutionizing Urban Safety Perception Assessments: Integrating Multimodal Large Language Models with Street View Images [5.799322786332704]
Measuring urban safety perception is an important and complex task that traditionally relies heavily on human resources. Recent advances in multimodal large language models (MLLMs) have demonstrated powerful reasoning and analytical capabilities. We propose a method based on pre-trained Contrastive Language-Image Pre-training (CLIP) features and K-Nearest Neighbors (K-NN) retrieval to quickly assess the safety index of an entire city.
arXiv Detail & Related papers (2024-07-29T06:03:13Z)
Cross-Modality Safety Alignment [73.8765529028288]
We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. To empirically investigate this problem, we developed the SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.
arXiv Detail & Related papers (2024-06-21T16:14:15Z)
Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL). We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
A Counterfactual Safety Margin Perspective on the Scoring of Autonomous Vehicles' Riskiness [52.27309191283943]
This paper presents a data-driven framework for assessing the risk of different AVs' behaviors. We propose the notion of counterfactual safety margin, which represents the minimum deviation from nominal behavior that could cause a collision.
arXiv Detail & Related papers (2023-08-02T09:48:08Z)
Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
Analyzing vehicle pedestrian interactions combining data cube structure and predictive collision risk estimation model [5.73658856166614]
This study introduces a new concept of a pedestrian safety system that combines field processing with centralized processing. The system can immediately warn of upcoming risks in the field and improve the safety of high-risk areas by assessing the safety levels of roads without relying on actual collisions.
arXiv Detail & Related papers (2021-07-26T23:00:56Z)
Vision based Pedestrian Potential Risk Analysis based on Automated Behavior Feature Extraction for Smart and Safe City [5.759189800028578]
We propose a comprehensive analytical model for pedestrian potential risk using video footage gathered by road security cameras deployed at such crossings. The proposed system automatically detects vehicles and pedestrians, calculates trajectories by frames, and extracts behavioral features affecting the likelihood of potentially dangerous scenes between these objects. We validated feasibility and applicability by applying it in multiple crosswalks in Osan city, Korea.
arXiv Detail & Related papers (2021-05-06T11:03:10Z)
Model Guided Road Intersection Classification [2.9248680865344348]
This work investigates intersection classification from RGB images using well-consolidated neural network approaches, along with a method to enhance the results based on the teacher/student training paradigm. An extensive experimental activity, aimed at identifying the best input configuration and evaluating different network parameters on both the well-known KITTI dataset and the new KITTI-360 sequences, shows that the method outperforms current state-of-the-art approaches on a per-frame basis and proves the effectiveness of the proposed learning scheme.
arXiv Detail & Related papers (2021-04-26T09:15:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.