Urban Safety Perception Through the Lens of Large Multimodal Models: A Persona-based Approach
- URL: http://arxiv.org/abs/2503.00610v1
- Date: Sat, 01 Mar 2025 20:34:30 GMT
- Title: Urban Safety Perception Through the Lens of Large Multimodal Models: A Persona-based Approach
- Authors: Ciro Beneduce, Bruno Lepri, Massimiliano Luca
- Abstract summary: This study introduces Large Multimodal Models (LMMs), specifically Llava 1.6 7B, as a novel approach to assess safety perceptions of urban spaces. The model achieved an average F1-score of 59.21% in classifying urban scenarios as safe or unsafe. Incorporating Persona-based prompts revealed significant variations in safety perceptions across the socio-demographic groups of age, gender, and nationality.
- Score: 4.315451628809687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding how urban environments are perceived in terms of safety is crucial for urban planning and policymaking. Traditional methods like surveys are limited by high cost, required time, and scalability issues. To overcome these challenges, this study introduces Large Multimodal Models (LMMs), specifically Llava 1.6 7B, as a novel approach to assessing safety perceptions of urban spaces using street-view images. In addition, the research investigates how this task is affected by different socio-demographic perspectives, simulated by the model through Persona-based prompts. Without additional fine-tuning, the model achieved an average F1-score of 59.21% in classifying urban scenarios as safe or unsafe, identifying three key drivers of perceived unsafety: isolation, physical decay, and urban infrastructural challenges. Moreover, incorporating Persona-based prompts revealed significant variations in safety perceptions across the socio-demographic groups of age, gender, and nationality. Older and female Personas consistently perceived higher levels of unsafety than younger or male Personas. Similarly, nationality-specific differences were evident, with the proportion of unsafe classifications ranging from 19.71% for Singapore to 40.15% for Botswana. Notably, the model's default configuration aligned most closely with a middle-aged, male Persona. These findings highlight the potential of LMMs as a scalable and cost-effective alternative to traditional methods of assessing urban safety perception. While the sensitivity of these models to socio-demographic factors underscores the need for thoughtful deployment, their ability to provide nuanced perspectives makes them a promising tool for AI-driven urban planning.
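To make the setup concrete, here is a minimal sketch of how persona-conditioned safe/unsafe classification of a street-view image could look with the Hugging Face transformers implementation of LLaVA 1.6. The checkpoint name, prompt template, and persona wording are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: persona-conditioned safe/unsafe classification of a
# street-view image with LLaVA 1.6 7B. The checkpoint, prompt template,
# and persona phrasing are assumptions, not the paper's exact setup.
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

MODEL_ID = "llava-hf/llava-v1.6-vicuna-7b-hf"  # assumed 7B checkpoint
processor = LlavaNextProcessor.from_pretrained(MODEL_ID)
model = LlavaNextForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def classify_safety(image_path: str, persona: str | None = None) -> str:
    """Ask the model whether a street-view image looks safe or unsafe."""
    persona_clause = f"You are a {persona}. " if persona else ""
    # Vicuna-style LLaVA prompt; <image> marks where the image is injected.
    prompt = (
        f"USER: <image>\n{persona_clause}"
        "Would you feel safe walking here? Answer with exactly one word: "
        "safe or unsafe.\nASSISTANT:"
    )
    image = Image.open(image_path)
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    answer = processor.decode(output[0], skip_special_tokens=True)
    return answer.split("ASSISTANT:")[-1].strip().lower()

# Example: contrast the default model with a persona-conditioned run.
# print(classify_safety("street_view.jpg"))
# print(classify_safety("street_view.jpg", persona="70-year-old woman from Botswana"))
```

A run without a persona approximates the model's default perspective, which the abstract reports aligns most closely with a middle-aged, male Persona.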
Related papers
- Negotiative Alignment: Embracing Disagreement to Achieve Fairer Outcomes -- Insights from Urban Studies [3.510270856154939]
We present findings from a community-centered study in Montreal involving 35 residents with diverse demographic and social identities.
We propose negotiative alignment, an AI framework that treats disagreement as an essential input to be preserved, analyzed, and addressed.
arXiv Detail & Related papers (2025-03-16T18:55:54Z)
- Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs [56.440345471966666]
Multimodal Large Language Models (MLLMs) have expanded the capabilities of traditional language models by enabling interaction through both text and images. This paper introduces MMSafeAware, the first comprehensive multimodal safety awareness benchmark designed to evaluate MLLMs across 29 safety scenarios. MMSafeAware includes both unsafe and over-safety subsets to assess models' abilities to correctly identify unsafe content and avoid over-sensitivity that can hinder helpfulness.
arXiv Detail & Related papers (2025-02-16T16:12:40Z)
- New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook [54.24701201956833]
Security and privacy issues have undermined users' confidence in pre-trained models.
Current literature lacks a clear taxonomy of emerging attacks and defenses for pre-trained models.
The survey proposes a taxonomy that categorizes attacks and defenses into No-Change, Input-Change, and Model-Change approaches.
arXiv Detail & Related papers (2024-11-12T10:15:33Z)
- UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios [60.492736455572015]
We present UrBench, a benchmark designed for evaluating LMMs in complex multi-view urban scenarios.
UrBench contains 11.6K meticulously curated questions at both region-level and role-level.
Our evaluations of 21 LMMs show that current LMMs struggle in urban environments in several respects.
arXiv Detail & Related papers (2024-08-30T13:13:35Z)
- FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant [59.2438504610849]
We introduce FFAA: Face Forgery Analysis Assistant, consisting of a fine-tuned Multimodal Large Language Model (MLLM) and a Multi-answer Intelligent Decision System (MIDS).
Our method not only provides user-friendly and explainable results but also significantly boosts accuracy and robustness compared to previous methods.
arXiv Detail & Related papers (2024-08-19T15:15:20Z)
- Revolutionizing Urban Safety Perception Assessments: Integrating Multimodal Large Language Models with Street View Images [5.799322786332704]
Measuring urban safety perception is an important and complex task that traditionally relies heavily on human resources.
Recent advances in multimodal large language models (MLLMs) have demonstrated powerful reasoning and analytical capabilities.
We propose a method based on pre-trained Contrastive Language-Image Pre-training (CLIP) features and K-Nearest Neighbors (K-NN) retrieval to quickly assess the safety index of an entire city (a rough sketch of this kind of pipeline follows this entry).
arXiv Detail & Related papers (2024-07-29T06:03:13Z)
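As a rough illustration of the CLIP-plus-K-NN idea summarized in the entry above, here is a hedged sketch using toy placeholder images; the ViT-B/32 checkpoint, neighbor count, and safety scores are assumptions, not the authors' implementation.

```python
# Hedged sketch of a CLIP-feature + K-NN safety-index pipeline; the
# checkpoint, k, and the toy images/scores below are assumptions.
import torch
from PIL import Image
from sklearn.neighbors import KNeighborsRegressor
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(images):
    """Return L2-normalized CLIP image embeddings as a NumPy array."""
    inputs = clip_processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = clip.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

# Toy stand-ins for street-view images rated by human annotators.
labeled_images = [Image.new("RGB", (224, 224), c) for c in ("gray", "white", "black")]
safety_scores = [0.4, 0.8, 0.2]  # hypothetical perceived-safety ratings
unlabeled_images = [Image.new("RGB", (224, 224), "silver")]

# Retrieve the nearest rated images in CLIP space and average their scores.
knn = KNeighborsRegressor(n_neighbors=3, metric="cosine")
knn.fit(embed(labeled_images), safety_scores)
print(knn.predict(embed(unlabeled_images)))  # estimated safety index
```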
- Is it safe to cross? Interpretable Risk Assessment with GPT-4V for Safety-Aware Street Crossing [8.468153670795443]
This paper introduces an innovative approach that leverages large multimodal models (LMMs) to interpret complex street crossing scenes.
By generating a safety score and a scene description in natural language, our method supports safe decision-making for blind and low-vision individuals.
arXiv Detail & Related papers (2024-02-09T21:37:13Z)
- Exploring Public's Perception of Safety and Video Surveillance Technology: A Survey Approach [2.473948454680334]
This study presents a comprehensive analysis of the general public's safety concerns, their views of existing surveillance technologies, and their perception of AI-driven solutions for enhancing safety in urban environments, focusing on Charlotte, NC.
The research examines demographic factors such as age, gender, ethnicity, and educational level to gain insight into public perceptions of safety and possible solutions.
arXiv Detail & Related papers (2023-12-10T15:53:37Z)
- A Counterfactual Safety Margin Perspective on the Scoring of Autonomous Vehicles' Riskiness [52.27309191283943]
This paper presents a data-driven framework for assessing the risk of different AVs' behaviors.
We propose the notion of counterfactual safety margin, which represents the minimum deviation from nominal behavior that could cause a collision.
arXiv Detail & Related papers (2023-08-02T09:48:08Z)
- Explainable, automated urban interventions to improve pedestrian and vehicle safety [0.8620335948752805]
This paper combines public data sources, large-scale street imagery and computer vision techniques to approach pedestrian and vehicle safety.
The steps involved in this pipeline include the adaptation and training of a Residual Convolutional Neural Network to determine a hazard index for each given urban scene.
The outcome of this computational approach is a fine-grained map of hazard levels across a city, together with a set of identified interventions that might simultaneously improve pedestrian and vehicle safety.
arXiv Detail & Related papers (2021-10-22T09:17:39Z)
- Methodological Foundation of a Numerical Taxonomy of Urban Form [62.997667081978825]
We present a method for numerical taxonomy of urban form derived from biological systematics.
We derive homogeneous urban tissue types and, by determining overall morphological similarity between them, generate a hierarchical classification of urban form.
After framing and presenting the method, we test it on two cities - Prague and Amsterdam.
arXiv Detail & Related papers (2021-04-30T12:47:52Z)
- Predicting Livelihood Indicators from Community-Generated Street-Level Imagery [70.5081240396352]
We propose an inexpensive, scalable, and interpretable approach to predict key livelihood indicators from public crowd-sourced street-level imagery.
By comparing our results against ground data collected in nationally-representative household surveys, we demonstrate the performance of our approach in accurately predicting indicators of poverty, population, and health.
arXiv Detail & Related papers (2020-06-15T18:12:12Z)