From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models
- URL: http://arxiv.org/abs/2506.02242v2
- Date: Tue, 17 Jun 2025 19:05:02 GMT
- Title: From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models
- Authors: Yihong Tang, Ao Qu, Xujing Yu, Weipeng Deng, Jun Ma, Jinhua Zhao, Lijun Sun
- Abstract summary: Urban and transportation research has long sought to uncover statistically meaningful relationships between key variables and societal outcomes such as road safety. We propose a Multimodal Large Language Model (MLLM)-based approach for interpretable hypothesis inference.
- Score: 18.69630838520861
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Urban and transportation research has long sought to uncover statistically meaningful relationships between key variables and societal outcomes such as road safety, to generate actionable insights that guide the planning, development, and renewal of urban and transportation systems. However, traditional workflows face several key challenges: (1) reliance on human experts to propose hypotheses, which is time-consuming and prone to confirmation bias; (2) limited interpretability, particularly in deep learning approaches; and (3) underutilization of unstructured data that can encode critical urban context. Given these limitations, we propose UrbanX, a Multimodal Large Language Model (MLLM)-based approach for interpretable hypothesis inference, enabling the automated generation, evaluation, and refinement of hypotheses concerning urban context and road safety outcomes. Our method leverages MLLMs to craft safety-relevant questions for street view images (SVIs), extract interpretable embeddings from their responses, and apply them in regression-based statistical models. UrbanX supports iterative hypothesis testing and refinement, guided by statistical evidence such as coefficient significance, thereby enabling rigorous scientific discovery of previously overlooked correlations between urban design and safety. Experimental evaluations on Manhattan street segments demonstrate that our approach outperforms pretrained deep learning models while offering full interpretability. Beyond road safety, UrbanX can serve as a general-purpose framework for urban scientific discovery, extracting structured insights from unstructured urban data across diverse socioeconomic and environmental outcomes. This approach enhances model trustworthiness for policy applications and establishes a scalable, statistically grounded pathway for interpretable knowledge discovery in urban and transportation studies.
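The statistical core described in the abstract (regressing a safety outcome on MLLM-derived features and keeping a hypothesis only when its coefficient is significant) can be illustrated with a minimal sketch. All data, the feature name, and the significance threshold below are illustrative assumptions, not values from the paper; the MLLM step is stood in for by a synthetic 0/1 answer vector.

```python
# Hedged sketch of UrbanX-style hypothesis testing: an MLLM's binary
# answer to a safety-relevant question (e.g. "is the sidewalk
# obstructed?") becomes a regressor for crash counts; the hypothesis
# is retained only if the slope coefficient is significant.
import numpy as np

rng = np.random.default_rng(0)
n = 500
feature = rng.integers(0, 2, n).astype(float)     # synthetic MLLM 0/1 answer
crashes = 1.0 + 0.8 * feature + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), feature])        # intercept + feature
beta, *_ = np.linalg.lstsq(X, crashes, rcond=None)

# standard error of the slope, then a rough z-statistic
resid = crashes - X @ beta
sigma2 = resid @ resid / (n - 2)
cov = sigma2 * np.linalg.inv(X.T @ X)
z = beta[1] / np.sqrt(cov[1, 1])

significant = abs(z) > 1.96                       # ~5% two-sided threshold
```

In the framework described above, features whose coefficients fail this kind of test would be discarded or refined in the next iteration.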
Related papers
- Reimagining Urban Science: Scaling Causal Inference with Large Language Models [39.231736674554995]
This Perspective examines current urban causal research by categorizing its research topics, data sources, and methodological approaches to identify structural gaps. We introduce AutoUrbanCI, a conceptual framework composed of four distinct modular agents responsible for hypothesis generation, data engineering, experiment design and execution, and results interpretation with policy recommendations. We propose evaluation criteria for rigor and transparency and reflect on implications for human-AI collaboration, equity, and accountability.
arXiv Detail & Related papers (2025-04-15T16:58:11Z) - Urban Safety Perception Through the Lens of Large Multimodal Models: A Persona-based Approach [4.315451628809687]
This study introduces Large Multimodal Models (LMMs), specifically Llava 1.6 7B, as a novel approach to assess safety perceptions of urban spaces. The model achieved an average F1-score of 59.21% in classifying urban scenarios as safe or unsafe. Incorporating persona-based prompts revealed significant variations in safety perceptions across the socio-demographic groups of age, gender, and nationality.
arXiv Detail & Related papers (2025-03-01T20:34:30Z) - Collaborative Imputation of Urban Time Series through Cross-city Meta-learning [54.438991949772145]
We propose a novel collaborative imputation paradigm leveraging meta-learned implicit neural representations (INRs). We then introduce a cross-city collaborative learning scheme through model-agnostic meta-learning. Experiments on a diverse urban dataset from 20 global cities demonstrate our model's superior imputation performance and generalizability.
arXiv Detail & Related papers (2025-01-20T07:12:40Z) - Harnessing LLMs for Cross-City OD Flow Prediction [5.6685153523382015]
We introduce a new method for cross-city Origin-Destination (OD) flow prediction using Large Language Models (LLMs).
Our approach leverages the advanced semantic understanding and contextual learning capabilities of LLMs to bridge the gap between cities with different characteristics.
Our novel framework involves four major components: collecting OD training datasets from a source city, instruction-tuning the LLMs, predicting destination POIs in a target city, and identifying the locations that best match the predicted destination POIs.
arXiv Detail & Related papers (2024-09-05T23:04:28Z) - Urban Safety Perception Assessments via Integrating Multimodal Large Language Models with Street View Images [5.799322786332704]
Measuring urban safety perception is an important and complex task that traditionally relies heavily on human resources. Recent advances in multimodal large language models (MLLMs) have demonstrated powerful reasoning and analytical capabilities. We propose a method based on the pre-trained Contrastive Language-Image Pre-training (CLIP) feature and K-Nearest Neighbors (K-NN) retrieval to quickly assess the safety index of the entire city.
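The CLIP + K-NN idea in this entry can be sketched compactly: score a new street view image as the mean safety score of its k most similar reference images by cosine similarity. In this sketch the random vectors stand in for real CLIP embeddings, and the scores, dimensions, and function name are illustrative assumptions.

```python
# Hedged sketch of K-NN safety-index retrieval over precomputed image
# embeddings (random vectors stand in for CLIP features).
import numpy as np

def knn_safety_score(query, ref_embeds, ref_scores, k=5):
    # cosine similarity between the query and every reference embedding
    q = query / np.linalg.norm(query)
    r = ref_embeds / np.linalg.norm(ref_embeds, axis=1, keepdims=True)
    sims = r @ q
    top = np.argsort(sims)[-k:]          # indices of the k most similar refs
    return float(ref_scores[top].mean()) # averaged safety score of neighbors

rng = np.random.default_rng(1)
ref_embeds = rng.normal(size=(100, 32))          # 100 reference images
ref_scores = rng.uniform(0, 10, size=100)        # their known safety scores
query = ref_embeds[0] + 0.01 * rng.normal(size=32)  # near reference image 0

score = knn_safety_score(query, ref_embeds, ref_scores)
```

Because no model inference is needed at query time, this retrieval step is what makes city-scale assessment fast in the approach described above.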
arXiv Detail & Related papers (2024-07-29T06:03:13Z) - CityBench: Evaluating the Capabilities of Large Language Models for Urban Tasks [10.22654338686634]
Evaluating large language models (LLMs) and vision-language models (VLMs) on urban tasks has become essential to ensure their real-world effectiveness and reliability. The challenge in constructing a systematic evaluation benchmark for urban research lies in the diversity of urban data. In this paper, we design CityBench, an interactive simulator-based evaluation platform.
arXiv Detail & Related papers (2024-06-20T02:25:07Z) - Reinforcement Learning with Human Feedback for Realistic Traffic Simulation [53.85002640149283]
A key element of effective simulation is the incorporation of realistic traffic models that align with human knowledge.
This study identifies two main challenges: capturing the nuances of human preferences on realism and the unification of diverse traffic simulation models.
arXiv Detail & Related papers (2023-09-01T19:29:53Z) - Uncertainty Quantification for Image-based Traffic Prediction across Cities [63.136794104678025]
Uncertainty quantification (UQ) methods provide an approach to induce probabilistic reasoning.
We investigate their application to a large-scale image-based traffic dataset spanning multiple cities.
We find that our approach can capture both temporal and spatial effects on traffic behaviour in a representative case study for the city of Moscow.
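One common family of uncertainty quantification methods referenced in entries like this one is ensemble-based: several independently trained (or perturbed) predictors yield a spread of forecasts, and the spread itself is the uncertainty estimate. The sketch below is purely illustrative, with made-up numbers and a trivial "ensemble" of noisy predictors; it is not the method of the cited paper.

```python
# Hedged sketch of ensemble-style uncertainty quantification: the
# standard deviation across ensemble members' predictions serves as
# the per-forecast uncertainty estimate.
import numpy as np

rng = np.random.default_rng(2)
true_traffic = 50.0
# five "ensemble members", each a noisy predictor of traffic volume
predictions = true_traffic + rng.normal(0, 3, size=5)

mean_pred = predictions.mean()            # point forecast
uncertainty = predictions.std(ddof=1)     # spread across members
```

In a spatial setting like the cited study, such per-pixel spreads can then be mapped to reveal where in a city the model is least certain.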
arXiv Detail & Related papers (2023-08-11T13:35:52Z) - A Study of Situational Reasoning for Traffic Understanding [63.45021731775964]
We devise three novel text-based tasks for situational reasoning in the traffic domain.
We adopt four knowledge-enhanced methods that have shown generalization capability across language reasoning tasks in prior work.
We provide in-depth analyses of model performance on data partitions and examine model predictions categorically.
arXiv Detail & Related papers (2023-06-05T01:01:12Z) - Methodological Foundation of a Numerical Taxonomy of Urban Form [62.997667081978825]
We present a method for numerical taxonomy of urban form derived from biological systematics.
We derive homogeneous urban tissue types and, by determining overall morphological similarity between them, generate a hierarchical classification of urban form.
After framing and presenting the method, we test it on two cities - Prague and Amsterdam.
arXiv Detail & Related papers (2021-04-30T12:47:52Z) - Congestion-aware Multi-agent Trajectory Prediction for Collision Avoidance [110.63037190641414]
We propose to learn congestion patterns explicitly and devise a novel "Sense-Learn-Reason-Predict" framework.
By decomposing the learning phases into two stages, a "student" can learn contextual cues from a "teacher" while generating collision-free trajectories.
In experiments, we demonstrate that the proposed model is able to generate collision-free trajectory predictions in a synthetic dataset.
arXiv Detail & Related papers (2021-03-26T02:42:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.