UrbanSense: A Framework for Quantitative Analysis of Urban Streetscapes leveraging Vision Large Language Models
- URL: http://arxiv.org/abs/2506.10342v2
- Date: Mon, 04 Aug 2025 15:56:54 GMT
- Title: UrbanSense: A Framework for Quantitative Analysis of Urban Streetscapes leveraging Vision Large Language Models
- Authors: Jun Yin, Jing Zhong, Peilin Li, Ruolin Pan, Pengyu Zeng, Miao Zhang, Shuai Lu,
- Abstract summary: Urban cultures and architectural styles vary significantly across cities due to geographical, chronological, historical, and socio-political factors. We propose a multimodal research framework based on vision-language models, enabling automated and scalable analysis of urban streetscape style differences.
- Score: 24.731262578136057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Urban cultures and architectural styles vary significantly across cities due to geographical, chronological, historical, and socio-political factors. Understanding these differences is essential for anticipating how cities may evolve in the future. As representative cases of historical continuity and modern innovation in China, Beijing and Shenzhen offer valuable perspectives for exploring the transformation of urban streetscapes. However, conventional approaches to urban cultural studies often rely on expert interpretation and historical documentation, which are difficult to standardize across different contexts. To address this, we propose a multimodal research framework based on vision-language models, enabling automated and scalable analysis of urban streetscape style differences. This approach enhances the objectivity and data-driven nature of urban form research. The contributions of this study are as follows: First, we construct UrbanDiffBench, a curated dataset of urban streetscapes containing architectural images from different periods and regions. Second, we develop UrbanSense, the first vision-language-model-based framework for urban streetscape analysis, enabling the quantitative generation and comparison of urban style representations. Third, experimental results show that over 80% of generated descriptions pass the t-test (p < 0.05). High Phi scores (0.912 for cities, 0.833 for periods) from subjective evaluations confirm the method's ability to capture subtle stylistic differences. These results highlight the method's potential to quantify and interpret urban style evolution, offering a scientifically grounded lens for future design.
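The abstract reports Phi scores without saying how they are computed. As a rough illustration only, the sketch below computes the standard phi coefficient for a 2x2 contingency table, which is one common reading of a "Phi score". The function name `phi_coefficient`, the table layout, and the counts are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient for the 2x2 contingency table [[a, b], [c, d]].

    phi = (a*d - b*c) / sqrt((a+b) * (c+d) * (a+c) * (b+d))
    """
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


# Hypothetical counts (assumption, not data from the paper):
# rows = true city of the image (Beijing, Shenzhen),
# columns = city a rater inferred from the generated description.
table = (46, 4, 5, 45)
phi = phi_coefficient(*table)
print(f"phi = {phi:.3f}")
```

Values near 1 indicate that the inferred labels track the true labels closely, and values near 0 indicate no association.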
Related papers
- StreetviewLLM: Extracting Geographic Information Using a Chain-of-Thought Multimodal Large Language Model [12.789465279993864]
Geospatial predictions are crucial for diverse fields such as disaster management, urban planning, and public health.
We propose StreetViewLLM, a novel framework that integrates a large language model with chain-of-thought reasoning and multimodal data sources.
The model has been applied to seven global cities, including Hong Kong, Tokyo, Singapore, Los Angeles, New York, London, and Paris.
arXiv Detail & Related papers (2024-11-19T05:15:19Z) - CityPulse: Fine-Grained Assessment of Urban Change with Street View Time
Series [12.621355888239359]
Urban transformations have profound societal impact on both individuals and communities at large.
We propose an end-to-end change detection model to effectively capture physical alterations in the built environment at scale.
Our approach has the potential to supplement existing datasets and serve as a fine-grained and accurate assessment of urban change.
arXiv Detail & Related papers (2024-01-02T08:57:09Z) - Unified Data Management and Comprehensive Performance Evaluation for
Urban Spatial-Temporal Prediction [Experiment, Analysis & Benchmark] [78.05103666987655]
This work addresses challenges in accessing and utilizing diverse urban spatial-temporal datasets.
We introduce atomic files, a unified storage format designed for urban spatial-temporal big data, and validate its effectiveness on 40 diverse datasets.
We conduct extensive experiments using diverse models and datasets, establishing a performance leaderboard and identifying promising research directions.
arXiv Detail & Related papers (2023-08-24T16:20:00Z) - A Contextual Master-Slave Framework on Urban Region Graph for Urban
Village Detection [68.84486900183853]
We build an urban region graph (URG) to model the urban area in a hierarchically structured way.
Then, we design a novel contextual master-slave framework to effectively detect the urban village from the URG.
The proposed framework can learn to balance the generality and specificity for UV detection in an urban area.
arXiv Detail & Related papers (2022-11-26T18:17:39Z) - Urban form and COVID-19 cases and deaths in Greater London: an urban
morphometric approach [63.29165619502806]
The COVID-19 pandemic generated a considerable debate in relation to urban density.
This is an old debate, originating in mid-19th-century England with the emergence of the public health and urban planning disciplines.
We describe urban form at individual building level and then aggregate information for official neighbourhoods.
arXiv Detail & Related papers (2022-10-16T10:01:10Z) - GANs for Urban Design [0.0]
The topic investigated in this paper is the application of Generative Adversarial Networks to the design of an urban block.
The research presents a flexible model able to adapt to the morphological characteristics of a city.
arXiv Detail & Related papers (2021-05-04T19:50:24Z) - Methodological Foundation of a Numerical Taxonomy of Urban Form [62.997667081978825]
We present a method for numerical taxonomy of urban form derived from biological systematics.
We derive homogeneous urban tissue types and, by determining overall morphological similarity between them, generate a hierarchical classification of urban form.
After framing and presenting the method, we test it on two cities - Prague and Amsterdam.
arXiv Detail & Related papers (2021-04-30T12:47:52Z) - Modeling Fashion Influence from Photos [108.58097776743331]
We explore fashion influence along two channels: geolocation and fashion brands.
We leverage public large-scale datasets of 7.7M Instagram photos from 44 major world cities.
Our results indicate the advantage of grounding visual style evolution both spatially and temporally.
arXiv Detail & Related papers (2020-11-17T20:24:03Z) - City limits in the age of smartphones and urban scaling [0.0]
Urban planning still lacks appropriate standards to define city boundaries across urban systems.
ICT provide the potential to portray more accurate descriptions of the urban systems.
We apply computational techniques over a large volume of mobile phone records to define urban boundaries.
arXiv Detail & Related papers (2020-05-06T17:31:21Z) - From Paris to Berlin: Discovering Fashion Style Influences Around the
World [108.58097776743331]
We propose to quantify fashion influences from everyday images of people wearing clothes.
We introduce an approach that detects which cities influence which other cities in terms of propagating their styles.
We then leverage the discovered influence patterns to inform a forecasting model that predicts the popularity of any given style at any given city into the future.
arXiv Detail & Related papers (2020-04-03T00:54:23Z) - Indexical Cities: Articulating Personal Models of Urban Preference with
Geotagged Data [0.0]
This research characterizes personal preference in urban spaces and predicts a spectrum of unknown likeable places for a specific observer.
Unlike most urban perception studies, our intention is not by any means to provide an objective measure of urban quality, but rather to portray personal views of the city or Cities of Cities.
arXiv Detail & Related papers (2020-01-23T11:00:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.