UrbanFM: Scaling Urban Spatio-Temporal Foundation Models
- URL: http://arxiv.org/abs/2602.20677v2
- Date: Mon, 02 Mar 2026 08:34:52 GMT
- Title: UrbanFM: Scaling Urban Spatio-Temporal Foundation Models
- Authors: Wei Chen, Yuqian Wu, Junle Chen, Xiaofang Zhou, Yuxuan Liang
- Abstract summary: Urban systems, as dynamic complex systems, continuously generate spatio-temporal data streams that encode the fundamental laws of human mobility and city evolution. While AI for Science has witnessed the transformative power of foundation models in disciplines like meteorology, urban computing remains fragmented due to "scenario-specific" models. We propose UrbanFM, a minimalist self-attention architecture designed with limited inductive biases to autonomously learn spatio-temporal dependencies from massive data. Experiments demonstrate that UrbanFM achieves remarkable zero-shot generalization across cities and tasks, a first step toward large-scale urban spatio-temporal foundation models.
- Score: 36.98769959300113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Urban systems, as dynamic complex systems, continuously generate spatio-temporal data streams that encode the fundamental laws of human mobility and city evolution. While AI for Science has witnessed the transformative power of foundation models in disciplines like genomics and meteorology, urban computing remains fragmented due to "scenario-specific" models, which are overfitted to specific regions or tasks, hindering their generalizability. To bridge this gap and advance spatio-temporal foundation models for urban systems, we adopt scaling as the central perspective and systematically investigate two key questions: what to scale and how to scale. Grounded in first-principles analysis, we identify three critical dimensions: heterogeneity, correlation, and dynamics, aligning these principles with the fundamental scientific properties of urban spatio-temporal data. Specifically, to address heterogeneity through data scaling, we construct WorldST. This billion-scale corpus standardizes diverse physical signals, such as traffic flow and speed, from over 100 global cities into a unified data format. To enable computation scaling for modeling correlations, we introduce the MiniST unit, a novel split mechanism that discretizes continuous spatio-temporal fields into learnable computational units to unify representations of grid-based and sensor-based observations. Finally, addressing dynamics via architecture scaling, we propose UrbanFM, a minimalist self-attention architecture designed with limited inductive biases to autonomously learn dynamic spatio-temporal dependencies from massive data. Furthermore, we establish EvalST, the largest-scale urban spatio-temporal benchmark to date. Extensive experiments demonstrate that UrbanFM achieves remarkable zero-shot generalization across unseen cities and tasks, marking a pivotal first step toward large-scale urban spatio-temporal foundation models.
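The abstract describes the MiniST unit as a split mechanism that discretizes continuous spatio-temporal fields into learnable computational units, but gives no implementation details. As a rough illustrative sketch only (the function name, patch sizes, and tensor layout are all assumptions, not the paper's code), such a split for grid-based observations could resemble ViT-style patching extended along the time axis:

```python
import numpy as np

def minist_split(field, patch_hw=8, patch_t=4):
    """Illustrative sketch (not the paper's implementation): split a
    gridded spatio-temporal field of shape (T, H, W, C) into flat
    units covering patch_t steps and patch_hw x patch_hw cells each."""
    T, H, W, C = field.shape
    # Truncate to multiples of the patch sizes for simplicity.
    T = (T // patch_t) * patch_t
    H = (H // patch_hw) * patch_hw
    W = (W // patch_hw) * patch_hw
    field = field[:T, :H, :W]
    units = (
        field.reshape(T // patch_t, patch_t,
                      H // patch_hw, patch_hw,
                      W // patch_hw, patch_hw, C)
             .transpose(0, 2, 4, 1, 3, 5, 6)  # group the three patch axes together
             .reshape(-1, patch_t * patch_hw * patch_hw * C)  # one flat token per unit
    )
    return units  # (num_units, unit_dim), ready for a self-attention stack

demo = np.random.rand(8, 16, 16, 2)  # toy (T, H, W, C) traffic grid
tokens = minist_split(demo)
print(tokens.shape)  # (2 * 2 * 2 units, 4 * 8 * 8 * 2 dims) = (8, 512)
```

Sensor-based observations would need a different unit boundary (e.g. per-sensor windows rather than spatial patches); the unified representation the abstract claims presumably handles both, which this grid-only sketch does not attempt.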
Related papers
- Predicting Large-scale Urban Network Dynamics with Energy-informed Graph Neural Diffusion [51.198001060683296]
Networked urban systems facilitate the flow of people, resources, and services. Current models such as graph neural networks have shown promise but face a trade-off between efficacy and efficiency. This paper addresses this trade-off by drawing inspiration from physical laws to inform essential model designs.
arXiv Detail & Related papers (2025-07-31T01:24:01Z) - UrbanMind: Urban Dynamics Prediction with Multifaceted Spatial-Temporal Large Language Models [18.051209616917042]
UrbanMind is a novel spatial-temporal LLM framework for multifaceted urban dynamics prediction. At its core, UrbanMind introduces Muffin-MAE, a multifaceted fusion masked autoencoder with specialized masking strategies. Experiments on real-world urban datasets across multiple cities demonstrate that UrbanMind consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-05-16T19:38:06Z) - Real-time Spatial Retrieval Augmented Generation for Urban Environments [2.8367942280334493]
This work proposes a real-time spatial RAG architecture that defines the necessary components for the effective integration of generative AI into cities. The proposed architecture is implemented using FIWARE, an ecosystem of software components for developing smart city solutions and digital twins.
arXiv Detail & Related papers (2025-05-04T21:57:58Z) - Collaborative Imputation of Urban Time Series through Cross-city Meta-learning [54.438991949772145]
We propose a novel collaborative imputation paradigm leveraging meta-learned implicit neural representations (INRs). We then introduce a cross-city collaborative learning scheme through model-agnostic meta-learning. Experiments on a diverse urban dataset from 20 global cities demonstrate our model's superior imputation performance and generalizability.
arXiv Detail & Related papers (2025-01-20T07:12:40Z) - Diffusion Transformers as Open-World Spatiotemporal Foundation Models [30.98708067420915]
UrbanDiT is a foundation model for open-world urban spatio-temporal learning. Its key innovation lies in an elaborated prompt learning framework, which adaptively generates both data-driven and task-specific prompts. UrbanDiT sets up a new benchmark for foundation models in the urban spatio-temporal domain.
arXiv Detail & Related papers (2024-11-19T02:01:07Z) - HGAurban: Heterogeneous Graph Autoencoding for Urban Spatial-Temporal Learning [36.80668790442231]
A key challenge lies in the noisy and sparse nature of spatial-temporal data, which limits existing neural networks' ability to learn meaningful region representations in the spatial-temporal graph. We propose HGAurban, a novel heterogeneous spatial-temporal graph masked autoencoder that leverages generative self-supervised learning for robust urban data representation.
arXiv Detail & Related papers (2024-10-14T07:33:33Z) - UrbanGPT: Spatio-Temporal Large Language Models [34.79169613947957]
We present UrbanGPT, which seamlessly integrates a spatio-temporal encoder with an instruction-tuning paradigm.
We conduct extensive experiments on various public datasets, covering different spatio-temporal prediction tasks.
The results demonstrate that UrbanGPT, with its carefully designed architecture, consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-25T12:37:29Z) - Rethinking Urban Mobility Prediction: A Super-Multivariate Time Series
Forecasting Approach [71.67506068703314]
Long-term urban mobility predictions play a crucial role in the effective management of urban facilities and services.
Traditionally, urban mobility data has been structured as videos, treating longitude and latitude as fundamental pixels.
In our research, we introduce a fresh perspective on urban mobility prediction.
Instead of oversimplifying urban mobility data as traditional video data, we regard it as a complex time series.
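The shift this summary describes, from a video view to a time-series view, amounts to flattening the spatial grid so each cell becomes one variable of a multivariate series. A toy sketch (shapes are illustrative assumptions, not the paper's data):

```python
import numpy as np

# Toy mobility tensor: 24 time steps over a 10x10 city grid (the "video" view,
# where longitude and latitude act as pixel coordinates).
video = np.random.rand(24, 10, 10)

# Super-multivariate view: flatten the grid so each cell is one series
# variable, yielding a (time, variables) matrix any multivariate
# forecaster can consume directly.
series = video.reshape(video.shape[0], -1)
print(series.shape)  # (24, 100)
```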
arXiv Detail & Related papers (2023-12-04T07:39:05Z) - Unified Data Management and Comprehensive Performance Evaluation for
Urban Spatial-Temporal Prediction [Experiment, Analysis & Benchmark] [78.05103666987655]
This work addresses challenges in accessing and utilizing diverse urban spatial-temporal datasets.
We introduce atomic files, a unified storage format designed for urban spatial-temporal big data, and validate its effectiveness on 40 diverse datasets.
We conduct extensive experiments using diverse models and datasets, establishing a performance leaderboard and identifying promising research directions.
arXiv Detail & Related papers (2023-08-24T16:20:00Z) - Methodological Foundation of a Numerical Taxonomy of Urban Form [62.997667081978825]
We present a method for numerical taxonomy of urban form derived from biological systematics.
We derive homogeneous urban tissue types and, by determining overall morphological similarity between them, generate a hierarchical classification of urban form.
After framing and presenting the method, we test it on two cities - Prague and Amsterdam.
arXiv Detail & Related papers (2021-04-30T12:47:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.