UrbanDiT: A Foundation Model for Open-World Urban Spatio-Temporal Learning
- URL: http://arxiv.org/abs/2411.12164v1
- Date: Tue, 19 Nov 2024 02:01:07 GMT
- Title: UrbanDiT: A Foundation Model for Open-World Urban Spatio-Temporal Learning
- Authors: Yuan Yuan, Chonghua Han, Jingtao Ding, Depeng Jin, Yong Li,
- Abstract summary: UrbanDiT is a foundation model for open-world urban spatio-temporal learning.
It integrates diverse spatio-temporal data sources and types across different cities and scenarios.
- Score: 25.217842149162735
- Abstract: The urban environment is characterized by complex spatio-temporal dynamics arising from diverse human activities and interactions. Effectively modeling these dynamics is essential for understanding and optimizing urban systems. In this work, we introduce UrbanDiT, a foundation model for open-world urban spatio-temporal learning that successfully scales up diffusion transformers in this field. UrbanDiT pioneers a unified model that integrates diverse spatio-temporal data sources and types while learning universal spatio-temporal patterns across different cities and scenarios. This allows the model to unify both multi-data and multi-task learning, and to effectively support a wide range of spatio-temporal applications. Its key innovation lies in an elaborate prompt learning framework, which adaptively generates both data-driven and task-specific prompts, guiding the model to deliver superior performance across various urban applications. UrbanDiT offers three primary advantages: 1) It unifies diverse data types, such as grid-based and graph-based data, into a sequential format, allowing it to capture spatio-temporal dynamics across diverse scenarios of different cities; 2) With masking strategies and task-specific prompts, it supports a wide range of tasks, including bi-directional spatio-temporal prediction, temporal interpolation, spatial extrapolation, and spatio-temporal imputation; and 3) It generalizes effectively to open-world scenarios, with its powerful zero-shot capabilities outperforming nearly all baselines that have access to training data. These features allow UrbanDiT to achieve state-of-the-art performance in different domains, such as transportation traffic, crowd flows, taxi demand, bike usage, and cellular traffic, across multiple cities and tasks. UrbanDiT sets a new benchmark for foundation models in the urban spatio-temporal domain.
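The multi-task masking idea in advantage 2) can be sketched in a few lines. This is a hypothetical illustration only, not UrbanDiT's actual implementation: each task is expressed as a different binary mask over one sequentialized spatio-temporal tensor, and the model is asked to reconstruct the hidden entries. All names, shapes, and mask choices below are illustrative assumptions.

```python
import numpy as np

T, N = 12, 16              # 12 time steps, 16 spatial units (e.g. a 4x4 grid, flattened)
x = np.random.rand(T, N)   # grid-based data unified into a per-step sequence

def task_mask(task, T, N, obs_frac=0.5):
    """Return a boolean mask: True = observed input, False = entries to reconstruct."""
    m = np.ones((T, N), dtype=bool)
    if task == "prediction":        # hide the future half of the time axis
        m[T // 2:, :] = False
    elif task == "interpolation":   # hide every other time step
        m[1::2, :] = False
    elif task == "extrapolation":   # hide half of the spatial units
        m[:, N // 2:] = False
    elif task == "imputation":      # hide random entries
        m = np.random.rand(T, N) < obs_frac
    return m

for task in ["prediction", "interpolation", "extrapolation", "imputation"]:
    m = task_mask(task, T, N)
    x_in = np.where(m, x, 0.0)  # masked input fed to the model
    print(task, "observed fraction:", round(m.mean(), 2))
```

Under this framing, a single backbone can serve all four tasks; only the mask (and, in the paper's design, a task-specific prompt) changes between them.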
Related papers
- Get Rid of Task Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework [10.33844348594636]
We argue that it is essential to propose a Continuous Multi-task Spatio-Temporal learning framework (CMuST) to empower collective urban intelligence.
CMuST reforms urban spatio-temporal learning from single-domain to cooperative multi-task learning.
We establish a benchmark of three cities for multi-task spatio-temporal learning, and empirically demonstrate the superiority of CMuST.
arXiv Detail & Related papers (2024-10-14T14:04:36Z) - A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language.
To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates.
We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z) - OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction [29.514461050436932]
We introduce a novel foundation model, named OpenCity, that can effectively capture and normalize the underlying spatio-temporal patterns from diverse data characteristics.
OpenCity integrates the Transformer architecture with graph neural networks to model the complex spatio-temporal dependencies in traffic data.
Experimental results demonstrate that OpenCity exhibits exceptional zero-shot performance.
arXiv Detail & Related papers (2024-08-16T15:20:36Z) - SMA-Hyper: Spatiotemporal Multi-View Fusion Hypergraph Learning for Traffic Accident Prediction [2.807532512532818]
Current data-driven models often struggle with data sparsity and the integration of diverse urban data sources.
We introduce a deep dynamic learning framework designed for traffic accident prediction.
It incorporates dual adaptive graph learning mechanisms that enable high-order cross-regional learning.
It also employs an advanced attention mechanism to fuse multiple views of accident data and urban functional features.
arXiv Detail & Related papers (2024-07-24T21:10:34Z) - UrbanGPT: Spatio-Temporal Large Language Models [34.79169613947957]
We present UrbanGPT, which seamlessly integrates a spatio-temporal encoder with an instruction-tuning paradigm.
We conduct extensive experiments on various public datasets, covering different spatio-temporal prediction tasks.
The results demonstrate that our UrbanGPT, with its carefully designed architecture, consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-25T12:37:29Z) - Spatio-Temporal Few-Shot Learning via Diffusive Neural Network Generation [25.916891462152044]
We propose a novel generative pre-training framework, GPD, for intricate few-shot learning with urban knowledge transfer.
We recast the problem with a generative diffusion model, which generates tailored neural networks guided by prompts.
GPD consistently outperforms state-of-the-art baselines on datasets for tasks such as traffic speed prediction and crowd flow prediction.
arXiv Detail & Related papers (2024-02-19T08:11:26Z) - Rethinking Urban Mobility Prediction: A Super-Multivariate Time Series
Forecasting Approach [71.67506068703314]
Long-term urban mobility predictions play a crucial role in the effective management of urban facilities and services.
Traditionally, urban mobility data has been structured as videos, treating longitude and latitude as fundamental pixels.
In our research, we introduce a fresh perspective on urban mobility prediction.
Instead of oversimplifying urban mobility data as traditional video data, we regard it as a complex time series.
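The reframing described above amounts to a simple change of view on the same data. A minimal sketch, with toy shapes assumed for illustration (the paper's actual preprocessing may differ): each grid cell of the video-like tensor becomes one channel of a single multivariate time series.

```python
import numpy as np

T, H, W = 24, 8, 8                # 24 time steps over an 8x8 city grid (toy sizes)
video = np.random.rand(T, H, W)   # "video" view: one spatial frame per time step

# Super-multivariate view: 64 correlated univariate series, one per grid cell.
series = video.reshape(T, H * W)
print(series.shape)               # (24, 64)

# Any standard multivariate forecaster can now consume `series`, e.g.
# predicting the final step from the preceding history.
history, target = series[:-1], series[-1]
```

The reshape is lossless, so nothing about the data changes; what changes is the model family it invites, from video/convolutional architectures to multivariate time-series forecasters.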
arXiv Detail & Related papers (2023-12-04T07:39:05Z) - Unified Data Management and Comprehensive Performance Evaluation for
Urban Spatial-Temporal Prediction [Experiment, Analysis & Benchmark] [78.05103666987655]
This work addresses challenges in accessing and utilizing diverse urban spatial-temporal datasets.
We introduce atomic files, a unified storage format designed for urban spatial-temporal big data, and validate its effectiveness on 40 diverse datasets.
We conduct extensive experiments using diverse models and datasets, establishing a performance leaderboard and identifying promising research directions.
arXiv Detail & Related papers (2023-08-24T16:20:00Z) - Multi-Temporal Relationship Inference in Urban Areas [75.86026742632528]
Finding temporal relationships among locations can benefit a range of urban applications, such as dynamic offline advertising and smart public transport planning.
We propose a solution to Trial with a graph learning scheme, which includes a spatially evolving graph neural network (SEENet).
SEConv performs the intra-time aggregation and inter-time propagation to capture the multifaceted spatially evolving contexts from the view of location message passing.
SE-SSL designs time-aware self-supervised learning tasks in a global-local manner with additional evolving constraint to enhance the location representation learning and further handle the relationship sparsity.
arXiv Detail & Related papers (2023-06-15T07:48:32Z) - Pre-training Contextualized World Models with In-the-wild Videos for
Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z) - Averaging Spatio-temporal Signals using Optimal Transport and Soft
Alignments [110.79706180350507]
We show that our proposed loss can be used to define spatio-temporal barycenters as Fréchet means.
Experiments on handwritten letters and brain imaging data confirm our theoretical findings.
arXiv Detail & Related papers (2022-03-11T09:46:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.