Multimodal Contrastive Learning of Urban Space Representations from POI Data
- URL: http://arxiv.org/abs/2411.06229v1
- Date: Sat, 09 Nov 2024 16:24:07 GMT
- Title: Multimodal Contrastive Learning of Urban Space Representations from POI Data
- Authors: Xinglei Wang, Tao Cheng, Stephen Law, Zichao Zeng, Lu Yin, Junyuan Liu
- Abstract summary: CaLLiPer (Contrastive Language-Location Pre-training) is a representation learning model that embeds continuous urban spaces into vector representations.
We validate CaLLiPer's effectiveness by applying it to learning urban space representations in London, UK.
- Score: 2.695321027513952
- License:
- Abstract: Existing methods for learning urban space representations from Point-of-Interest (POI) data face several limitations, including issues with geographical delineation, inadequate spatial information modelling, underutilisation of POI semantic attributes, and computational inefficiencies. To address these issues, we propose CaLLiPer (Contrastive Language-Location Pre-training), a novel representation learning model that directly embeds continuous urban spaces into vector representations that capture the spatial and semantic distribution of the urban environment. This model leverages a multimodal contrastive learning objective, aligning location embeddings with textual POI descriptions, thereby bypassing the need for complex training corpus construction and negative sampling. We validate CaLLiPer's effectiveness by applying it to learning urban space representations in London, UK, where it demonstrates a 5-15% improvement in predictive performance for land use classification and socioeconomic mapping tasks compared to state-of-the-art methods. Visualisations of the learned representations further illustrate our model's advantages in capturing spatial variations in urban semantics with high accuracy and fine resolution. Additionally, CaLLiPer achieves reduced training time, showcasing its efficiency and scalability. This work provides a promising pathway for scalable, semantically rich urban space representation learning that can support the development of geospatial foundation models. The implementation code is available at https://github.com/xlwang233/CaLLiPer.
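The core idea in the abstract, aligning location embeddings with textual POI descriptions via a multimodal contrastive objective, can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the sinusoidal location encoder, the embedding dimensions, and the random "text" embeddings below are all illustrative assumptions; the key point is the symmetric CLIP-style loss, in which the other items in a batch serve as implicit negatives, so no explicit negative sampling is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def logsumexp(a, axis):
    """Numerically stable log-sum-exp along an axis."""
    m = a.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))

def fourier_location_encoding(coords, n_freq=16, scale=10.0):
    """Sinusoidal encoding of continuous (x, y) coordinates into a
    fixed-length vector (a hypothetical stand-in for the paper's
    location encoder)."""
    freqs = scale ** (np.arange(n_freq) / n_freq)       # (n_freq,)
    angles = coords[:, :, None] * freqs[None, None, :]  # (N, 2, n_freq)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(coords.shape[0], -1)           # (N, 4 * n_freq)

def info_nce_loss(loc_emb, txt_emb, temperature=0.07):
    """Symmetric CLIP-style contrastive loss: each location embedding
    is pulled towards its paired POI text embedding, while the other
    batch items act as in-batch negatives."""
    loc = loc_emb / np.linalg.norm(loc_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = loc @ txt.T / temperature                  # (N, N) similarities
    idx = np.arange(len(logits))
    loss_l2t = -(logits - logsumexp(logits, axis=1))[idx, idx].mean()
    loss_t2l = -(logits.T - logsumexp(logits.T, axis=1))[idx, idx].mean()
    return (loss_l2t + loss_t2l) / 2

# Toy batch: 8 POI locations and 8 matching 32-d "text" embeddings.
coords = rng.uniform(-0.5, 0.5, size=(8, 2))
loc_emb = fourier_location_encoding(coords) @ rng.normal(size=(64, 32))
txt_emb = rng.normal(size=(8, 32))
print(info_nce_loss(loc_emb, txt_emb))
```

In a real training loop the projection matrix and text encoder would be learned jointly (e.g. with a pretrained sentence encoder for the POI descriptions); minimising this loss drives nearby locations with similar POI semantics towards similar embeddings.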
Related papers
- Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction [1.5156879440024378]
Commuting flow prediction is an essential task for municipal operations in the real world.
We develop a heterogeneous graph-based model to generate meaningful region embeddings for predicting different types of inter-level OD flows.
Our proposed model outperforms existing models in terms of a uniform urban structure.
arXiv Detail & Related papers (2024-08-27T03:30:01Z) - Fine-Grained Urban Flow Inference with Multi-scale Representation Learning [14.673004628911443]
We propose an effective fine-grained urban flow inference model called UrbanMSR.
It uses self-supervised contrastive learning to obtain dynamic multi-scale representations of neighborhood-level and city-level geographic information.
We validate the performance through extensive experiments on three real-world datasets.
arXiv Detail & Related papers (2024-06-14T04:42:29Z) - Unified Data Management and Comprehensive Performance Evaluation for Urban Spatial-Temporal Prediction [Experiment, Analysis & Benchmark] [78.05103666987655]
This work addresses challenges in accessing and utilizing diverse urban spatial-temporal datasets.
We introduce atomic files, a unified storage format designed for urban spatial-temporal big data, and validate its effectiveness on 40 diverse datasets.
We conduct extensive experiments using diverse models and datasets, establishing a performance leaderboard and identifying promising research directions.
arXiv Detail & Related papers (2023-08-24T16:20:00Z) - Adaptive Hierarchical SpatioTemporal Network for Traffic Forecasting [70.66710698485745]
We propose an Adaptive Hierarchical SpatioTemporal Network (AHSTN) to promote traffic forecasting.
AHSTN exploits the spatial hierarchy and models multi-scale spatial correlations.
Experiments on two real-world datasets show that AHSTN achieves better performance over several strong baselines.
arXiv Detail & Related papers (2023-06-15T14:50:27Z) - Spatio-Temporal Graph Few-Shot Learning with Cross-City Knowledge Transfer [58.6106391721944]
Cross-city knowledge has shown its promise, where the model learned from data-sufficient cities is leveraged to benefit the learning process of data-scarce cities.
We propose ST-GFSL, a model-agnostic few-shot learning framework for spatio-temporal graphs.
We conduct comprehensive experiments on four traffic speed prediction benchmarks and the results demonstrate the effectiveness of ST-GFSL compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-05-27T12:46:52Z) - Learning Neighborhood Representation from Multi-Modal Multi-Graph: Image, Text, Mobility Graph and Beyond [20.014906526266795]
We propose a novel approach to integrate multi-modal geotagged inputs as either node or edge features of a multi-graph.
Specifically, we use street view images and POI features to characterize neighborhoods (nodes) and use human mobility to characterize the relationships between neighborhoods (directed edges).
The embedding we trained outperforms the ones using only unimodal data as regional inputs.
arXiv Detail & Related papers (2021-05-06T07:44:05Z) - Methodological Foundation of a Numerical Taxonomy of Urban Form [62.997667081978825]
We present a method for numerical taxonomy of urban form derived from biological systematics.
We derive homogeneous urban tissue types and, by determining overall morphological similarity between them, generate a hierarchical classification of urban form.
After framing and presenting the method, we test it on two cities - Prague and Amsterdam.
arXiv Detail & Related papers (2021-04-30T12:47:52Z) - Learning Large-scale Location Embedding From Human Mobility Trajectories with Graphs [0.0]
This study learns vector representations for locations using large-scale location-based service (LBS) data.
This model embeds context information in human mobility and spatial information.
GCN-L2V can be applied in a complementary manner to other place embedding methods and downstream geo-aware applications.
arXiv Detail & Related papers (2021-02-23T09:11:33Z) - PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z) - Learning Geo-Contextual Embeddings for Commuting Flow Prediction [20.600183945696863]
Predicting commuting flows based on infrastructure and land-use information is critical for urban planning and public policy development.
Conventional models, such as the gravity model, are mainly derived from physics principles and limited in their predictive power in real-world scenarios.
We propose Geo-contextual Multitask Embedding Learner (GMEL), a model that captures the spatial correlations from geographic contextual information for commuting flow prediction.
arXiv Detail & Related papers (2020-05-04T17:45:18Z) - Spatial Pyramid Based Graph Reasoning for Semantic Segmentation [67.47159595239798]
We apply graph convolution into the semantic segmentation task and propose an improved Laplacian.
The graph reasoning is directly performed in the original feature space organized as a spatial pyramid.
We achieve comparable performance with advantages in computational and memory overhead.
arXiv Detail & Related papers (2020-03-23T12:28:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.