MuseCL: Predicting Urban Socioeconomic Indicators via Multi-Semantic Contrastive Learning
- URL: http://arxiv.org/abs/2407.09523v1
- Date: Sun, 23 Jun 2024 09:49:41 GMT
- Title: MuseCL: Predicting Urban Socioeconomic Indicators via Multi-Semantic Contrastive Learning
- Authors: Xixian Yong, Xiao Zhou
- Abstract summary: MuseCL is a framework for fine-grained urban region profiling and socioeconomic prediction.
We construct contrastive sample pairs for street view and remote sensing images, capitalizing on similarities in human mobility.
We extract semantic insights from POI texts embedded within these regions, employing a pre-trained text encoder.
- Score: 13.681538916025021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting socioeconomic indicators within urban regions is crucial for fostering inclusivity, resilience, and sustainability in cities and human settlements. While pioneering studies have attempted to leverage multi-modal data for socioeconomic prediction, jointly exploring their underlying semantics remains a significant challenge. To address the gap, this paper introduces a Multi-Semantic Contrastive Learning (MuseCL) framework for fine-grained urban region profiling and socioeconomic prediction. Within this framework, we initiate the process by constructing contrastive sample pairs for street view and remote sensing images, capitalizing on the similarities in human mobility and Point of Interest (POI) distribution to derive semantic features from the visual modality. Additionally, we extract semantic insights from POI texts embedded within these regions, employing a pre-trained text encoder. To merge the acquired visual and textual features, we devise an innovative cross-modality-based attentional fusion module, which leverages a contrastive mechanism for integration. Experimental results across multiple cities and indicators consistently highlight the superiority of MuseCL, demonstrating an average improvement of 10% in $R^2$ compared to various competitive baseline models. The code of this work is publicly available at https://github.com/XixianYong/MuseCL.
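To make the fusion step concrete, below is a minimal PyTorch sketch of a cross-modality attentional fusion module trained with a symmetric contrastive (InfoNCE-style) objective. All module names, dimensions, and hyperparameters are illustrative assumptions, not the authors' released implementation; see the linked repository for the real code.

```python
# Minimal sketch of cross-modal attentional fusion with a contrastive
# objective, in the spirit of MuseCL. Names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        # Visual features attend to textual features, and vice versa.
        self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, vis, txt):
        # vis, txt: (batch, tokens, dim) feature sequences per region.
        v, _ = self.v2t(vis, txt, txt)   # visual queries, textual keys/values
        t, _ = self.t2v(txt, vis, vis)   # textual queries, visual keys/values
        fused = torch.cat([v.mean(dim=1), t.mean(dim=1)], dim=-1)
        return self.proj(fused)          # one embedding per urban region

def info_nce(a, b, temperature: float = 0.07):
    # Symmetric InfoNCE: matching (visual, textual) pairs for the same
    # region are positives; all other pairs in the batch are negatives.
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    labels = torch.arange(a.size(0), device=a.device)
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2
```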
Related papers
- SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human Intervention [50.07342730395946]
We introduce a novel multi-agent framework that effectively simulates content analysis via large language model (LLM) agents.
It imitates key phases of content analysis, including text coding, collaborative discussion, and dynamic codebook evolution.
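A hypothetical sketch of the three phases that summary describes (independent coding, collaborative discussion, codebook evolution); `ask_llm` is a stand-in for any chat-completion client, and nothing here reflects the paper's actual prompts or API.

```python
def ask_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion client here.
    return "UNCODED"

def scale_round(texts, codebook, coders=3):
    # Phase 1: each agent independently assigns codes to every text.
    codings = [[ask_llm(f"Codebook: {codebook}\nCode this text: {t}")
                for t in texts] for _ in range(coders)]
    # Phase 2: agents discuss disagreements to reach a consensus coding.
    consensus = [ask_llm(f"Coders said {votes}; agree on one code.")
                 for votes in zip(*codings)]
    # Phase 3: the codebook evolves from unresolved or novel categories.
    codebook = ask_llm(f"Revise codebook {codebook} given codes {consensus}")
    return consensus, codebook
```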
arXiv Detail & Related papers (2025-02-16T00:19:07Z) - MobiCLR: Mobility Time Series Contrastive Learning for Urban Region Representations [18.07010464073212]
We propose a novel urban region representation learning model, which captures semantically meaningful embeddings from inflow and outflow mobility patterns.
We conduct experiments in Chicago, New York, and Washington, D.C. to predict income, educational attainment, and social vulnerability.
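As an illustration of the idea (not the MobiCLR release), one could encode inflow and outflow time series with separate recurrent encoders and treat the two views of the same region as a contrastive positive pair:

```python
# Illustrative sketch: separate encoders for a region's inflow and
# outflow series, pulled together by a contrastive loss. Sizes are toy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowEncoder(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=dim, batch_first=True)

    def forward(self, series):
        # series: (batch, timesteps, 1) hourly in- or outflow counts.
        _, h = self.rnn(series)
        return F.normalize(h[-1], dim=-1)

inflow_enc, outflow_enc = FlowEncoder(), FlowEncoder()
inflow = torch.rand(32, 24, 1)    # toy batch: 32 regions, 24 hours
outflow = torch.rand(32, 24, 1)
zi, zo = inflow_enc(inflow), outflow_enc(outflow)
logits = zi @ zo.t() / 0.1        # same region's two views are positives
loss = F.cross_entropy(logits, torch.arange(32))
```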
arXiv Detail & Related papers (2025-02-05T06:18:43Z) - Multimodal Contrastive Learning of Urban Space Representations from POI Data [2.695321027513952]
CaLLiPer (Contrastive Language-Location Pre-training) is a representation learning model that embeds continuous urban spaces into vector representations.
We validate CaLLiPer's effectiveness by applying it to learning urban space representations in London, UK.
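A hedged sketch of the contrastive language-location idea: a small coordinate encoder is aligned with text embeddings of co-located POI descriptions. The MLP architecture and dimensions are assumptions for illustration, not CaLLiPer's actual design.

```python
# Sketch: embed continuous (x, y) coordinates into the same space as POI
# text embeddings and align the two with a CLIP-style contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocationEncoder(nn.Module):
    """Map (x, y) coordinates to the text-embedding space."""
    def __init__(self, text_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, text_dim))

    def forward(self, coords):              # coords: (batch, 2)
        return F.normalize(self.mlp(coords), dim=-1)

coords = torch.rand(16, 2)                           # POI locations
text_emb = F.normalize(torch.rand(16, 256), dim=-1)  # stand-in text features
loc_emb = LocationEncoder()(coords)
logits = loc_emb @ text_emb.t() / 0.07  # co-located pairs are positives
loss = F.cross_entropy(logits, torch.arange(16))
```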
arXiv Detail & Related papers (2024-11-09T16:24:07Z) - Federated Contrastive Learning for Personalized Semantic Communication [55.46383524190467]
We design a federated contrastive learning framework aimed at supporting personalized semantic communication.
FedCL enables collaborative training of local semantic encoders across multiple clients and a global semantic decoder owned by the base station.
To tackle the semantic imbalance issue arising from heterogeneous datasets across distributed clients, we employ contrastive learning to train a semantic centroid generator.
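The server-side aggregation could follow the usual FedAvg pattern, sketched below; the semantic centroid generator is only stubbed in a comment, since its details are specific to the paper.

```python
# Rough sketch of the federated pattern the summary describes: clients
# train local semantic encoders and the server averages their weights.
import copy
import torch

def fed_avg(client_states):
    """Average the parameters of several client encoder state dicts."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(0)
    return avg

# Each round: broadcast global weights, train locally, then aggregate, e.g.
# global_encoder.load_state_dict(
#     fed_avg([c.encoder.state_dict() for c in clients]))
# The contrastively trained semantic centroid generator would supply
# per-class anchors to counter data heterogeneity (not shown here).
```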
arXiv Detail & Related papers (2024-06-13T14:45:35Z) - Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models [58.58594658683919]
Large multimodal models (LMMs) have shown transformative potential across various research tasks.
Our findings indicate LMMs possess advantages in zero-shot learning, interpretability, and handling uncurated 'in-the-wild' inputs.
We propose a Chain-of-Thought augmented prompting approach, which effectively mitigates the off-target prediction issue.
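A hypothetical template showing how such a chain-of-thought prompt might be assembled, with an explicit answer constraint to curb off-target predictions; the wording is illustrative, not the paper's prompt.

```python
# Illustrative chain-of-thought prompt construction for demographic
# inference with a large multimodal model.
def build_cot_prompt(task: str, labels: list[str]) -> str:
    return (
        f"Task: {task}\n"
        f"Valid answers: {', '.join(labels)}\n"
        "First, describe the relevant visual evidence step by step.\n"
        "Then, reason about what that evidence implies.\n"
        "Finally, answer with exactly one of the valid answers."
    )

prompt = build_cot_prompt("Infer the age group of the profile owner",
                          ["18-24", "25-34", "35-54", "55+"])
```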
arXiv Detail & Related papers (2024-05-24T16:26:56Z) - UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Socioeconomic Indicator Prediction [26.693692853787756]
Urban socioeconomic indicator prediction aims to infer various metrics related to sustainable development in diverse urban landscapes.
Pretrained models, particularly those reliant on satellite imagery, face dual challenges.
arXiv Detail & Related papers (2024-03-25T14:57:18Z) - Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks [82.82866901799565]
We build a new set of multimodal remote sensing benchmark datasets (including hyperspectral, multispectral, SAR) for the study purpose of the cross-city semantic segmentation task.
Beyond the single city, we propose a high-resolution domain adaptation network, HighDAN, to promote the AI model's generalization ability from the multi-city environments.
HighDAN is capable of retaining the spatially topological structure of the studied urban scene well in a parallel high-to-low resolution fusion fashion.
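A toy sketch of the parallel high-to-low resolution fusion idea (an HRNet-style design, which is the family HighDAN draws on); the two-branch layout and layer widths are assumptions, not the paper's architecture.

```python
# Toy two-branch block: a full-resolution stream preserves spatial
# topology while a downsampled stream adds context; the branches are
# fused by upsampling and summation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelFusion(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        self.high = nn.Conv2d(ch, ch, 3, padding=1)           # full resolution
        self.low = nn.Conv2d(ch, ch, 3, stride=2, padding=1)  # half resolution

    def forward(self, x):
        h = F.relu(self.high(x))
        l = F.relu(self.low(x))
        # Upsample the low branch and sum, keeping the high-resolution
        # stream (and thus the scene's spatial structure) intact.
        return h + F.interpolate(l, size=h.shape[-2:], mode="bilinear",
                                 align_corners=False)
```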
arXiv Detail & Related papers (2023-09-26T23:55:39Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Knowledge-infused Contrastive Learning for Urban Imagery-based Socioeconomic Prediction [13.26632316765164]
Urban imagery in web like satellite/street view images has emerged as an important source for socioeconomic prediction.
We propose a Knowledge-infused Contrastive Learning model for urban imagery-based socioeconomic prediction.
Our proposed KnowCL model can apply to both satellite and street imagery with both effectiveness and transferability achieved.
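One hedged reading of "knowledge-infused" is that an urban knowledge graph decides which image pairs count as contrastive positives before training; the relation name below is made up for illustration and is not KnowCL's actual schema.

```python
# Sketch: derive contrastive positive pairs from knowledge-graph triples
# linking places with related urban functions.
def kg_positive_pairs(kg_edges, images_by_place):
    """kg_edges: iterable of (place_a, relation, place_b) triples."""
    pairs = []
    for a, rel, b in kg_edges:
        # "similar_function" is a hypothetical relation name.
        if rel == "similar_function" and a in images_by_place and b in images_by_place:
            pairs.append((images_by_place[a], images_by_place[b]))
    return pairs
```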
arXiv Detail & Related papers (2023-02-25T14:53:17Z) - Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
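A compact sketch of a region-level contrastive objective: the same InfoNCE pattern as above, but applied per detected region across two augmented views instead of regressing region features; shapes are illustrative.

```python
# Sketch of dense (region-level) contrastive learning between two views.
import torch
import torch.nn.functional as F

def dense_region_contrast(regions_a, regions_b, temperature=0.1):
    # regions_a/b: (num_regions, dim); row i of each tensor is the same
    # image region under two different augmented views.
    a = F.normalize(regions_a, dim=-1)
    b = F.normalize(regions_b, dim=-1)
    logits = a @ b.t() / temperature
    # Matching regions are positives; all other regions are negatives.
    return F.cross_entropy(logits, torch.arange(a.size(0)))
```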
arXiv Detail & Related papers (2021-09-24T07:20:13Z) - Learning Neighborhood Representation from Multi-Modal Multi-Graph: Image, Text, Mobility Graph and Beyond [20.014906526266795]
We propose a novel approach to integrate multi-modal geotagged inputs as either node or edge features of a multi-graph.
Specifically, we use street view images and POI features to characterize neighborhoods (nodes) and use human mobility to characterize the relationship between neighborhoods (directed edges).
The embedding we trained outperforms the ones using only unimodal data as regional inputs.
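A minimal sketch of that multi-graph construction using networkx, with placeholder features; extracting the actual image and POI features is out of scope here.

```python
# Nodes carry visual/POI feature vectors; mobility flows become directed,
# weighted edges. Feature values below are placeholders.
import networkx as nx

g = nx.DiGraph()
g.add_node("region_a", street_view=[0.1, 0.4], poi=[3, 0, 7])
g.add_node("region_b", street_view=[0.3, 0.2], poi=[1, 5, 2])
# Directed edge: trips observed from region_a to region_b.
g.add_edge("region_a", "region_b", weight=128)  # e.g. trip count
```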
arXiv Detail & Related papers (2021-05-06T07:44:05Z)