MuseCL: Predicting Urban Socioeconomic Indicators via Multi-Semantic Contrastive Learning
- URL: http://arxiv.org/abs/2407.09523v1
- Date: Sun, 23 Jun 2024 09:49:41 GMT
- Title: MuseCL: Predicting Urban Socioeconomic Indicators via Multi-Semantic Contrastive Learning
- Authors: Xixian Yong, Xiao Zhou
- Abstract summary: MuseCL is a framework for fine-grained urban region profiling and socioeconomic prediction.
We construct contrastive sample pairs for street view and remote sensing images, capitalizing on similarities in human mobility.
We extract semantic insights from POI texts embedded within these regions, employing a pre-trained text encoder.
- Score: 13.681538916025021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting socioeconomic indicators within urban regions is crucial for fostering inclusivity, resilience, and sustainability in cities and human settlements. While pioneering studies have attempted to leverage multi-modal data for socioeconomic prediction, jointly exploring their underlying semantics remains a significant challenge. To address the gap, this paper introduces a Multi-Semantic Contrastive Learning (MuseCL) framework for fine-grained urban region profiling and socioeconomic prediction. Within this framework, we initiate the process by constructing contrastive sample pairs for street view and remote sensing images, capitalizing on the similarities in human mobility and Point of Interest (POI) distribution to derive semantic features from the visual modality. Additionally, we extract semantic insights from POI texts embedded within these regions, employing a pre-trained text encoder. To merge the acquired visual and textual features, we devise an innovative cross-modality-based attentional fusion module, which leverages a contrastive mechanism for integration. Experimental results across multiple cities and indicators consistently highlight the superiority of MuseCL, demonstrating an average improvement of 10% in $R^2$ compared to various competitive baseline models. The code of this work is publicly available at https://github.com/XixianYong/MuseCL.
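The abstract sketches a three-step pipeline: (1) contrastive pairing of street view and remote sensing features guided by human mobility and POI similarity, (2) pre-trained text encoding of POI descriptions, and (3) cross-modality attentional fusion. Below is a minimal, hypothetical PyTorch sketch of that kind of pipeline; the encoder architectures, dimensions, and names (RegionEncoder, AttentionFusion, info_nce) are illustrative assumptions, not the authors' implementation, which is available at the GitHub link above.

```python
# Hypothetical sketch of a MuseCL-style multi-semantic contrastive pipeline.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionEncoder(nn.Module):
    """Projects one modality's pre-extracted features into a shared space."""
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def info_nce(anchor, positive, temperature=0.07):
    """InfoNCE: row i of `positive` is the positive for row i of `anchor`;
    every other row in the batch serves as a negative."""
    logits = anchor @ positive.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(anchor.size(0))         # positives on the diagonal
    return F.cross_entropy(logits, targets)

class AttentionFusion(nn.Module):
    """Cross-modal attention: POI text embeddings attend over visual ones."""
    def __init__(self, emb_dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(emb_dim, heads, batch_first=True)

    def forward(self, text_emb, visual_emb):
        fused, _ = self.attn(text_emb, visual_emb, visual_emb)
        return fused

# Toy usage: a batch of B regions with pre-extracted per-modality features.
B = 8
sv_enc, rs_enc = RegionEncoder(512), RegionEncoder(512)
txt_enc, fusion = RegionEncoder(768), AttentionFusion()

street = sv_enc(torch.randn(B, 512))   # street view image features
remote = rs_enc(torch.randn(B, 512))   # remote sensing image features
text = txt_enc(torch.randn(B, 768))    # pooled POI text features (e.g. BERT)

# Visual contrast across the two views, then text-visual contrast.
visual = F.normalize(street + remote, dim=-1)
loss = info_nce(street, remote) + info_nce(text, visual)

fused = fusion(text.unsqueeze(1), torch.stack([street, remote], dim=1))
print(fused.squeeze(1).shape, loss.item())  # torch.Size([8, 128]) ...
```

The sketch assumes each batch is arranged so that row i of every modality describes the same region, putting positive pairs on the diagonal of the similarity matrix; MuseCL's actual pair construction instead derives positives from similarities in human mobility and POI distribution.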
Related papers
- Synergizing LLM Agents and Knowledge Graph for Socioeconomic Prediction in LBSN [24.04470448615187]
Location-based social networks (LBSNs) have led to significant societal changes, spurring a growing body of work that uses LBSN data for socioeconomic prediction.
Existing approaches rely on manually crafted heuristics and domain expertise to extract task-relevant knowledge from diverse data, which may not be optimal for specific tasks.
Motivated by the remarkable abilities of large language models (LLMs) in commonsense reasoning, embedding, and multi-agent collaboration, in this work, we synergize LLM agents and knowledge graph for socioeconomic prediction.
arXiv Detail & Related papers (2024-10-29T04:03:15Z)
- Federated Contrastive Learning for Personalized Semantic Communication [55.46383524190467]
We design a federated contrastive learning (FedCL) framework aimed at supporting personalized semantic communication.
FedCL enables collaborative training of local semantic encoders across multiple clients and a global semantic decoder owned by the base station.
To tackle the semantic imbalance issue arising from heterogeneous datasets across distributed clients, we employ contrastive learning to train a semantic centroid generator.
arXiv Detail & Related papers (2024-06-13T14:45:35Z)
- Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models [58.58594658683919]
Large multimodal models (LMMs) have shown transformative potential across various research tasks.
Our findings indicate LMMs possess advantages in zero-shot learning, interpretability, and handling uncurated 'in-the-wild' inputs.
We propose a Chain-of-Thought augmented prompting approach, which effectively mitigates the off-target prediction issue.
arXiv Detail & Related papers (2024-05-24T16:26:56Z)
- Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks [82.82866901799565]
We build a new set of multimodal remote sensing benchmark datasets (including hyperspectral, multispectral, and SAR data) for the study of the cross-city semantic segmentation task.
Beyond the single-city setting, we propose a high-resolution domain adaptation network, HighDAN, to promote the AI model's generalization ability across multi-city environments.
HighDAN retains the spatial topology of the studied urban scenes well through a parallel high-to-low resolution fusion scheme.
arXiv Detail & Related papers (2023-09-26T23:55:39Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation [53.97962603641629]
We propose a novel mulTi-source sEmantic grAph-based Multimodal sarcasm explanation scheme, named TEAM.
TEAM extracts object-level semantic metadata from the input image instead of the traditional global visual features.
TEAM introduces a multi-source semantic graph that comprehensively characterizes the multi-source semantic relations.
arXiv Detail & Related papers (2023-06-29T03:26:10Z)
- Knowledge-infused Contrastive Learning for Urban Imagery-based Socioeconomic Prediction [13.26632316765164]
Urban imagery on the web, such as satellite and street view images, has emerged as an important data source for socioeconomic prediction.
We propose a Knowledge-infused Contrastive Learning model for urban imagery-based socioeconomic prediction.
Our proposed KnowCL model applies to both satellite and street imagery, achieving both effectiveness and transferability.
arXiv Detail & Related papers (2023-02-25T14:53:17Z)
- Mitigating Urban-Rural Disparities in Contrastive Representation Learning with Satellite Imagery [19.93324644519412]
We consider the risk of urban-rural disparities in identification of land-cover features.
We propose fair dense representation with contrastive learning (FairDCL) as a method for de-biasing the multi-level latent space of convolutional neural network models.
The obtained image representation mitigates downstream urban-rural prediction disparities and outperforms state-of-the-art baselines on real-world satellite images.
arXiv Detail & Related papers (2022-11-16T04:59:46Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent images and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- Learning Neighborhood Representation from Multi-Modal Multi-Graph: Image, Text, Mobility Graph and Beyond [20.014906526266795]
We propose a novel approach to integrate multi-modal geotagged inputs as either node or edge features of a multi-graph.
Specifically, we use street view images and POI features to characterize neighborhoods (nodes) and use human mobility to characterize the relationships between neighborhoods (directed edges); a minimal construction sketch appears after this list.
The trained embedding outperforms embeddings that use only unimodal data as regional inputs.
arXiv Detail & Related papers (2021-05-06T07:44:05Z)
- Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding [8.396746290518102]
Urban2Vec is an unsupervised multi-modal framework that incorporates both street view imagery and point-of-interest data.
We show that Urban2Vec outperforms baseline models and is comparable to fully supervised methods in downstream prediction tasks.
arXiv Detail & Related papers (2020-01-29T21:30:53Z)
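As referenced in the multi-graph entry above, the following is a minimal hypothetical sketch of the neighborhood multi-graph that paper describes: neighborhoods as nodes carrying street view and POI features, and human mobility flows as weighted directed edges. The region IDs, feature dimensions, and trip counts are made-up placeholders, and NetworkX merely stands in for whatever graph tooling the authors used.

```python
# Hypothetical construction of a multi-modal neighborhood graph.
import networkx as nx
import numpy as np

g = nx.DiGraph()

# Nodes: each neighborhood carries street view and POI feature vectors.
for region_id in ["A", "B", "C"]:
    g.add_node(region_id,
               street_view=np.random.rand(512),  # e.g. pooled image features
               poi=np.random.rand(64))           # e.g. POI category histogram

# Directed edges: trip counts from origin to destination neighborhood.
mobility_flows = {("A", "B"): 120, ("B", "A"): 95, ("A", "C"): 40}
for (src, dst), trips in mobility_flows.items():
    g.add_edge(src, dst, weight=trips)

# A graph neural network would learn node embeddings by message passing
# over these weighted edges; here we only inspect the structure.
print(g.number_of_nodes(), g.number_of_edges())  # 3 3
print(g["A"]["B"]["weight"])                     # 120
```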
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.