MuseCL: Predicting Urban Socioeconomic Indicators via Multi-Semantic Contrastive Learning
- URL: http://arxiv.org/abs/2407.09523v1
- Date: Sun, 23 Jun 2024 09:49:41 GMT
- Title: MuseCL: Predicting Urban Socioeconomic Indicators via Multi-Semantic Contrastive Learning
- Authors: Xixian Yong, Xiao Zhou
- Abstract summary: MuseCL is a framework for fine-grained urban region profiling and socioeconomic prediction.
We construct contrastive sample pairs for street view and remote sensing images, capitalizing on similarities in human mobility.
We extract semantic insights from POI texts embedded within these regions, employing a pre-trained text encoder.
- Score: 13.681538916025021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting socioeconomic indicators within urban regions is crucial for fostering inclusivity, resilience, and sustainability in cities and human settlements. While pioneering studies have attempted to leverage multi-modal data for socioeconomic prediction, jointly exploring their underlying semantics remains a significant challenge. To address the gap, this paper introduces a Multi-Semantic Contrastive Learning (MuseCL) framework for fine-grained urban region profiling and socioeconomic prediction. Within this framework, we initiate the process by constructing contrastive sample pairs for street view and remote sensing images, capitalizing on the similarities in human mobility and Point of Interest (POI) distribution to derive semantic features from the visual modality. Additionally, we extract semantic insights from POI texts embedded within these regions, employing a pre-trained text encoder. To merge the acquired visual and textual features, we devise an innovative cross-modality-based attentional fusion module, which leverages a contrastive mechanism for integration. Experimental results across multiple cities and indicators consistently highlight the superiority of MuseCL, demonstrating an average improvement of 10% in $R^2$ compared to various competitive baseline models. The code of this work is publicly available at https://github.com/XixianYong/MuseCL.
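To make the fusion step concrete, below is a minimal PyTorch sketch of a cross-modality attentional fusion module trained with a symmetric contrastive (InfoNCE-style) objective. All module names, dimensions, and hyperparameters are illustrative assumptions, not the authors' released implementation; see the linked repository for the real code.

```python
# Minimal sketch of cross-modal attentional fusion with a contrastive
# objective, in the spirit of MuseCL. Names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        # Visual features attend to textual features, and vice versa.
        self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, vis, txt):
        # vis, txt: (batch, tokens, dim) feature sequences per region.
        v, _ = self.v2t(vis, txt, txt)   # visual queries, textual keys/values
        t, _ = self.t2v(txt, vis, vis)   # textual queries, visual keys/values
        fused = torch.cat([v.mean(dim=1), t.mean(dim=1)], dim=-1)
        return self.proj(fused)          # one embedding per urban region

def info_nce(a, b, temperature: float = 0.07):
    # Symmetric InfoNCE: matching (visual, textual) pairs for the same
    # region are positives; all other pairs in the batch are negatives.
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    labels = torch.arange(a.size(0), device=a.device)
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2
```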
Related papers
- SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human Intervention [50.07342730395946]
We introduce a novel multi-agent framework that effectively simulates content analysis via large language model (LLM) agents.
It imitates key phases of content analysis, including text coding, collaborative discussion, and dynamic codebook evolution.
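A hypothetical sketch of the three phases that summary describes (independent coding, collaborative discussion, codebook evolution); `ask_llm` is a stand-in for any chat-completion client, and nothing here reflects the paper's actual prompts or API.

```python
def ask_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion client here.
    return "UNCODED"

def scale_round(texts, codebook, coders=3):
    # Phase 1: each agent independently assigns codes to every text.
    codings = [[ask_llm(f"Codebook: {codebook}\nCode this text: {t}")
                for t in texts] for _ in range(coders)]
    # Phase 2: agents discuss disagreements to reach a consensus coding.
    consensus = [ask_llm(f"Coders said {votes}; agree on one code.")
                 for votes in zip(*codings)]
    # Phase 3: the codebook evolves from unresolved or novel categories.
    codebook = ask_llm(f"Revise codebook {codebook} given codes {consensus}")
    return consensus, codebook
```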
arXiv Detail & Related papers (2025-02-16T00:19:07Z) - MobiCLR: Mobility Time Series Contrastive Learning for Urban Region Representations [18.07010464073212]
We propose a novel urban region representation learning model, which captures semantically meaningful embeddings from inflow and outflow mobility patterns.
We conduct experiments in Chicago, New York, and Washington, D.C. to predict income, educational attainment, and social vulnerability.
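As an illustration of the idea (not the MobiCLR release), one could encode inflow and outflow time series with separate recurrent encoders and treat the two views of the same region as a contrastive positive pair:

```python
# Illustrative sketch: separate encoders for a region's inflow and
# outflow series, pulled together by a contrastive loss. Sizes are toy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowEncoder(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=dim, batch_first=True)

    def forward(self, series):
        # series: (batch, timesteps, 1) hourly in- or outflow counts.
        _, h = self.rnn(series)
        return F.normalize(h[-1], dim=-1)

inflow_enc, outflow_enc = FlowEncoder(), FlowEncoder()
inflow = torch.rand(32, 24, 1)    # toy batch: 32 regions, 24 hours
outflow = torch.rand(32, 24, 1)
zi, zo = inflow_enc(inflow), outflow_enc(outflow)
logits = zi @ zo.t() / 0.1        # same region's two views are positives
loss = F.cross_entropy(logits, torch.arange(32))
```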
arXiv Detail & Related papers (2025-02-05T06:18:43Z) - Multimodal Contrastive Learning of Urban Space Representations from POI Data [2.695321027513952]
CaLLiPer (Contrastive Language-Location Pre-training) is a representation learning model that embeds continuous urban spaces into vector representations.
We validate CaLLiPer's effectiveness by applying it to learning urban space representations in London, UK.
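A hedged sketch of the contrastive language-location idea: a small coordinate encoder is aligned with text embeddings of co-located POI descriptions. The MLP architecture and dimensions are assumptions for illustration, not CaLLiPer's actual design.

```python
# Sketch: embed continuous (x, y) coordinates into the same space as POI
# text embeddings and align the two with a CLIP-style contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocationEncoder(nn.Module):
    """Map (x, y) coordinates to the text-embedding space."""
    def __init__(self, text_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, text_dim))

    def forward(self, coords):              # coords: (batch, 2)
        return F.normalize(self.mlp(coords), dim=-1)

coords = torch.rand(16, 2)                           # POI locations
text_emb = F.normalize(torch.rand(16, 256), dim=-1)  # stand-in text features
loc_emb = LocationEncoder()(coords)
logits = loc_emb @ text_emb.t() / 0.07  # co-located pairs are positives
loss = F.cross_entropy(logits, torch.arange(16))
```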
arXiv Detail & Related papers (2024-11-09T16:24:07Z) - Federated Contrastive Learning for Personalized Semantic Communication [55.46383524190467]
We design a federated contrastive learning framework aimed at supporting personalized semantic communication.
FedCL enables collaborative training of local semantic encoders across multiple clients and a global semantic decoder owned by the base station.
To tackle the semantic imbalance issue arising from heterogeneous datasets across distributed clients, we employ contrastive learning to train a semantic centroid generator.
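The server-side aggregation could follow the usual FedAvg pattern, sketched below; the semantic centroid generator is only stubbed in a comment, since its details are specific to the paper.

```python
# Rough sketch of the federated pattern the summary describes: clients
# train local semantic encoders and the server averages their weights.
import copy
import torch

def fed_avg(client_states):
    """Average the parameters of several client encoder state dicts."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(0)
    return avg

# Each round: broadcast global weights, train locally, then aggregate, e.g.
# global_encoder.load_state_dict(
#     fed_avg([c.encoder.state_dict() for c in clients]))
# The contrastively trained semantic centroid generator would supply
# per-class anchors to counter data heterogeneity (not shown here).
```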
arXiv Detail & Related papers (2024-06-13T14:45:35Z) - Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models [58.58594658683919]
Large multimodal models (LMMs) have shown transformative potential across various research tasks.
Our findings indicate LMMs possess advantages in zero-shot learning, interpretability, and handling uncurated 'in-the-wild' inputs.
We propose a Chain-of-Thought augmented prompting approach, which effectively mitigates the off-target prediction issue.
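A hypothetical template showing how such a chain-of-thought prompt might be assembled, with an explicit answer constraint to curb off-target predictions; the wording is illustrative, not the paper's prompt.

```python
# Illustrative chain-of-thought prompt construction for demographic
# inference with a large multimodal model.
def build_cot_prompt(task: str, labels: list[str]) -> str:
    return (
        f"Task: {task}\n"
        f"Valid answers: {', '.join(labels)}\n"
        "First, describe the relevant visual evidence step by step.\n"
        "Then, reason about what that evidence implies.\n"
        "Finally, answer with exactly one of the valid answers."
    )

prompt = build_cot_prompt("Infer the age group of the profile owner",
                          ["18-24", "25-34", "35-54", "55+"])
```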
arXiv Detail & Related papers (2024-05-24T16:26:56Z) - UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Socioeconomic Indicator Prediction [26.693692853787756]
Urban socioeconomic indicator prediction aims to infer various metrics related to sustainable development in diverse urban landscapes.
Pretrained models, particularly those reliant on satellite imagery, face dual challenges.
arXiv Detail & Related papers (2024-03-25T14:57:18Z) - Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks [82.82866901799565]
We build a new set of multimodal remote sensing benchmark datasets (including hyperspectral, multispectral, SAR) for the study purpose of the cross-city semantic segmentation task.
Beyond the single city, we propose a high-resolution domain adaptation network, HighDAN, to promote the AI model's generalization ability from the multi-city environments.
HighDAN is capable of retaining the spatially topological structure of the studied urban scene well in a parallel high-to-low resolution fusion fashion.
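A toy sketch of the parallel high-to-low resolution fusion idea (an HRNet-style design, which is the family HighDAN draws on); the two-branch layout and layer widths are assumptions, not the paper's architecture.

```python
# Toy two-branch block: a full-resolution stream preserves spatial
# topology while a downsampled stream adds context; the branches are
# fused by upsampling and summation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelFusion(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        self.high = nn.Conv2d(ch, ch, 3, padding=1)           # full resolution
        self.low = nn.Conv2d(ch, ch, 3, stride=2, padding=1)  # half resolution

    def forward(self, x):
        h = F.relu(self.high(x))
        l = F.relu(self.low(x))
        # Upsample the low branch and sum, keeping the high-resolution
        # stream (and thus the scene's spatial structure) intact.
        return h + F.interpolate(l, size=h.shape[-2:], mode="bilinear",
                                 align_corners=False)
```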
arXiv Detail & Related papers (2023-09-26T23:55:39Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Knowledge-infused Contrastive Learning for Urban Imagery-based Socioeconomic Prediction [13.26632316765164]
Urban imagery in web like satellite/street view images has emerged as an important source for socioeconomic prediction.
We propose a Knowledge-infused Contrastive Learning model for urban imagery-based socioeconomic prediction.
Our proposed KnowCL model can apply to both satellite and street imagery with both effectiveness and transferability achieved.
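One hedged reading of "knowledge-infused" is that an urban knowledge graph decides which image pairs count as contrastive positives before training; the relation name below is made up for illustration and is not KnowCL's actual schema.

```python
# Sketch: derive contrastive positive pairs from knowledge-graph triples
# linking places with related urban functions.
def kg_positive_pairs(kg_edges, images_by_place):
    """kg_edges: iterable of (place_a, relation, place_b) triples."""
    pairs = []
    for a, rel, b in kg_edges:
        # "similar_function" is a hypothetical relation name.
        if rel == "similar_function" and a in images_by_place and b in images_by_place:
            pairs.append((images_by_place[a], images_by_place[b]))
    return pairs
```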
arXiv Detail & Related papers (2023-02-25T14:53:17Z) - Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
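A compact sketch of a region-level contrastive objective: the same InfoNCE pattern as above, but applied per detected region across two augmented views instead of regressing region features; shapes are illustrative.

```python
# Sketch of dense (region-level) contrastive learning between two views.
import torch
import torch.nn.functional as F

def dense_region_contrast(regions_a, regions_b, temperature=0.1):
    # regions_a/b: (num_regions, dim); row i of each tensor is the same
    # image region under two different augmented views.
    a = F.normalize(regions_a, dim=-1)
    b = F.normalize(regions_b, dim=-1)
    logits = a @ b.t() / temperature
    # Matching regions are positives; all other regions are negatives.
    return F.cross_entropy(logits, torch.arange(a.size(0)))
```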
arXiv Detail & Related papers (2021-09-24T07:20:13Z) - Learning Neighborhood Representation from Multi-Modal Multi-Graph: Image, Text, Mobility Graph and Beyond [20.014906526266795]
We propose a novel approach to integrate multi-modal geotagged inputs as either node or edge features of a multi-graph.
Specifically, we use street view images and POI features to characterize neighborhoods (nodes) and use human mobility to characterize the relationship between neighborhoods (directed edges).
The embedding we trained outperforms the ones using only unimodal data as regional inputs.
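A minimal sketch of that multi-graph construction using networkx, with placeholder features; extracting the actual image and POI features is out of scope here.

```python
# Nodes carry visual/POI feature vectors; mobility flows become directed,
# weighted edges. Feature values below are placeholders.
import networkx as nx

g = nx.DiGraph()
g.add_node("region_a", street_view=[0.1, 0.4], poi=[3, 0, 7])
g.add_node("region_b", street_view=[0.3, 0.2], poi=[1, 5, 2])
# Directed edge: trips observed from region_a to region_b.
g.add_edge("region_a", "region_b", weight=128)  # e.g. trip count
```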
arXiv Detail & Related papers (2021-05-06T07:44:05Z)