Urban-R1: Reinforced MLLMs Mitigate Geospatial Biases for Urban General Intelligence
- URL: http://arxiv.org/abs/2510.16555v1
- Date: Sat, 18 Oct 2025 15:59:09 GMT
- Title: Urban-R1: Reinforced MLLMs Mitigate Geospatial Biases for Urban General Intelligence
- Authors: Qiongyan Wang, Xingchen Zou, Yutian Jiang, Haomin Wen, Jiaheng Wei, Qingsong Wen, Yuxuan Liang
- Abstract summary: Urban General Intelligence (UGI) refers to AI systems that can understand and reason about complex urban environments. Recent studies have built urban foundation models using supervised fine-tuning (SFT) of LLMs and MLLMs. We propose Urban-R1, a reinforcement learning-based post-training framework that aligns MLLMs with the objectives of UGI.
- Score: 64.36291202666212
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Rapid urbanization intensifies the demand for Urban General Intelligence (UGI), referring to AI systems that can understand and reason about complex urban environments. Recent studies have built urban foundation models using supervised fine-tuning (SFT) of LLMs and MLLMs, yet these models exhibit persistent geospatial bias, producing regionally skewed predictions and limited generalization. To this end, we propose Urban-R1, a reinforcement learning-based post-training framework that aligns MLLMs with the objectives of UGI. Urban-R1 adopts Group Relative Policy Optimization (GRPO) to optimize reasoning across geographic groups and employs urban region profiling as a proxy task to provide measurable rewards from multimodal urban data. Extensive experiments across diverse regions and tasks show that Urban-R1 effectively mitigates geo-bias and improves cross-region generalization, outperforming both SFT-trained and closed-source models. Our results highlight reinforcement learning alignment as a promising pathway toward equitable and trustworthy urban intelligence.
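The abstract names two mechanics: GRPO, which scores each sampled response relative to the other responses in its group, and urban region profiling as a proxy task that yields a measurable reward. The sketch below illustrates both under stated assumptions. The reward (clipped relative error on a single numeric indicator) and the names `profiling_reward` and `group_relative_advantages` are hypothetical illustrations, not Urban-R1's actual reward design. The advantage normalization follows the standard GRPO formulation (subtract the group mean, divide by the group standard deviation). How Urban-R1 forms groups across geographic regions is not specified in this summary.

```python
from statistics import mean, pstdev


def profiling_reward(pred: float, target: float, eps: float = 1e-8) -> float:
    """Hypothetical proxy reward for a region-profiling answer.

    Assumption (not from the paper): the model predicts one numeric
    indicator for a region (e.g. population) and is rewarded by
    1 - relative error, clipped so the reward never drops below 0.
    """
    rel_err = abs(pred - target) / max(abs(target), eps)
    return max(0.0, 1.0 - rel_err)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO advantage: normalize each reward within its group.

    A_i = (r_i - mean(r)) / (std(r) + eps), using the population std.
    Responses better than the group average get positive advantages;
    a group with identical rewards gets all-zero advantages.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers for the same region, true value 100 (made-up numbers).
    preds = [100.0, 120.0, 80.0, 150.0]
    rewards = [profiling_reward(p, 100.0) for p in preds]
    advantages = group_relative_advantages(rewards)
    for p, r, a in zip(preds, rewards, advantages):
        print(f"pred={p:6.1f} reward={r:.2f} advantage={a:+.2f}")
```

Because the baseline is the group's own mean reward, GRPO needs no separately learned value model. Only how an answer compares with its sibling samples matters, which is one reason it pairs naturally with verifiable proxy rewards like the one sketched here.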
Related papers
- UrbanVerse: Learning Urban Region Representation Across Cities and Tasks [18.711357897379283]
UrbanVerse is a model for cross-city urban representation learning and cross-task urban analytics. For cross-city generalization, UrbanVerse focuses on features local to the target regions and structural features of the nearby regions rather than the entire city. Experiments on real-world datasets show that UrbanVerse consistently outperforms state-of-the-art methods across six tasks under cross-city settings.
Date: 2026-02-17T17:28:48Z
- UrbanMoE: A Sparse Multi-Modal Mixture-of-Experts Framework for Multi-Task Urban Region Profiling [47.568568425459716]
We develop a benchmark for multi-task urban region profiling, featuring multi-modal features and a diverse set of strong baselines. We then propose UrbanMoE, the first sparse multi-modal, multi-expert framework specifically architected to solve the multi-task challenge. We conduct extensive experiments on three real-world datasets within our benchmark, where UrbanMoE consistently demonstrates superior performance over all baselines.
Date: 2026-01-30T09:25:05Z
- AtlasGS: Atlanta-world Guided Surface Reconstruction with Implicit Structured Gaussians [57.95719610327081]
We propose an Atlanta-world guided implicit-structured Gaussian Splatting method that achieves smooth indoor and urban scene reconstruction. Our method outperforms state-of-the-art approaches in both indoor and urban scenes, delivering superior surface reconstruction quality.
Date: 2025-10-29T03:17:58Z
- MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems. We build a benchmark consisting of 1,260 samples of 42 challenging synthetic tasks. We generate post-training data and explore learning paradigms for exploiting such data.
Date: 2025-10-09T17:53:58Z
- UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization [7.478830207921698]
Urban general intelligence (UGI) refers to the capacity of AI systems to autonomously perceive, reason, and act within dynamic and complex urban environments. In this paper, we introduce UrbanMind, a tool-enhanced retrieval-augmented generation (RAG) framework designed to facilitate UGI.
Date: 2025-07-07T06:57:34Z
- UrbanMind: Urban Dynamics Prediction with Multifaceted Spatial-Temporal Large Language Models [18.051209616917042]
UrbanMind is a novel spatial-temporal LLM framework for multifaceted urban dynamics prediction. At its core, UrbanMind introduces Muffin-MAE, a multifaceted fusion masked autoencoder with specialized masking strategies. Experiments on real-world urban datasets across multiple cities demonstrate that UrbanMind consistently outperforms state-of-the-art baselines.
Date: 2025-05-16T19:38:06Z
- Collaborative Imputation of Urban Time Series through Cross-city Meta-learning [54.438991949772145]
We propose a novel collaborative imputation paradigm leveraging meta-learned implicit neural representations (INRs). We then introduce a cross-city collaborative learning scheme through model-agnostic meta-learning. Experiments on a diverse urban dataset from 20 global cities demonstrate our model's superior imputation performance and generalizability.
Date: 2025-01-20T07:12:40Z
- CityGPT: Empowering Urban Spatial Cognition of Large Language Models [7.40606412920065]
Large language models often fall short when tackling real-life geospatial tasks within urban environments. We propose CityGPT, a framework designed to enhance LLMs' understanding of urban space and improve their ability to solve related urban tasks. To validate the effectiveness of our proposed framework, we develop a comprehensive text-based spatial benchmark, CityEval, for evaluating the performance of LLMs.
Date: 2024-06-20T02:32:16Z
- Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks [82.82866901799565]
We build a new set of multimodal remote sensing benchmark datasets (hyperspectral, multispectral, and SAR) for studying cross-city semantic segmentation.
Going beyond a single city, we propose a high-resolution domain adaptation network, HighDAN, to improve the model's ability to generalize across multi-city environments.
HighDAN retains the spatial topology of the studied urban scene well through parallel high-to-low resolution fusion.
Date: 2023-09-26T23:55:39Z
- A Contextual Master-Slave Framework on Urban Region Graph for Urban Village Detection [68.84486900183853]
We build an urban region graph (URG) to model the urban area in a hierarchically structured way.
Then, we design a novel contextual master-slave framework to effectively detect urban villages (UVs) from the URG.
The proposed framework can learn to balance the generality and specificity for UV detection in an urban area.
Date: 2022-11-26T18:17:39Z
- MetroGAN: Simulating Urban Morphology with Generative Adversarial Network [10.504296192020497]
We propose a GAN framework with geographical knowledge, namely Metropolitan GAN (MetroGAN) for urban morphology simulation.
Results show that MetroGAN outperforms state-of-the-art urban simulation methods by over 20% on all metrics.
Date: 2022-07-06T11:02:24Z