CityGPT: Empowering Urban Spatial Cognition of Large Language Models
- URL: http://arxiv.org/abs/2406.13948v2
- Date: Sat, 31 May 2025 15:26:01 GMT
- Title: CityGPT: Empowering Urban Spatial Cognition of Large Language Models
- Authors: Jie Feng, Tianhui Liu, Yuwei Du, Siqi Guo, Yuming Lin, Yong Li
- Abstract summary: Large language models often fall short when tackling real-life geospatial tasks within urban environments. We propose CityGPT, a framework designed to enhance LLMs' understanding of urban space and improve their ability to solve related urban tasks. To validate the effectiveness of our proposed framework, we develop a comprehensive text-based spatial benchmark, CityEval, for evaluating the performance of LLMs.
- Score: 7.40606412920065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs), with their powerful language generation and reasoning capabilities, have already achieved notable success in many domains, e.g., math and code generation. However, they often fall short when tackling real-life geospatial tasks within urban environments. This limitation stems from a lack of physical world knowledge and relevant data during training. To address this gap, we propose CityGPT, a systematic framework designed to enhance LLMs' understanding of urban space and improve their ability to solve related urban tasks by integrating a city-scale 'world model' into the model. First, we construct a diverse instruction tuning dataset, CityInstruction, for injecting urban knowledge into LLMs and effectively boosting their spatial reasoning capabilities. Using a combination of CityInstruction and open-source general instruction data, we introduce a novel and easy-to-use self-weighted fine-tuning method (SWFT) to train various LLMs (including ChatGLM3-6B, Llama3-8B, and Qwen2.5-7B), enhancing their urban spatial capabilities without compromising, and in some cases even improving, their general abilities. Finally, to validate the effectiveness of our proposed framework, we develop a comprehensive text-based spatial benchmark, CityEval, for evaluating the performance of LLMs across a wide range of urban scenarios and geospatial tasks. Extensive evaluation results demonstrate that smaller LLMs trained on CityInstruction with the SWFT method can achieve performance that is competitive with, and in some cases superior to, proprietary LLMs when assessed using CityEval.
Related papers
- UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization [7.478830207921698]
Urban general intelligence (UGI) refers to the capacity of AI systems to autonomously perceive, reason, and act within dynamic and complex urban environments. In this paper, we introduce UrbanMind, a tool-enhanced retrieval-augmented generation (RAG) framework designed to facilitate UGI.
arXiv Detail & Related papers (2025-07-07T06:57:34Z)
- SpatialLLM: From Multi-modality Data to Urban Spatial Intelligence [13.810192130250744]
The core of SpatialLLM lies in constructing detailed and structured scene descriptions from raw spatial data to prompt pre-trained LLMs for scene-based analysis. Extensive experiments show that, with our designs, pretrained LLMs can accurately perceive spatial distribution information. We argue that multi-field knowledge, context length, and reasoning ability are key factors influencing LLM performances in urban analysis.
arXiv Detail & Related papers (2025-05-19T04:53:41Z)
- UrbanMind: Urban Dynamics Prediction with Multifaceted Spatial-Temporal Large Language Models [18.051209616917042]
UrbanMind is a novel spatial-temporal LLM framework for multifaceted urban dynamics prediction. At its core, UrbanMind introduces Muffin-MAE, a multifaceted fusion masked autoencoder with specialized masking strategies. Experiments on real-world urban datasets across multiple cities demonstrate that UrbanMind consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-05-16T19:38:06Z)
- Urban Computing in the Era of Large Language Models [41.50492781046065]
This survey explores the intersection of Large Language Models (LLMs) and urban computing.
We provide a concise overview of the evolution and core technologies of LLMs.
We survey their applications across key urban domains, such as transportation, public safety, and environmental monitoring.
arXiv Detail & Related papers (2025-04-02T05:12:13Z)
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications. Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z)
- Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation [55.21013307734612]
AoPS-Instruct is a dataset of more than 600,000 high-quality QA pairs. LiveAoPSBench is an evolving evaluation set with timestamps, derived from the latest forum data. Our work presents a scalable approach to creating and maintaining large-scale, high-quality datasets for advanced math reasoning.
arXiv Detail & Related papers (2025-01-24T06:39:38Z)
- What can LLM tell us about cities? [6.405546719612814]
This study explores the capabilities of large language models (LLMs) in providing knowledge about cities and regions on a global scale.
Experiments reveal that LLMs embed a broad but varying degree of knowledge across global cities, with ML models trained on LLM-derived features consistently leading to improved predictive accuracy.
arXiv Detail & Related papers (2024-11-25T09:07:56Z)
- OpenCity: A Scalable Platform to Simulate Urban Activities with Massive LLM Agents [10.919679349212426]
Large Language Models (LLMs) have led to the development of LLM agents capable of simulating urban activities with unprecedented realism.
We propose OpenCity, a scalable simulation platform optimized for both system and prompt efficiencies.
OpenCity achieves a 600-fold acceleration in simulation time per agent, a 70% reduction in LLM requests, and a 50% reduction in token usage.
arXiv Detail & Related papers (2024-10-11T13:52:35Z)
- EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
- UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios [60.492736455572015]
We present UrBench, a benchmark designed for evaluating LMMs in complex multi-view urban scenarios.
UrBench contains 11.6K meticulously curated questions at both region-level and role-level.
Our evaluations on 21 LMMs show that current LMMs struggle in urban environments in several respects.
arXiv Detail & Related papers (2024-08-30T13:13:35Z)
- CityBench: Evaluating the Capabilities of Large Language Model as World Model [10.22654338686634]
Large language models (LLMs) with powerful generalization ability have been widely used in many domains.
In this paper, we propose CityBench, an interactive simulator based evaluation platform.
We design 7 tasks in 2 categories, perception-understanding and decision-making, to evaluate the capability of LLMs as a city-scale world model for the urban domain.
arXiv Detail & Related papers (2024-06-20T02:25:07Z)
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
- UrbanLLM: Autonomous Urban Activity Planning and Management with Large Language Models [20.069378890478763]
UrbanLLM solves problems by decomposing urban-related queries into manageable sub-tasks.
It identifies suitable AI models for each sub-task, and generates comprehensive responses to the given queries.
arXiv Detail & Related papers (2024-06-18T07:41:42Z)
- Efficient Prompting for LLM-based Generative Internet of Things [88.84327500311464]
Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into the Internet of Things (IoT) applications has drawn much research attention recently.
Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network setting.
In this study, we propose an LLM-based Generative IoT (GIoT) system deployed in a local network setting.
arXiv Detail & Related papers (2024-06-14T19:24:00Z)
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvement of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z)
- When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts that FLUB focuses on mainly consist of the tricky, humorous, and misleading texts collected from the real internet environment.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z)
- Large language model empowered participatory urban planning [5.402147437950729]
This research introduces an innovative urban planning approach integrating Large Language Models (LLMs) within the participatory process.
The framework, based on the crafted LLM agent, consists of role-play, collaborative generation, and feedback, solving a community-level land-use task catering to 1000 distinct interests.
arXiv Detail & Related papers (2024-01-24T10:50:01Z)
- KoLA: Carefully Benchmarking World Knowledge of Large Language Models [87.96683299084788]
We construct a Knowledge-oriented LLM Assessment benchmark (KoLA).
We mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks.
We use both Wikipedia, a corpus on which LLMs are commonly pre-trained, and continuously collected emerging corpora to evaluate the capacity to handle unseen data and evolving knowledge.
arXiv Detail & Related papers (2023-06-15T17:20:46Z)