CityGPT: Empowering Urban Spatial Cognition of Large Language Models
- URL: http://arxiv.org/abs/2406.13948v1
- Date: Thu, 20 Jun 2024 02:32:16 GMT
- Title: CityGPT: Empowering Urban Spatial Cognition of Large Language Models
- Authors: Jie Feng, Yuwei Du, Tianhui Liu, Siqi Guo, Yuming Lin, Yong Li
- Abstract summary: Large language models (LLMs) with powerful language generation and reasoning capabilities have already achieved success in many domains.
However, because their training lacks physical-world corpora and knowledge, they usually fail to solve many real-life tasks in urban space.
We propose CityGPT, a systematic framework for enhancing the ability of LLMs to understand urban space and solve related urban tasks.
- Score: 7.40606412920065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) with powerful language generation and reasoning capabilities have already achieved success in many domains, e.g., math and code generation. However, because their training lacks physical-world corpora and knowledge, they usually fail to solve many real-life tasks in urban space. In this paper, we propose CityGPT, a systematic framework for enhancing the ability of LLMs to understand urban space and solve related urban tasks by building a city-scale world model within the LLM. First, we construct a diverse instruction tuning dataset, CityInstruction, for injecting urban knowledge and enhancing spatial reasoning capability effectively. Using a mixture of CityInstruction and general instruction data, we fine-tune various LLMs (e.g., ChatGLM3-6B, Qwen1.5 and Llama3 series) to enhance their capability without sacrificing general abilities. To further validate the effectiveness of the proposed methods, we construct a comprehensive benchmark, CityEval, to evaluate the capability of LLMs on diverse urban scenarios and problems. Extensive evaluation results demonstrate that small LLMs trained with CityInstruction can achieve performance competitive with commercial LLMs in the comprehensive evaluation of CityEval. The source code is openly accessible to the research community via https://github.com/tsinghua-fib-lab/CityGPT.
Related papers
- OpenCity: A Scalable Platform to Simulate Urban Activities with Massive LLM Agents [10.919679349212426]
Large Language Models (LLMs) have led to the development of LLM agents capable of simulating urban activities with unprecedented realism.
We propose OpenCity, a scalable simulation platform optimized for both system and prompt efficiencies.
OpenCity achieves a 600-fold acceleration in simulation time per agent, a 70% reduction in LLM requests, and a 50% reduction in token usage.
arXiv Detail & Related papers (2024-10-11T13:52:35Z)
- UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios [60.492736455572015]
We present UrBench, a benchmark designed for evaluating LMMs in complex multi-view urban scenarios.
UrBench contains 11.6K meticulously curated questions at both the region level and the role level.
Our evaluations on 21 LMMs show that current LMMs struggle in urban environments in several respects.
arXiv Detail & Related papers (2024-08-30T13:13:35Z)
- CityBench: Evaluating the Capabilities of Large Language Model as World Model [10.22654338686634]
Large language models (LLMs) with powerful generalization ability have been widely used in many domains.
In this paper, we propose CityBench, an interactive simulator-based evaluation platform.
We design 7 tasks in 2 categories, perception-understanding and decision-making, to evaluate the capability of LLMs as a city-scale world model for the urban domain.
arXiv Detail & Related papers (2024-06-20T02:25:07Z)
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
- UrbanLLM: Autonomous Urban Activity Planning and Management with Large Language Models [20.069378890478763]
UrbanLLM solves problems by decomposing urban-related queries into manageable sub-tasks.
It identifies suitable AI models for each sub-task and generates comprehensive responses to the given queries.
arXiv Detail & Related papers (2024-06-18T07:41:42Z)
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvement of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z)
- When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts that FLUB focuses on consist mainly of tricky, humorous, and misleading texts collected from the real internet environment.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z)
- Large language model empowered participatory urban planning [5.402147437950729]
This research introduces an innovative urban planning approach integrating Large Language Models (LLMs) within the participatory process.
The framework, based on the crafted LLM agent, consists of role-play, collaborative generation, and feedback, solving a community-level land-use task catering to 1000 distinct interests.
arXiv Detail & Related papers (2024-01-24T10:50:01Z)
- KoLA: Carefully Benchmarking World Knowledge of Large Language Models [87.96683299084788]
We construct a Knowledge-oriented LLM Assessment benchmark (KoLA).
We mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks.
We use both Wikipedia, a corpus on which LLMs are commonly pre-trained, and continuously collected emerging corpora to evaluate the capacity to handle unseen data and evolving knowledge.
arXiv Detail & Related papers (2023-06-15T17:20:46Z)
- MetroGAN: Simulating Urban Morphology with Generative Adversarial Network [10.504296192020497]
We propose a GAN framework with geographical knowledge, namely Metropolitan GAN (MetroGAN), for urban morphology simulation.
Results show that MetroGAN outperforms the state-of-the-art urban simulation methods by over 20% in all metrics.
arXiv Detail & Related papers (2022-07-06T11:02:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.