CityBench: Evaluating the Capabilities of Large Language Models for Urban Tasks
- URL: http://arxiv.org/abs/2406.13945v2
- Date: Mon, 23 Dec 2024 14:10:09 GMT
- Authors: Jie Feng, Jun Zhang, Tianhui Liu, Xin Zhang, Tianjian Ouyang, Junbo Yan, Yuwei Du, Siqi Guo, Yong Li
- Abstract summary: Large language models (LLMs) with extensive general knowledge and powerful reasoning abilities have seen rapid development and widespread application.
In this paper, we design CityBench, an interactive simulator-based evaluation platform.
We design 8 representative urban tasks in 2 categories of perception-understanding and decision-making as the CityBench.
- Abstract: Recently, large language models (LLMs) with extensive general knowledge and powerful reasoning abilities have seen rapid development and widespread application. A systematic and reliable evaluation of LLMs or vision-language models (VLMs) is a crucial step in applying and developing them for various fields. There have been some early explorations of the usability of LLMs for limited urban tasks, but a systematic and scalable evaluation benchmark is still lacking. The challenge in constructing a systematic evaluation benchmark for urban research lies in the diversity of urban data, the complexity of application scenarios, and the highly dynamic nature of the urban environment. In this paper, we design CityBench, an interactive simulator-based evaluation platform, as the first systematic benchmark for evaluating the capabilities of LLMs for diverse tasks in urban research. First, we build CityData to integrate the diverse urban data and CitySimu to simulate fine-grained urban dynamics. Based on CityData and CitySimu, we design 8 representative urban tasks in 2 categories, perception-understanding and decision-making, as the CityBench. With extensive results from 30 well-known LLMs and VLMs in 13 cities around the world, we find that advanced LLMs and VLMs can achieve competitive performance in diverse urban tasks requiring commonsense and semantic understanding abilities, e.g., understanding human dynamics and semantic inference of urban images. Meanwhile, they fail to solve the challenging urban tasks requiring professional knowledge and high-level reasoning abilities, e.g., geospatial prediction and traffic control. These observations provide valuable perspectives for utilizing and developing LLMs in the future. The code is openly accessible at https://github.com/tsinghua-fib-lab/CityBench.
Related papers
- Collaborative Imputation of Urban Time Series through Cross-city Meta-learning [54.438991949772145]
We propose a novel collaborative imputation paradigm leveraging meta-learned implicit neural representations (INRs).
We then introduce a cross-city collaborative learning scheme through model-agnostic meta learning.
Experiments on a diverse urban dataset from 20 global cities demonstrate our model's superior imputation performance and generalizability.
arXiv Detail & Related papers (2025-01-20T07:12:40Z) - VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks [100.3234156027118]
We present VLABench, an open-source benchmark for evaluating universal LCM task learning.
VLABench provides 100 carefully designed categories of tasks, with strong randomization in each category of task and a total of 2000+ objects.
The benchmark assesses multiple competencies including understanding of mesh&texture, spatial relationship, semantic instruction, physical laws, knowledge transfer and reasoning.
arXiv Detail & Related papers (2024-12-24T06:03:42Z) - UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios [60.492736455572015]
We present UrBench, a benchmark designed for evaluating LMMs in complex multi-view urban scenarios.
UrBench contains 11.6K meticulously curated questions at both region-level and role-level.
Our evaluations on 21 LMMs show that current LMMs struggle in urban environments in several respects.
arXiv Detail & Related papers (2024-08-30T13:13:35Z) - CityGPT: Empowering Urban Spatial Cognition of Large Language Models [7.40606412920065]
Large language models (LLMs) with powerful language generation and reasoning capabilities have already achieved success in many domains.
However, because their training lacks physical-world corpora and knowledge, they usually fail to solve many real-life tasks in urban space.
We propose CityGPT, a systematic framework for enhancing the capability of LLMs on understanding urban space and solving the related urban tasks.
arXiv Detail & Related papers (2024-06-20T02:32:16Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - UrbanLLM: Autonomous Urban Activity Planning and Management with Large Language Models [20.069378890478763]
UrbanLLM solves problems by decomposing urban-related queries into manageable sub-tasks.
It identifies suitable AI models for each sub-task, and generates comprehensive responses to the given queries.
arXiv Detail & Related papers (2024-06-18T07:41:42Z) - Urban Generative Intelligence (UGI): A Foundational Platform for Agents in Embodied City Environment [32.53845672285722]
Urban environments, characterized by their complex, multi-layered networks, face significant challenges in the face of rapid urbanization.
Recent developments in big data, artificial intelligence, urban computing, and digital twins have laid the groundwork for sophisticated city modeling and simulation.
This paper proposes Urban Generative Intelligence (UGI), a novel foundational platform integrating Large Language Models (LLMs) into urban systems.
arXiv Detail & Related papers (2023-12-19T03:12:13Z) - Unified Data Management and Comprehensive Performance Evaluation for Urban Spatial-Temporal Prediction [Experiment, Analysis & Benchmark] [78.05103666987655]
This work addresses challenges in accessing and utilizing diverse urban spatial-temporal datasets.
We introduce atomic files, a unified storage format designed for urban spatial-temporal big data, and validate its effectiveness on 40 diverse datasets.
We conduct extensive experiments using diverse models and datasets, establishing a performance leaderboard and identifying promising research directions.
arXiv Detail & Related papers (2023-08-24T16:20:00Z) - The Urban Toolkit: A Grammar-based Framework for Urban Visual Analytics [5.674216760436341]
The complex nature of urban issues and the overwhelming amount of available data have posed significant challenges in translating these efforts into actionable insights.
When analyzing a feature of interest, an urban expert must transform, integrate, and visualize different thematic (e.g., sunlight access, demographic) and physical (e.g., buildings, street networks) data layers.
This makes the entire visual data exploration and system implementation difficult for programmers and also sets a high entry barrier for urban experts outside of computer science.
arXiv Detail & Related papers (2023-08-15T13:43:04Z) - Methodological Foundation of a Numerical Taxonomy of Urban Form [62.997667081978825]
We present a method for numerical taxonomy of urban form derived from biological systematics.
We derive homogeneous urban tissue types and, by determining overall morphological similarity between them, generate a hierarchical classification of urban form.
After framing and presenting the method, we test it on two cities - Prague and Amsterdam.
arXiv Detail & Related papers (2021-04-30T12:47:52Z) - City limits in the age of smartphones and urban scaling [0.0]
Urban planning still lacks appropriate standards to define city boundaries across urban systems.
ICT provide the potential to portray more accurate descriptions of the urban systems.
We apply computational techniques over a large volume of mobile phone records to define urban boundaries.
arXiv Detail & Related papers (2020-05-06T17:31:21Z)