Related papers: City Foundation Models for Learning General Purpose Representations from OpenStreetMap

City Foundation Models for Learning General Purpose Representations from OpenStreetMap

URL: http://arxiv.org/abs/2310.00583v3
Date: Tue, 12 Nov 2024 06:27:51 GMT
Title: City Foundation Models for Learning General Purpose Representations from OpenStreetMap
Authors: Pasquale Balsebre, Weiming Huang, Gao Cong, Yi Li,
Abstract summary: We present CityFM, a framework to train a foundation model within a selected geographical area of interest, such as a city. CityFM relies solely on open data from OpenStreetMap, and produces multimodal representations of entities of different types, spatial, visual, and textual information. In all the experiments, CityFM achieves performance superior to, or on par with, the baselines.
Score: 16.09047066527081
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pre-trained Foundation Models (PFMs) have ushered in a paradigm-shift in Artificial Intelligence, due to their ability to learn general-purpose representations that can be readily employed in a wide range of downstream tasks. While PFMs have been successfully adopted in various fields such as Natural Language Processing and Computer Vision, their capacity in handling geospatial data and answering urban questions remains limited. This can be attributed to the intrinsic heterogeneity of geospatial data, which encompasses different data types, including points, segments and regions, as well as multiple information modalities, such as a spatial position, visual characteristics and textual annotations. The proliferation of Volunteered Geographic Information initiatives, and the ever-increasing availability of open geospatial data sources, like OpenStreetMap, which is freely accessible globally, unveil a promising opportunity to bridge this gap. In this paper, we present CityFM, a self-supervised framework to train a foundation model within a selected geographical area of interest, such as a city. CityFM relies solely on open data from OSM, and produces multimodal representations of entities of different types, incorporating spatial, visual, and textual information. We analyse the entity representations generated using our foundation models from a qualitative perspective, and conduct quantitative experiments on road, building, and region-level downstream tasks. We compare its results to algorithms tailored specifically for the respective applications. In all the experiments, CityFM achieves performance superior to, or on par with, the baselines.

Related papers

PlaceFM: A Training-free Geospatial Foundation Model of Places [0.27309692684728604]
We propose PlaceFM, a spatial foundation model that captures place representations using a training-free graph condensation method.<n>PlaceFM condenses a nationwide POI graph built from integrated Foursquare and OpenStreetMap data in the U.S., generating general-purpose embeddings of places.
arXiv Detail & Related papers (2025-06-25T15:10:31Z)
Graph Foundation Models: A Comprehensive Survey [66.74249119139661]
Graph Foundation Models (GFMs) aim to bring scalable, general-purpose intelligence to structured data.<n>This survey provides a comprehensive overview of GFMs, unifying diverse efforts under a modular framework.<n>GFMs are poised to become foundational infrastructure for open-ended reasoning over structured data.
arXiv Detail & Related papers (2025-05-21T05:08:00Z)
OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence [51.0456395687016]
multimodal large language models (LLMs) have opened new frontiers in artificial intelligence. We propose a MLLM (OmniGeo) tailored to geospatial applications. By combining the strengths of natural language understanding and spatial reasoning, our model enhances the ability of instruction following and the accuracy of GeoAI systems.
arXiv Detail & Related papers (2025-03-20T16:45:48Z)
GeoJEPA: Towards Eliminating Augmentation- and Sampling Bias in Multimodal Geospatial Learning [0.0]
We present GeoJEPA, a versatile multimodal fusion model for geospatial data built on the self-supervised Joint-Embedding Predictive Architecture. We aim to eliminate the widely accepted augmentation- and sampling biases found in self-supervised geospatial representation learning. The results are multimodal semantic representations of urban regions and map entities that we evaluate both quantitatively and qualitatively.
arXiv Detail & Related papers (2025-02-25T22:03:28Z)
Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework. By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information. Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
Towards Vision-Language Geo-Foundation Model: A Survey [65.70547895998541]
Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks. This paper thoroughly reviews VLGFMs, summarizing and analyzing recent developments in the field.
arXiv Detail & Related papers (2024-06-13T17:57:30Z)
Position: Graph Foundation Models are Already Here [53.737868336014735]
Graph Foundation Models (GFMs) are emerging as a significant research topic in the graph domain. We propose a novel perspective for the GFM development by advocating for a graph vocabulary'' This perspective can potentially advance the future GFM design in line with the neural scaling laws.
arXiv Detail & Related papers (2024-02-03T17:24:36Z)
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs [35.86744469804952]
Multimodal large language models (MLLMs) have shown remarkable capabilities across a broad range of tasks but their knowledge and abilities in the geographic and geospatial domains are yet to be explored. We conduct a series of experiments exploring various vision capabilities of MLLMs within these domains, particularly focusing on the frontier model GPT-4V. Our methodology involves challenging these models with a small-scale geographic benchmark consisting of a suite of visual tasks, testing their abilities across a spectrum of complexity.
arXiv Detail & Related papers (2023-11-24T18:46:02Z)
Chatmap : Large Language Model Interaction with Cartographic Data [0.0]
OpenStreetMap (OSM) is the most ambitious open-source global initiative offering detailed urban and rural geographic data. In this study, we demonstrate the proof of concept and details of the process of fine-tuning a relatively small scale (1B parameters) Large Language Models (LLMs) with a relatively small artificial dataset curated by a more capable teacher model. The study aims to provide an initial guideline for such generative artificial intelligence (AI) adaptations and demonstrate early signs of useful emerging abilities in this context.
arXiv Detail & Related papers (2023-09-28T15:32:36Z)
On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence [39.86997089245117]
Foundations models (FMs) can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or zero-shot learning. We propose that one of the major challenges of developing a FM for GeoAI is to address the multimodality nature of geospatial tasks.
arXiv Detail & Related papers (2023-04-13T19:50:17Z)
A General Purpose Neural Architecture for Geospatial Systems [142.43454584836812]
We present a roadmap towards the construction of a general-purpose neural architecture (GPNA) with a geospatial inductive bias. We envision how such a model may facilitate cooperation between members of the community.
arXiv Detail & Related papers (2022-11-04T09:58:57Z)
Learning Signal-Agnostic Manifolds of Neural Fields [50.066449953522685]
We leverage neural fields to capture the underlying structure in image, shape, audio and cross-modal audiovisual domains. We show that by walking across the underlying manifold of GEM, we may generate new samples in our signal domains.
arXiv Detail & Related papers (2021-11-11T18:57:40Z)
GANmapper: geographical content filling [0.0]
We present a new method to create spatial data using a generative adversarial network (GAN) Our contribution uses coarse and widely available geospatial data to create maps of less available features at the finer scale in the built environment. We employ land use data and road networks as input to generate building footprints, and conduct experiments in 9 cities around the world.
arXiv Detail & Related papers (2021-08-07T05:50:54Z)
Methodological Foundation of a Numerical Taxonomy of Urban Form [62.997667081978825]
We present a method for numerical taxonomy of urban form derived from biological systematics. We derive homogeneous urban tissue types and, by determining overall morphological similarity between them, generate a hierarchical classification of urban form. After framing and presenting the method, we test it on two cities - Prague and Amsterdam.
arXiv Detail & Related papers (2021-04-30T12:47:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.