Scalable Geospatial Data Generation Using AlphaEarth Foundations Model
- URL: http://arxiv.org/abs/2508.11739v1
- Date: Fri, 15 Aug 2025 17:09:48 GMT
- Title: Scalable Geospatial Data Generation Using AlphaEarth Foundations Model
- Authors: Luc Houriez, Sebastian Pilarski, Behzad Vahedi, Ali Ahmadalipour, Teo Honda Scully, Nicholas Aflitto, David Andre, Caroline Jaffe, Martha Wedner, Rich Mazzola, Josh Jeffery, Ben Messinger, Sage McGinley-Smith, Sarah Russell,
- Abstract summary: We propose and evaluate a methodology which leverages Google DeepMind's AlphaEarth Foundations (AEF) to extend geospatial labeled datasets beyond their initial geographic regions. We show that even basic models like random forests or logistic regression can be used to accomplish this task. We investigate a case study of extending LANDFIRE's Existing Vegetation Type (EVT) dataset beyond the USA into Canada at two levels of granularity: EvtPhys (13 classes) and EvtGp (80 classes).
- Score: 0.1775251182905249
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-quality labeled geospatial datasets are essential for extracting insights and understanding our planet. Unfortunately, these datasets often do not span the entire globe and are limited to certain geographic regions where data was collected. Google DeepMind's recently released AlphaEarth Foundations (AEF) provides an information-dense global geospatial representation designed to serve as a useful input across a wide gamut of tasks. In this article we propose and evaluate a methodology which leverages AEF to extend geospatial labeled datasets beyond their initial geographic regions. We show that even basic models like random forests or logistic regression can be used to accomplish this task. We investigate a case study of extending LANDFIRE's Existing Vegetation Type (EVT) dataset beyond the USA into Canada at two levels of granularity: EvtPhys (13 classes) and EvtGp (80 classes). Qualitatively, for EvtPhys, model predictions align with ground truth. Trained models achieve 81% and 73% classification accuracy on EvtPhys validation sets in the USA and Canada, despite discussed limitations.
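The workflow described in the abstract (fit a basic classifier on embeddings from the labeled region, then score it on held-out points from that region and from a new region) can be sketched as follows. This is only an illustration under loud assumptions: the vectors are synthetic stand-ins, not real AEF embeddings or LANDFIRE labels; `DIM`, `N_CLASSES`, the noise levels, and all function names are hypothetical; and the "new region" is simulated simply by adding more noise. The printed accuracies say nothing about the paper's reported results.

```python
import math
import random

DIM = 8          # hypothetical embedding size, chosen small for speed
N_CLASSES = 3    # stand-in for a handful of vegetation classes


def make_region(rng, n_per_class, noise):
    """Draw synthetic unit-length vectors clustered around one axis per class."""
    X, y = [], []
    for label in range(N_CLASSES):
        for _ in range(n_per_class):
            v = [rng.gauss(0.0, noise) for _ in range(DIM)]
            v[label] += 2.0  # class signal lives on its own axis
            norm = math.sqrt(sum(a * a for a in v))
            X.append([a / norm for a in v])
            y.append(label)
    return X, y


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


def logits(W, b, x):
    return [sum(w * a for w, a in zip(W[k], x)) + b[k] for k in range(N_CLASSES)]


def train(X, y, epochs=100, lr=1.0):
    """Multinomial logistic regression fitted by full-batch gradient descent."""
    W = [[0.0] * DIM for _ in range(N_CLASSES)]
    b = [0.0] * N_CLASSES
    n = len(X)
    for _ in range(epochs):
        gW = [[0.0] * DIM for _ in range(N_CLASSES)]
        gb = [0.0] * N_CLASSES
        for x, label in zip(X, y):
            p = softmax(logits(W, b, x))
            for k in range(N_CLASSES):
                err = p[k] - (1.0 if k == label else 0.0)
                gb[k] += err
                for d in range(DIM):
                    gW[k][d] += err * x[d]
        for k in range(N_CLASSES):
            b[k] -= lr * gb[k] / n
            for d in range(DIM):
                W[k][d] -= lr * gW[k][d] / n
    return W, b


def accuracy(W, b, X, y):
    hits = 0
    for x, label in zip(X, y):
        z = logits(W, b, x)
        hits += int(z.index(max(z)) == label)
    return hits / len(X)


rng = random.Random(0)
X_src, y_src = make_region(rng, n_per_class=100, noise=0.3)  # labeled region
X_val, y_val = make_region(rng, n_per_class=50, noise=0.3)   # same-region validation
X_new, y_new = make_region(rng, n_per_class=50, noise=0.8)   # noisier "new" region

W, b = train(X_src, y_src)
acc_val = accuracy(W, b, X_val, y_val)
acc_new = accuracy(W, b, X_new, y_new)
print(f"same-region accuracy: {acc_val:.2f}")
print(f"new-region accuracy:  {acc_new:.2f}")
```

Logistic regression is used here because the abstract names it as one of the basic models; a random forest from a library such as scikit-learn would slot into the same train-then-evaluate structure.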
Related papers
- GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics [91.17301794848025]
This paper presents GeoAgent, a model capable of reasoning closely with humans and deriving fine-grained address conclusions. Previous RL-based methods have achieved breakthroughs in performance and interpretability, but concerns remain because of their reliance on AI-generated chain-of-thought (CoT) data and training strategies.
arXiv Detail & Related papers (2026-02-13T04:48:05Z)
- GEO-Bench-2: From Performance to Capability, Rethinking Evaluation in Geospatial AI [52.13138825802668]
GeoFMs are transforming Earth Observation, but evaluation lacks standardized protocols. GEO-Bench-2 addresses this with a comprehensive framework spanning classification, segmentation, regression, object detection, and instance segmentation. Code, data, and leaderboard for GEO-Bench-2 are publicly released under a permissive license.
arXiv Detail & Related papers (2025-11-19T17:45:02Z)
- GeoBS: Information-Theoretic Quantification of Geographic Bias in AI Models [34.611626290720295]
We establish an information-theoretic framework for geo-bias evaluation, called GeoBS (Geo-Bias Scores). We propose three novel geo-bias scores that explicitly take intricate spatial factors into consideration.
arXiv Detail & Related papers (2025-09-27T20:07:21Z)
- AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data [18.7927140265097]
We introduce AlphaEarth Foundations, an embedding field model yielding a highly general, geospatial representation. We will release a dataset of global, annual, analysis-ready embedding field layers from 2017 through 2024.
arXiv Detail & Related papers (2025-07-29T23:55:00Z)
- PlaceFM: A Training-free Geospatial Foundation Model of Places using Large-Scale Point of Interest Data [0.5735035463793009]
PlaceFM captures place representations through a training-free, clustering-based approach. PlaceFM summarizes the entire point of interest graph constructed from U.S. Foursquare data. PlaceFM produces general-purpose region embeddings while automatically identifying places of interest. PlaceFM achieves up to a 100x speedup in generating region-level representations on large-scale POI graphs.
arXiv Detail & Related papers (2025-06-25T15:10:31Z)
- HyBiomass: Global Hyperspectral Imagery Benchmark Dataset for Evaluating Geospatial Foundation Models in Forest Aboveground Biomass Estimation [1.0408909053766147]
We introduce a globally distributed benchmark dataset for forest aboveground biomass (AGB) estimation. This benchmark dataset combines co-located hyperspectral imagery (HSI) from the Environmental Mapping and Analysis Program (EnMAP) satellite and predictions of AGB density estimates. Our experimental results on this dataset demonstrate that the evaluated Geo-FMs can match or, in some cases, surpass the performance of a baseline U-Net.
arXiv Detail & Related papers (2025-06-12T21:29:20Z)
- GeoGrid-Bench: Can Foundation Models Understand Multimodal Gridded Geo-Spatial Data? [5.017671236021897]
GeoGrid-Bench is a benchmark designed to evaluate the ability of foundation models to understand geo-spatial data in the grid structure. This benchmark features large-scale, real-world data covering 16 climate variables across 150 locations and extended time frames.
arXiv Detail & Related papers (2025-05-15T21:31:44Z)
- Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework [59.42946541163632]
We introduce a comprehensive geolocation framework with three key components: GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric. We demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
arXiv Detail & Related papers (2025-02-19T14:21:25Z)
- GeoGalactica: A Scientific Large Language Model in Geoscience [95.15911521220052]
Large language models (LLMs) have achieved huge success for their general knowledge and ability to solve a wide spectrum of tasks in natural language processing (NLP).
We specialize an LLM into geoscience, by further pre-training the model with a vast amount of texts in geoscience, as well as supervised fine-tuning (SFT) the resulting model with our custom collected instruction tuning dataset.
We train GeoGalactica over a geoscience-related text corpus containing 65 billion tokens, the largest geoscience-specific text corpus.
Then we fine-tune the model with 1 million pairs of instruction-tuning
arXiv Detail & Related papers (2023-12-31T09:22:54Z)
- GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models.
We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods.
Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z)
- Assessment of a new GeoAI foundation model for flood inundation mapping [4.312965283062856]
This paper evaluates the performance of the first-of-its-kind geospatial foundation model, IBM-NASA's Prithvi, to support a crucial geospatial analysis task: flood inundation mapping.
A benchmark dataset, Sen1Floods11, is used in the experiments, and the models' predictability, generalizability, and transferability are evaluated.
Results show the good transferability of the Prithvi model, highlighting its performance advantages in segmenting flooded areas in previously unseen regions.
arXiv Detail & Related papers (2023-09-25T19:50:47Z)
- K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization [105.89544876731942]
Large language models (LLMs) have achieved great success in general domains of natural language processing.
We present the first-ever LLM in geoscience, K2, alongside a suite of resources developed to further promote LLM research within geoscience.
arXiv Detail & Related papers (2023-06-08T09:29:05Z)
- AutoGeoLabel: Automated Label Generation for Geospatial Machine Learning [69.47585818994959]
We evaluate a big data processing pipeline to auto-generate labels for remote sensing data.
We utilize the big geo-data platform IBM PAIRS to dynamically generate such labels in dense urban areas.
arXiv Detail & Related papers (2022-01-31T20:02:22Z)