Physically Interpretable AlphaEarth Foundation Model Embeddings Enable LLM-Based Land Surface Intelligence
- URL: http://arxiv.org/abs/2602.10354v1
- Date: Tue, 10 Feb 2026 22:58:50 GMT
- Title: Physically Interpretable AlphaEarth Foundation Model Embeddings Enable LLM-Based Land Surface Intelligence
- Authors: Mashrekur Rahman,
- Abstract summary: We present a comprehensive interpretability analysis of Google AlphaEarth's 64-dimensional embeddings against 26 environmental variables. We then developed a Land Surface Intelligence system that implements retrieval-augmented generation over a FAISS-indexed embedding database of 12.1 million vectors.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Satellite foundation models produce dense embeddings whose physical interpretability remains poorly understood, limiting their integration into environmental decision systems. Using 12.1 million samples across the Continental United States (2017--2023), we first present a comprehensive interpretability analysis of Google AlphaEarth's 64-dimensional embeddings against 26 environmental variables spanning climate, vegetation, hydrology, temperature, and terrain. Combining linear, nonlinear, and attention-based methods, we show that individual embedding dimensions map onto specific land surface properties, while the full embedding space reconstructs most environmental variables with high fidelity (12 of 26 variables exceed $R^2 > 0.90$; temperature and elevation approach $R^2 = 0.97$). The strongest dimension-variable relationships converge across all three analytical methods, remain robust under spatial block cross-validation (mean $\Delta R^2 = 0.017$), and are temporally stable across all seven study years (mean inter-year correlation $r = 0.963$). Building on these validated interpretations, we then developed a Land Surface Intelligence system that implements retrieval-augmented generation over a FAISS-indexed embedding database of 12.1 million vectors, translating natural-language environmental queries into satellite-grounded assessments. An LLM-as-Judge evaluation across 360 query--response cycles, using four LLMs in rotating generator, system, and judge roles, achieved weighted scores of $\mu = 3.74 \pm 0.77$ (scale 1--5), with grounding ($\mu = 3.93$) and coherence ($\mu = 4.25$) as the strongest criteria. Our results demonstrate that satellite foundation model embeddings are physically structured representations that can be operationalized for environmental and geospatial intelligence.
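The retrieval step behind such a system can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: a brute-force NumPy cosine search stands in for the FAISS index, and the database size, dimensionality, and all vectors are synthetic placeholders.

```python
import numpy as np

# Synthetic stand-in for the embedding database (the real system indexes
# 12.1 million 64-dimensional AlphaEarth vectors with FAISS).
rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 64)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)  # unit-normalize for cosine search

def retrieve(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar database embeddings (cosine similarity)."""
    q = query / np.linalg.norm(query)
    scores = db @ q                 # inner product == cosine on unit vectors
    return np.argsort(-scores)[:k]  # top-k, highest similarity first

query = rng.standard_normal(64).astype(np.float32)
top_k = retrieve(query, k=5)
print(top_k.shape)  # (5,)
```

In a RAG pipeline, the metadata attached to these top-k vectors (location, year, environmental attributes) would be passed to the LLM as grounding context for the natural-language query.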
Related papers
- Inferring Height from Earth Embeddings: First insights using Google AlphaEarth [0.0]
This study investigates whether the geospatial and multimodal features encoded in Earth Embeddings can effectively guide deep learning (DL) regression models for regional surface height mapping. We focused on AlphaEarth Embeddings at 10 m spatial resolution and evaluated their capability to support height inference using a high-quality Digital Surface Model (DSM) as reference. Both architectures achieved strong training performance ($R^2 = 0.97$), confirming that the embeddings encode informative and decodable height-related signals.
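The probing idea common to these works can be sketched with a simple linear regression from embeddings to a target variable, scored by $R^2$. Everything here is synthetic: the "embeddings" and "height" target are random placeholders, and an ordinary least-squares probe stands in for whatever regression models the studies actually used.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5_000, 64
X = rng.standard_normal((n, d))                  # synthetic stand-in embeddings
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)    # synthetic target (e.g. height)

# Ordinary least-squares probe: y ~ X @ w + b
Xb = np.column_stack([X, np.ones(n)])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
y_hat = Xb @ w

# Coefficient of determination: fraction of target variance the probe explains.
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3))
```

A high probe $R^2$ on held-out data is the usual evidence that a variable is linearly decodable from the embedding space; the papers above additionally use nonlinear and attention-based probes and spatial cross-validation.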
arXiv Detail & Related papers (2026-02-19T10:52:50Z) - Democratizing planetary-scale analysis: An ultra-lightweight Earth embedding database for accurate and flexible global land monitoring [19.019853798955513]
ESD is an ultra-lightweight, 30-m global Earth embedding database spanning the 25-year period from 2000 to 2024. The dataset achieves a transformative 340-fold reduction in data volume compared to raw archives. With robust few-shot learning capabilities and longitudinal consistency, ESD provides a versatile foundation for democratizing planetary-scale research.
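Few-shot use of an embedding database is often approximated with a nearest-neighbor classifier over a handful of labeled support vectors. The sketch below uses synthetic class clusters, not ESD data, and a plain k-NN vote as an illustrative baseline.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_classes, shots = 32, 4, 5

# Synthetic support set: each class clusters around its own centroid.
centroids = rng.standard_normal((n_classes, d)) * 3
support_X = np.concatenate([c + rng.standard_normal((shots, d)) for c in centroids])
support_y = np.repeat(np.arange(n_classes), shots)

def knn_predict(x: np.ndarray, k: int = 3) -> int:
    """Few-shot label via majority vote among the k nearest support embeddings."""
    dists = np.linalg.norm(support_X - x, axis=1)
    votes = support_y[np.argsort(dists)[:k]]
    return int(np.bincount(votes).argmax())

query = centroids[2] + 0.1 * rng.standard_normal(d)
print(knn_predict(query))  # prints 2
```

With only a few labeled pixels per class, this kind of lookup is what makes a compact, consistent embedding archive useful for flexible land monitoring.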
arXiv Detail & Related papers (2026-01-16T10:59:43Z) - Geo3DVQA: Evaluating Vision-Language Models for 3D Geospatial Reasoning from Aerial Imagery [18.7420518276348]
Geo3DVQA is a benchmark for evaluating vision-language models (VLMs) in height-aware, 3D geospatial reasoning. Unlike conventional sensor-based frameworks, Geo3DVQA emphasizes realistic scenarios that integrate elevation, sky view factors, and land cover patterns.
arXiv Detail & Related papers (2025-12-08T08:16:14Z) - OmniGround: A Comprehensive Spatio-Temporal Grounding Benchmark for Real-World Complex Scenarios [39.58602686069029]
We introduce OmniGround, a comprehensive benchmark with 3,475 videos spanning 81 categories and complex real-world queries. We also introduce DeepSTG, a systematic evaluation framework quantifying dataset quality across four complementary dimensions. Experiments demonstrate PG-TAF achieves 25.6% and 35.6% improvements in m_tIoU and m_vIoU, with consistent gains across four benchmarks.
arXiv Detail & Related papers (2025-11-21T04:23:04Z) - Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales [61.03549470159347]
Vision-language models (VLMs) have advanced rapidly, yet their capacity for image-grounded geolocation in open-world conditions has not been comprehensively evaluated. We present EarthWhere, a comprehensive benchmark for VLM image geolocation that evaluates visual recognition, step-by-step reasoning, and evidence use.
arXiv Detail & Related papers (2025-10-13T01:12:21Z) - FuseTen: A Generative Model for Daily 10 m Land Surface Temperature Estimation from Spatio-Temporal Satellite Observations [3.344876133162209]
Urban heatwaves and droughts are pressing and growing challenges in the context of climate change. One of the most important variables for assessing and understanding these phenomena is Land Surface Temperature (LST). We propose FuseTen to produce daily LST observations at a fine 10 m spatial resolution by fusing spatio-temporal observations from Landsat 8 and Terra MODIS.
arXiv Detail & Related papers (2025-07-30T23:04:16Z) - OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data [72.98496934729245]
Existing benchmarks for multimodal learning in Earth science offer limited, siloed coverage of Earth's spheres and their cross-sphere interactions. We introduce OmniEarth-Bench, the first multimodal benchmark that systematically spans all six spheres. Built with a scalable, modular-topology data inference framework and native multi-observation sources, OmniEarth-Bench produces 29,855 standardized, expert-curated annotations.
arXiv Detail & Related papers (2025-05-29T15:02:27Z) - OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence [51.0456395687016]
Multimodal large language models (MLLMs) have opened new frontiers in artificial intelligence. We propose an MLLM, OmniGeo, tailored to geospatial applications. By combining the strengths of natural language understanding and spatial reasoning, our model enhances the instruction-following ability and accuracy of GeoAI systems.
arXiv Detail & Related papers (2025-03-20T16:45:48Z) - AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities [5.767156832161819]
We propose AnySat, a multimodal model based on a joint embedding predictive architecture (JEPA) and scale-adaptive spatial encoders. To demonstrate the advantages of this unified approach, we compile GeoPlex, a collection of 5 multimodal datasets with varying characteristics. We then train a single powerful model on these diverse datasets simultaneously.
arXiv Detail & Related papers (2024-12-18T18:11:53Z) - SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models [78.06537464850538]
We show that simulations are surprisingly effective at imparting spatial aptitudes that translate to real images. We also show that perfect annotations in simulation are more effective than existing approaches of pseudo-annotating real images.
arXiv Detail & Related papers (2024-12-10T18:52:45Z) - Distortions in Judged Spatial Relations in Large Language Models [45.875801135769585]
GPT-4 exhibited superior performance with 55 percent accuracy, followed by GPT-3.5 at 47 percent, and Llama-2 at 45 percent.
The models identified the nearest cardinal direction in most cases, reflecting their associative learning mechanism.
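The nearest-cardinal-direction task the models were probed on is itself a small geodesic computation. The sketch below uses the standard initial great-circle bearing formula; the city coordinates are approximate values chosen for illustration.

```python
import math

def bearing(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360

def nearest_cardinal(deg: float) -> str:
    """Snap a bearing to the nearest of the eight compass directions."""
    dirs = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return dirs[round(deg / 45) % 8]

# Approximate coordinates: New York (40.7, -74.0) -> Los Angeles (34.1, -118.2)
b = bearing(40.7, -74.0, 34.1, -118.2)
print(nearest_cardinal(b))  # prints W
```

Comparing such ground-truth bearings against model answers is one way to quantify the distortions the study reports.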
arXiv Detail & Related papers (2024-01-08T20:08:04Z) - GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models.
We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods.
Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z) - Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer [66.82008165644892]
We propose a method to increase the accuracy of a ground camera's location and orientation by estimating the relative rotation and translation between the ground-level image and its matched/retrieved satellite image.
Experimental results demonstrate that our method significantly outperforms the state-of-the-art.
arXiv Detail & Related papers (2023-07-16T11:52:27Z) - Satellite galaxy abundance dependency on cosmology in Magneticum simulations [101.18253437732933]
We build an emulator of satellite abundance based on cosmological parameters.
We find that $A$ and $\beta$ depend on cosmological parameters, even if weakly.
We also show that satellite abundance cosmology dependency differs between full-physics (FP) simulations, dark-matter only (DMO) and non-radiative simulations.
arXiv Detail & Related papers (2021-10-11T18:00:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.