Inferring Height from Earth Embeddings: First insights using Google AlphaEarth
- URL: http://arxiv.org/abs/2602.17250v1
- Date: Thu, 19 Feb 2026 10:52:50 GMT
- Title: Inferring Height from Earth Embeddings: First insights using Google AlphaEarth
- Authors: Alireza Hamoudzadeh, Valeria Belloni, Roberta Ravanelli,
- Abstract summary: This study investigates whether the geospatial and multimodal features encoded in \textit{Earth Embeddings} can effectively guide deep learning (DL) regression models for regional surface height mapping. We focused on AlphaEarth Embeddings at 10 m spatial resolution and evaluated their capability to support height inference using a high-quality Digital Surface Model (DSM) as reference. Both architectures achieved strong training performance (both with $R^2 = 0.97$), confirming that the embeddings encode informative and decodable height-related signals.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study investigates whether the geospatial and multimodal features encoded in \textit{Earth Embeddings} can effectively guide deep learning (DL) regression models for regional surface height mapping. In particular, we focused on AlphaEarth Embeddings at 10 m spatial resolution and evaluated their capability to support terrain height inference using a high-quality Digital Surface Model (DSM) as reference. U-Net and U-Net++ architectures were thus employed as lightweight convolutional decoders to assess how well the geospatial information distilled in the embeddings can be translated into accurate surface height estimates. Both architectures achieved strong training performance (both with $R^2 = 0.97$), confirming that the embeddings encode informative and decodable height-related signals. On the test set, performance decreased due to distribution shifts in height frequency between training and testing areas. Nevertheless, U-Net++ shows better generalization ($R^2 = 0.84$, median difference = -2.62 m) compared with the standard U-Net ($R^2 = 0.78$, median difference = -7.22 m), suggesting enhanced robustness to distribution mismatch. While the testing RMSE (approximately 16 m for U-Net++) and residual bias highlight remaining challenges in generalization, strong correlations indicate that the embeddings capture transferable topographic patterns. Overall, the results demonstrate the promising potential of AlphaEarth Embeddings to guide DL-based height mapping workflows, particularly when combined with spatially aware convolutional architectures, while emphasizing the need to address bias for improved regional transferability.
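The abstract reports three scalar metrics for the height-regression results: the coefficient of determination $R^2$, the RMSE, and the median of the prediction residuals (a signed bias). A minimal NumPy sketch of how such metrics could be computed for a predicted height map against a reference DSM; the function name, array shapes, and the illustrative bias are assumptions for demonstration, not taken from the paper's code:

```python
import numpy as np

def height_metrics(pred: np.ndarray, ref: np.ndarray) -> dict:
    """R^2, RMSE, and median difference between predicted and reference heights."""
    pred, ref = pred.ravel(), ref.ravel()
    resid = pred - ref
    ss_res = np.sum(resid ** 2)               # residual sum of squares
    ss_tot = np.sum((ref - ref.mean()) ** 2)  # total sum of squares
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
        "median_diff": float(np.median(resid)),  # signed residual bias
    }

# Illustrative check with a hypothetical constant -2 m bias: R^2 stays high
# while the median difference exposes the systematic offset.
ref = np.linspace(0.0, 100.0, 1000)   # stand-in reference heights (m)
m = height_metrics(ref - 2.0, ref)
print(m)  # rmse = 2.0, median_diff = -2.0, r2 close to 1
```

This separation is why the abstract reports both a strong correlation ($R^2 = 0.84$ for U-Net++) and a residual bias (median difference of -2.62 m): $R^2$ measures how well relative height variation is captured, while the median residual reveals systematic offsets that survive the fit.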
Related papers
- Cross-Layer Attentive Feature Upsampling for Low-latency Semantic Segmentation [52.01210390327581]
We propose Guided Attentive Interpolation (GAI) to adaptively interpolate fine-grained high-resolution features with semantic features. GAI determines both spatial and semantic relations of pixels from features of different resolutions and then leverages these relations to interpolate high-resolution features with rich semantics. In experiments, the GAI-based semantic segmentation networks, i.e., GAIN, can achieve 78.8 mIoU at 22.3 FPS on Cityscapes and 80.6 mIoU at 64.5 FPS on CamVid using an NVIDIA 1080Ti GPU.
arXiv Detail & Related papers (2026-01-03T12:09:49Z)
- SERA-H: Beyond Native Sentinel Spatial Limits for High-Resolution Canopy Height Mapping [3.8902217877872034]
High-resolution mapping of canopy height is essential for forest management and biodiversity monitoring. We present SERA-H, an end-to-end model combining a super-resolution module and temporal attention encoding. Our model generates 2.5 m resolution height maps from freely available Sentinel-1 and Sentinel-2 time series data.
arXiv Detail & Related papers (2025-12-19T23:23:14Z)
- Geodiffussr: Generative Terrain Texturing with Elevation Fidelity [48.82552523546255]
We introduce Geodiffussr, a flow-matching pipeline that synthesizes text-guided texture maps. The core mechanism is multi-scale content aggregation (MCA): DEM features are injected into UNet blocks at multiple resolutions to enforce global-to-local elevation consistency. To train and evaluate Geodiffussr, we assemble a globally distributed, biome- and climate-stratified corpus of triplets pairing SRTM-derived DEMs with Sentinel-2 imagery and vision-grounded natural-appearance captions.
arXiv Detail & Related papers (2025-11-28T09:52:44Z)
- Generative MIMO Beam Map Construction for Location Recovery and Beam Tracking [67.65578956523403]
This paper proposes a generative framework to recover location labels directly from sparse channel state information (CSI) measurements. Instead of directly storing raw CSI, we learn a compact low-dimensional radio map embedding and leverage a generative model to reconstruct the high-dimensional CSI. Numerical experiments demonstrate that the proposed model can improve localization accuracy by over 30% and achieve a 20% capacity gain in non-line-of-sight (NLOS) scenarios.
arXiv Detail & Related papers (2025-11-21T07:25:49Z)
- EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation [50.433911327489554]
We introduce EarthMapper, a novel framework for controllable satellite-map translation. We also contribute CNSatMap, a large-scale dataset comprising 302,132 precisely aligned satellite-map pairs across 38 Chinese cities. Experiments on CNSatMap and the New York dataset demonstrate EarthMapper's superior performance.
arXiv Detail & Related papers (2025-04-28T02:41:12Z)
- Multimodal deep learning for mapping forest dominant height by fusing GEDI with earth observation data [5.309673841813994]
We propose a novel deep learning framework termed the multi-modal attention remote sensing network (MARSNet) to estimate forest dominant height.
MARSNet comprises separate encoders for each remote sensing data modality to extract multi-scale features, and a shared decoder to fuse the features and estimate height.
Our research demonstrates the effectiveness of a multimodal deep learning approach fusing GEDI with SAR and passive optical imagery for enhancing the accuracy of high resolution dominant height estimation.
arXiv Detail & Related papers (2023-11-20T14:02:50Z)
- Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks [82.82866901799565]
We build a new set of multimodal remote sensing benchmark datasets (including hyperspectral, multispectral, SAR) for the study purpose of the cross-city semantic segmentation task.
Beyond the single city, we propose a high-resolution domain adaptation network, HighDAN, to promote the AI model's generalization ability from the multi-city environments.
HighDAN is capable of retaining the spatially topological structure of the studied urban scene well in a parallel high-to-low resolution fusion fashion.
arXiv Detail & Related papers (2023-09-26T23:55:39Z)
- DETR Doesn't Need Multi-Scale or Locality Design [69.56292005230185]
This paper presents an improved DETR detector that maintains a "plain" nature.
It uses a single-scale feature map and global cross-attention calculations without specific locality constraints.
We show that two simple technologies are surprisingly effective within a plain design to compensate for the lack of multi-scale feature maps and locality constraints.
arXiv Detail & Related papers (2023-08-03T17:59:04Z)
- On the Importance of Feature Representation for Flood Mapping using Classical Machine Learning Approaches [3.555368338253582]
Flood inundation mapping based on earth observation data can help in providing cheap and accurate maps depicting the area affected by a flood event to emergency-relief units in near-real-time.
This paper evaluates the potential of five traditional machine learning approaches, including gradient boosted decision trees, support vector machines, and quadratic discriminant analysis.
With total and mean IoU values of 0.8751 and 0.7031 compared to 0.70 and 0.5873 as the previous best-reported results, we show that a simple gradient boosting classifier can significantly improve over deep neural network based approaches.
arXiv Detail & Related papers (2023-03-01T17:39:08Z)
- Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence [11.823147814005411]
Cross-view geo-localization aims to estimate the location of a query ground image by matching it to a reference database of geo-tagged aerial images.
Recent works achieve outstanding progress on cross-view geo-localization benchmarks.
However, existing methods still suffer from poor performance on the cross-area benchmarks.
arXiv Detail & Related papers (2022-12-08T04:54:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences of its use.