Democratizing planetary-scale analysis: An ultra-lightweight Earth embedding database for accurate and flexible global land monitoring
- URL: http://arxiv.org/abs/2601.11183v1
- Date: Fri, 16 Jan 2026 10:59:43 GMT
- Title: Democratizing planetary-scale analysis: An ultra-lightweight Earth embedding database for accurate and flexible global land monitoring
- Authors: Shuang Chen, Jie Wang, Shuai Yuan, Jiayang Li, Yu Xia, Yuanhong Liao, Junbo Wei, Jincheng Yuan, Xiaoqing Xu, Xiaolin Zhu, Peng Zhu, Hongsheng Zhang, Yuyu Zhou, Haohuan Fu, Huabing Huang, Bin Chen, Fan Dai, Peng Gong
- Abstract summary: ESD is an ultra-lightweight, 30-m global Earth embedding database spanning the 25-year period from 2000 to 2024. The dataset achieves a transformative 340-fold reduction in data volume compared to raw archives. With robust few-shot learning capabilities and longitudinal consistency, ESD provides a versatile foundation for democratizing planetary-scale research.
- Score: 19.019853798955513
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid evolution of satellite-borne Earth Observation (EO) systems has revolutionized terrestrial monitoring, yielding petabyte-scale archives. However, the immense computational and storage requirements for global-scale analysis often preclude widespread use, hindering planetary-scale studies. To address these barriers, we present Embedded Seamless Data (ESD), an ultra-lightweight, 30-m global Earth embedding database spanning the 25-year period from 2000 to 2024. By transforming high-dimensional, multi-sensor observations from the Landsat series (5, 7, 8, and 9) and MODIS Terra into information-dense, quantized latent vectors, ESD distills essential geophysical and semantic features into a unified latent space. Utilizing the ESDNet architecture and Finite Scalar Quantization (FSQ), the dataset achieves a transformative ~340-fold reduction in data volume compared to raw archives. This compression allows the entire global land surface for a single year to be encapsulated within approximately 2.4 TB, enabling decadal-scale global analysis on standard local workstations. Rigorous validation demonstrates high reconstructive fidelity (MAE: 0.0130; RMSE: 0.0179; CC: 0.8543). By condensing the annual phenological cycle into 12 temporal steps, the embeddings provide inherent denoising and a semantically organized space that outperforms raw reflectance in land-cover classification, achieving 79.74% accuracy (vs. 76.92% for raw fusion). With robust few-shot learning capabilities and longitudinal consistency, ESD provides a versatile foundation for democratizing planetary-scale research and advancing next-generation geospatial artificial intelligence.
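The abstract names Finite Scalar Quantization (FSQ) and three reconstruction metrics (MAE, RMSE, CC) but gives no configuration details. The following is a minimal sketch of the general FSQ idea and of those metrics, not the ESDNet implementation: the function names, the per-channel level counts `[7, 5, 5, 5]`, and the restriction to odd level counts are all assumptions made for illustration.

```python
import numpy as np


def fsq_quantize(z, levels):
    """Round each latent channel to a small, fixed set of integer codes.

    z      : float array of shape (..., d), unbounded latent values
    levels : d odd integers, the number of codes per channel
             (hypothetical values; the abstract does not state them)
    """
    levels = np.asarray(levels)
    if np.any(levels % 2 == 0):
        raise ValueError("this sketch handles odd level counts only")
    half = (levels - 1) / 2            # 5 levels -> codes in {-2, ..., 2}
    bounded = np.tanh(z) * half        # squash each channel into [-half, half]
    return np.round(bounded).astype(np.int8)


def codes_to_index(codes, levels):
    """Pack per-channel codes into one integer index (mixed-radix encoding)."""
    levels = np.asarray(levels)
    half = (levels - 1) // 2
    shifted = codes.astype(np.int64) + half            # now in [0, L - 1]
    basis = np.concatenate(([1], np.cumprod(levels[:-1])))
    return (shifted * basis).sum(axis=-1)


def reconstruction_metrics(x, x_hat):
    """Mean absolute error, root-mean-square error, Pearson correlation."""
    err = x_hat - x
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    cc = float(np.corrcoef(x.ravel(), x_hat.ravel())[0, 1])
    return mae, rmse, cc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    levels = [7, 5, 5, 5]                  # assumed; implies 7*5*5*5 = 875 codes
    z = rng.normal(size=(3, 4))            # three toy latent vectors
    codes = fsq_quantize(z, levels)
    print(codes)
    print(codes_to_index(codes, levels))
```

Because every channel takes only a handful of integer values, a whole latent vector can be stored as one small index, which is the kind of mechanism that makes large reductions in data volume possible. How ESD actually stores its quantized vectors is not described in the abstract.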
Related papers
- Physically Interpretable AlphaEarth Foundation Model Embeddings Enable LLM-Based Land Surface Intelligence [0.0]
We present a comprehensive interpretability analysis of Google AlphaEarth's 64-dimensional embeddings against 26 environmental variables. We then developed a Land Surface Intelligence system that implements retrieval-augmented generation over a FAISS-indexed embedding database of 12.1 million vectors.
arXiv Detail & Related papers (2026-02-10T22:58:50Z)
- Breaking the Regional Barrier: Inductive Semantic Topology Learning for Worldwide Air Quality Forecasting [99.4484686548807]
We propose OmniAir, a semantic topology learning framework tailored for global station-level prediction. Our approach effectively captures long-range non-Euclidean correlations and physical diffusion patterns across unevenly distributed global networks. Experiments show that OmniAir achieves state-of-the-art performance against 18 baselines, maintaining high efficiency and scalability with speeds nearly 10 times faster than existing models.
arXiv Detail & Related papers (2026-01-29T15:58:07Z)
- Advancing Ocean State Estimation with efficient and scalable AI [22.24444646069193]
We present an AI-driven Data Assimilation Framework for Ocean (ADAF-Ocean) that assimilates multi-source and multi-scale data. ADAF-Ocean learns a continuous mapping from heterogeneous inputs to ocean states, preserving native data fidelity.
arXiv Detail & Related papers (2025-11-08T15:24:23Z)
- ReconMOST: Multi-Layer Sea Temperature Reconstruction with Observations-Guided Diffusion [48.540756751934836]
ReconMOST is a data-driven guided diffusion model framework for multi-layer sea temperature reconstruction. Our method extends ML-based SST reconstruction to a global, multi-layer setting, handling over 92.5% missing data.
arXiv Detail & Related papers (2025-06-12T06:27:22Z)
- TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation [65.74990259650984]
We introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery. Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism. TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench.
arXiv Detail & Related papers (2025-06-06T17:59:50Z)
- High Performance Space Debris Tracking in Complex Skylight Backgrounds with a Large-Scale Dataset [48.32788509877459]
We propose a deep learning-based Space Debris Tracking Network (SDT-Net) to achieve highly accurate debris tracking. SDT-Net effectively represents the features of debris, enhancing the efficiency and stability of end-to-end model learning. Our dataset and code will be released soon.
arXiv Detail & Related papers (2025-06-03T08:30:25Z)
- From Proxies to Fields: Spatiotemporal Reconstruction of Global Radiation from Sparse Sensor Sequences [0.38836072943850625]
TRON is trained on 22 years of simulation data and generalizes across 65,341 spatial locations. TRON offers a domain-agnostic framework for scientific field reconstruction from sparse data, with applications in atmospheric modeling, geophysical hazard monitoring, and real-time environmental risk forecasting.
arXiv Detail & Related papers (2025-05-24T16:24:10Z)
- GAIA: A Foundation Model for Operational Atmospheric Dynamics [0.83442357861662]
We introduce GAIA, a hybrid self-supervised model that fuses Masked Autoencoders (MAE) with self-distillation with no labels (DINO). GAIA learns disentangled representations that capture atmospheric dynamics rather than trivial diurnal patterns. When transferred to downstream tasks, GAIA consistently outperforms an MAE-only baseline.
arXiv Detail & Related papers (2025-05-15T05:07:09Z)
- Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation [67.23953699167274]
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO). In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery. We propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance.
arXiv Detail & Related papers (2025-04-09T15:13:26Z)
- OpenEarthMap-SAR: A Benchmark Synthetic Aperture Radar Dataset for Global High-Resolution Land Cover Mapping [16.387666608029882]
We introduce OpenEarthMap-SAR, a benchmark SAR dataset for global high-resolution land cover mapping. OpenEarthMap-SAR consists of 1.5 million segments of 5033 aerial and satellite images with the size of 1024$\times$1024 pixels, covering 35 regions from Japan, France, and the USA. We evaluate the performance of state-of-the-art methods for semantic segmentation and present challenging problem settings suitable for further technical development.
arXiv Detail & Related papers (2025-01-18T22:30:27Z)
- SpectralEarth: Training Hyperspectral Foundation Models at Scale [47.93167977587301]
We introduce SpectralEarth, a large-scale multitemporal dataset designed to pretrain hyperspectral foundation models. We pretrain a series of foundation models on SpectralEarth, integrating a spectral adapter into classical vision backbones. In tandem, we construct nine downstream datasets for land-cover, crop-type mapping, and tree-species classification.
arXiv Detail & Related papers (2024-08-15T22:55:59Z)
- Jalisco's multiclass land cover analysis and classification using a novel lightweight convnet with real-world multispectral and relief data [51.715517570634994]
We present our novel lightweight (only 89k parameters) Convolutional Neural Network (ConvNet) for land-cover (LC) classification and analysis.
In this work, we combine three real-world open data sources to obtain 13 channels.
Our embedded analysis anticipates the limited performance in some classes and gives us the opportunity to group the most similar.
arXiv Detail & Related papers (2022-01-26T14:58:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.