TESSERA: Precomputed FAIR Global Pixel Embeddings for Earth Representation and Analysis
- URL: http://arxiv.org/abs/2506.20380v5
- Date: Fri, 19 Sep 2025 15:49:04 GMT
- Title: TESSERA: Precomputed FAIR Global Pixel Embeddings for Earth Representation and Analysis
- Authors: Zhengpeng Feng, Clement Atzberger, Sadiq Jaffer, Jovana Knezevic, Silja Sormunen, Robin Young, Madeline C Lisaius, Markus Immitzer, Toby Jackson, James Ball, David A. Coomes, Anil Madhavapeddy, Andrew Blake, Srinivasan Keshav
- Abstract summary: We present TESSERA, a pixel-oriented foundation model for EO time series. It creates 128-dimensional latent embeddings requiring only a few labels for task-specific training. It is unprecedented in ease of use, scale, and accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Petabytes of satellite Earth Observation (EO) data are freely available and can address critical global challenges. However, EO data quality is poor due to clouds and variable lighting conditions. To address this, practitioners typically use compositing, but this critically removes the temporal phenological signal. Moreover, supervised machine learning to map composited pixels to task-specific classes requires accurately labelled data that are rarely available. We present TESSERA, a pixel-oriented foundation model for EO time series that creates 128-dimensional latent embeddings requiring only a few labels for task-specific training to achieve state-of-the-art performance across diverse complex tasks. TESSERA uses two encoders that combine optical data with synthetic aperture radar backscatter coefficients at 10m resolution, creating embeddings fused with a multilayer perceptron to generate annual global embedding maps. TESSERA closely matches or outperforms state-of-the-art task-specific models and other foundation models across five diverse downstream tasks. It is unprecedented in ease of use, scale, and accuracy: no other open foundation model provides precomputed outputs with global, annual coverage at 10m resolution.
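The few-label workflow described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the vectors are synthetic Gaussian clusters standing in for precomputed 128-dimensional pixel embeddings, the nearest-centroid classifier is a simple stand-in rather than the method used in the paper, and the function names (`make_synthetic_pixels`, `fit_centroids`, `predict`) are hypothetical.

```python
import random

EMBED_DIM = 128  # embedding size stated in the abstract


def make_synthetic_pixels(n_per_class, class_means, noise=0.5, seed=0):
    """Generate stand-in embeddings: one Gaussian cluster per class.

    These are synthetic vectors, not real TESSERA outputs.
    """
    rng = random.Random(seed)
    pixels, labels = [], []
    for label, mean in enumerate(class_means):
        for _ in range(n_per_class):
            pixels.append([m + rng.gauss(0.0, noise) for m in mean])
            labels.append(label)
    return pixels, labels


def fit_centroids(pixels, labels):
    """Average the labelled embeddings of each class into one centroid."""
    sums, counts = {}, {}
    for vec, label in zip(pixels, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, vec):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


# Two hypothetical classes with different mean embeddings.
mean_rng = random.Random(42)
class_means = [[mean_rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
               for _ in range(2)]

# Only 5 labelled pixels per class for training; 200 per class for evaluation.
train_x, train_y = make_synthetic_pixels(5, class_means, seed=1)
test_x, test_y = make_synthetic_pixels(200, class_means, seed=2)

centroids = fit_centroids(train_x, train_y)
correct = sum(predict(centroids, x) == y for x, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"accuracy with 5 labels per class: {accuracy:.2f}")
```

The point of the sketch is only the shape of the workflow: the embeddings are treated as fixed features, and the task-specific part is a lightweight model fitted on a handful of labels. With real embeddings, the synthetic generator would be replaced by whatever loader the data provider documents.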
Related papers
- From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios [66.57089888022414]
We introduce DenseWorld, a benchmark spanning a broad set of 25 dense prediction tasks that correspond to urgent real-world applications.
We then propose DenseDiT, which exploits generative models' visual priors to perform diverse real-world dense prediction tasks through a unified strategy.
DenseDiT achieves superior results using less than 0.01% training data of baselines, underscoring its practical value for real-world deployment.
arXiv Detail & Related papers (2025-06-25T09:40:50Z)
- TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation [65.74990259650984]
We introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery.
Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism.
TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench.
arXiv Detail & Related papers (2025-06-06T17:59:50Z)
- Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation [67.23953699167274]
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO).
In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery.
We propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance.
arXiv Detail & Related papers (2025-04-09T15:13:26Z)
- A Deep Learning Architecture for Land Cover Mapping Using Spatio-Temporal Sentinel-1 Features [1.907072234794597]
The study focuses on three distinct regions (Amazonia, Africa, and Siberia) and evaluates the model performance across diverse ecoregions within these areas.
The results demonstrate the effectiveness and the capabilities of the proposed methodology in achieving high overall accuracy (O.A.) values, even in regions with limited training data.
arXiv Detail & Related papers (2025-03-10T12:15:35Z)
- OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing [57.050679160659705]
We introduce OpenEarthSensing (OES), a large-scale fine-grained benchmark for open-world remote sensing.
OES includes 189 scene and object categories, covering the vast majority of potential semantic shifts that may occur in the real world.
arXiv Detail & Related papers (2025-02-28T02:49:52Z)
- EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks.
The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic.
Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
arXiv Detail & Related papers (2025-01-14T13:42:22Z)
- Deep Multimodal Fusion for Semantic Segmentation of Remote Sensing Earth Observation Data [0.08192907805418582]
This paper proposes a late fusion deep learning model (LF-DLM) for semantic segmentation.
One branch integrates detailed textures from aerial imagery captured by UNetFormer with a Multi-Axis Vision Transformer (MaxViT) backbone.
The other branch captures complex spatio-temporal dynamics from the Sentinel-2 satellite image time series using a U-Net with Temporal Attention Encoder (U-TAE).
arXiv Detail & Related papers (2024-10-01T07:50:37Z)
- SpectralEarth: Training Hyperspectral Foundation Models at Scale [47.93167977587301]
We introduce SpectralEarth, a large-scale multi-temporal dataset designed to pretrain hyperspectral foundation models.
We pretrain a series of foundation models on SpectralEarth using state-of-the-art self-supervised learning (SSL) algorithms.
We construct four downstream datasets for land-cover and crop-type mapping, providing benchmarks for model evaluation.
arXiv Detail & Related papers (2024-08-15T22:55:59Z)
- Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation [48.66623377464203]
Our novel approach introduces the Dynamic One-For-All (DOFA) model, leveraging the concept of neural plasticity in brain science.
This dynamic hypernetwork, adjusting to different wavelengths, enables a single versatile Transformer jointly trained on data from five sensors to excel across 12 distinct Earth observation tasks.
arXiv Detail & Related papers (2024-03-22T17:11:47Z)
- SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection [79.23689506129733]
We establish a new benchmark dataset and an open-source method for large-scale SAR object detection.
Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets.
To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
arXiv Detail & Related papers (2024-03-11T09:20:40Z)
- SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery [35.550999964460466]
We present SkySense, a generic billion-scale model, pre-trained on a curated multi-modal Remote Sensing dataset with 21.5 million temporal sequences.
To the best of our knowledge, SkySense is the largest multi-modal remote sensing foundation model to date, whose modules can be flexibly combined or used individually to accommodate various tasks.
arXiv Detail & Related papers (2023-12-15T09:57:21Z)
- Estimating optical vegetation indices and biophysical variables for temperate forests with Sentinel-1 SAR data using machine learning techniques: A case study for Czechia [32.19783248549554]
Current optical vegetation indices (VIs) for monitoring forest ecosystems are well established and widely used in various applications.
In contrast, synthetic aperture radar (SAR) data can offer insightful and systematic forest monitoring with complete time series (TS) due to signal penetration through clouds and day and night image acquisitions.
This study aims to address the limitations of optical satellite data by using SAR data as an alternative for estimating optical VIs for forests through machine learning (ML)
In general, up to 240 measurements per year and a spatial resolution of 20 m can be achieved using estimated SAR-based VIs with high accuracy.
arXiv Detail & Related papers (2023-11-13T18:23:46Z)
- Fewshot learning on global multimodal embeddings for earth observation tasks [5.057850174013128]
We pretrain a CLIP/ViT based model using three different modalities of satellite imagery covering over 10% of Earth's total landmass.
We use the embeddings produced for each modality with a classical machine learning method to attempt different downstream tasks for earth observation.
We visually show that this embedding space, obtained with no labels, is sensitive to the different earth features represented by the labelled datasets we selected.
arXiv Detail & Related papers (2023-09-29T20:15:52Z)
- A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery [4.397725469518669]
Self-supervised learning (SSL) enables the models to learn a representation from orders of magnitude more unlabelled data.
This work has designed a novel SSL framework that is capable of learning representation from both spectra-spatial information of unlabelled data.
arXiv Detail & Related papers (2023-06-27T23:50:43Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 levels of difficulties and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- Neural-Sim: Learning to Generate Training Data with NeRF [31.81496344354997]
We present the first fully differentiable synthetic data pipeline that uses Neural Radiance Fields (NeRFs) in a closed-loop with a target application's loss function.
Our approach generates data on-demand, with no human labor, to maximize accuracy for a target task.
arXiv Detail & Related papers (2022-07-22T22:48:33Z)
- Satellite Image Time Series Analysis for Big Earth Observation Data [50.591267188664666]
This paper describes sits, an open-source R package for satellite image time series analysis using machine learning.
We show that this approach produces high accuracy for land use and land cover maps through a case study in the Cerrado biome.
arXiv Detail & Related papers (2022-04-24T15:23:25Z)
- Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge TPU [58.720142291102135]
In this paper we propose a pose estimation software exploiting neural network architectures.
We show how low power machine learning accelerators could enable Artificial Intelligence exploitation in space.
arXiv Detail & Related papers (2022-04-07T08:53:18Z)
- Conditional Generation of Synthetic Geospatial Images from Pixel-level and Feature-level Inputs [0.0]
We present a conditional generative model, called VAE-Info-cGAN, for synthesizing semantically rich images simultaneously conditioned on a pixel-level condition (PLC) and a feature-level condition (FLC)
The proposed model can accurately generate various forms of macroscopic aggregates across different geographic locations while conditioned only on an atemporal representation of the road network.
arXiv Detail & Related papers (2021-09-11T06:58:19Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- SpaceNet 6: Multi-Sensor All Weather Mapping Dataset [13.715388432549373]
We present an open Multi-Sensor All Weather Mapping (MSAW) dataset and challenge.
MSAW covers 120 km² over multiple overlapping collects and is annotated with over 48,000 unique building footprint labels.
We present a baseline and benchmark for building footprint extraction with SAR data and find that state-of-the-art segmentation models pre-trained on optical data and then trained on SAR outperform those trained on SAR data alone.
arXiv Detail & Related papers (2020-04-14T13:43:11Z)