Related papers: TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

URL: http://arxiv.org/abs/2506.20380v3
Date: Tue, 29 Jul 2025 15:23:52 GMT
Title: TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis
Authors: Zhengpeng Feng, Clement Atzberger, Sadiq Jaffer, Jovana Knezevic, Silja Sormunen, Robin Young, Madeline C Lisaius, Markus Immitzer, David A. Coomes, Anil Madhavapeddy, Andrew Blake, Srinivasan Keshav,
Abstract summary: We present TESSERA, an open, global, land-oriented remote sensing foundation model.<n>We use two parallel Transformer-based encoders to combine optical data from ten Sentinel-2 spectral bands at 10-60m spatial resolution and two Sentinel-1 synthetic aperture radar back coefficients at 10m resolution to create embeddings that are subsequently fused with a multilayer perceptron to create annual global embedding maps.
Score: 0.2479153065703935
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Satellite remote sensing from repeated observations and multiple sensors enables a wide range of downstream applications, including climate modeling, carbon accounting, and strategies for conservation and sustainable land use. However, satellite time series are voluminous, often corrupted by sensor noise, clouds, and atmospheric conditions, and unevenly spaced in time, making them challenging to use. We present TESSERA, an open, global, land-oriented remote sensing foundation model that uses self-supervised learning to generate `ready-to-use' embeddings at 10~m scale from pixel-level satellite time series data. TESSERA uses two parallel Transformer-based encoders to combine optical data from ten Sentinel-2 spectral bands at 10-60~m spatial resolution and two Sentinel-1 synthetic aperture radar backscatter coefficients at 10~m resolution to create embeddings that are subsequently fused with a multilayer perceptron to create annual global embedding maps. We compare our work with state-of-the-art task-specific models and other foundation models in five diverse downstream tasks and find that TESSERA closely matches or outperforms these baselines. We believe that TESSERA's ease of use, openness, computation-, label-, and data-efficiency, and high performance will prove transformative in a wide range of vegetation-oriented ecological and agricultural applications.

Related papers

TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation [65.74990259650984]
We introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery.<n>Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism.<n>TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench.
arXiv Detail & Related papers (2025-06-06T17:59:50Z)
Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation [67.23953699167274]
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO)<n>In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery.<n>We propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance.
arXiv Detail & Related papers (2025-04-09T15:13:26Z)
A Deep Learning Architecture for Land Cover Mapping Using Spatio-Temporal Sentinel-1 Features [1.907072234794597]
The study focuses on three distinct regions - Amazonia, Africa, and Siberia - and evaluates the model performance across diverse ecoregions within these areas.<n>The results demonstrate the effectiveness and the capabilities of the proposed methodology in achieving overall accuracy (O.A.) values, even in regions with limited training data.
arXiv Detail & Related papers (2025-03-10T12:15:35Z)
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks.<n>The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic.<n>Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
arXiv Detail & Related papers (2025-01-14T13:42:22Z)
Deep Multimodal Fusion for Semantic Segmentation of Remote Sensing Earth Observation Data [0.08192907805418582]
This paper proposes a late fusion deep learning model (LF-DLM) for semantic segmentation. One branch integrates detailed textures from aerial imagery captured by UNetFormer with a Multi-Axis Vision Transformer (ViT) backbone. The other branch captures complex-temporal dynamics from the Sentinel-2 satellite imageMax time series using a U-ViNet with Temporal Attention (U-TAE)
arXiv Detail & Related papers (2024-10-01T07:50:37Z)
SpectralEarth: Training Hyperspectral Foundation Models at Scale [47.93167977587301]
We introduce SpectralEarth, a large-scale multi-temporal dataset designed to pretrain hyperspectral foundation models. We pretrain a series of foundation models on SpectralEarth using state-of-the-art self-supervised learning (SSL) algorithms. We construct four downstream datasets for land-cover and crop-type mapping, providing benchmarks for model evaluation.
arXiv Detail & Related papers (2024-08-15T22:55:59Z)
Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation [48.66623377464203]
Our novel approach introduces the Dynamic One-For-All (DOFA) model, leveraging the concept of neural plasticity in brain science. This dynamic hypernetwork, adjusting to different wavelengths, enables a single versatile Transformer jointly trained on data from five sensors to excel across 12 distinct Earth observation tasks.
arXiv Detail & Related papers (2024-03-22T17:11:47Z)
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection [79.23689506129733]
We establish a new benchmark dataset and an open-source method for large-scale SAR object detection. Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets. To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
arXiv Detail & Related papers (2024-03-11T09:20:40Z)
SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery [35.550999964460466]
We present SkySense, a generic billion-scale model, pre-trained on a curated multi-modal Remote Sensing dataset with 21.5 million temporal sequences. To our best knowledge, SkySense is the largest Multi-Modal to date, whose modules can be flexibly combined or used individually to accommodate various tasks.
arXiv Detail & Related papers (2023-12-15T09:57:21Z)
Estimating optical vegetation indices and biophysical variables for temperate forests with Sentinel-1 SAR data using machine learning techniques: A case study for Czechia [32.19783248549554]
Current optical vegetation indices (VIs) for monitoring forest ecosystems are well established and widely used in various applications. In contrast, synthetic aperture radar (SAR) data can offer insightful and systematic forest monitoring with complete time series (TS) due to signal penetration through clouds and day and night image acquisitions. This study aims to address the limitations of optical satellite data by using SAR data as an alternative for estimating optical VIs for forests through machine learning (ML) In general, up to 240 measurements per year and a spatial resolution of 20 m can be achieved using estimated SAR-based VIs with high accuracy.
arXiv Detail & Related papers (2023-11-13T18:23:46Z)
Fewshot learning on global multimodal embeddings for earth observation tasks [5.057850174013128]
We pretrain a CLIP/ViT based model using three different modalities of satellite imagery covering over 10% of Earth's total landmass. We use the embeddings produced for each modality with a classical machine learning method to attempt different downstream tasks for earth observation. We visually show that this embedding space, obtained with no labels, is sensible to the different earth features represented by the labelled datasets we selected.
arXiv Detail & Related papers (2023-09-29T20:15:52Z)
A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery [4.397725469518669]
Self-supervised learning (SSL) enables the models to learn a representation from orders of magnitude more unlabelled data. This work has designed a novel SSL framework that is capable of learning representation from both spectra-spatial information of unlabelled data.
arXiv Detail & Related papers (2023-06-27T23:50:43Z)
Satellite Image Time Series Analysis for Big Earth Observation Data [50.591267188664666]
This paper describes sits, an open-source R package for satellite image time series analysis using machine learning. We show that this approach produces high accuracy for land use and land cover maps through a case study in the Cerrado biome.
arXiv Detail & Related papers (2022-04-24T15:23:25Z)
Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge TPU [58.720142291102135]
In this paper we propose a pose estimation software exploiting neural network architectures. We show how low power machine learning accelerators could enable Artificial Intelligence exploitation in space.
arXiv Detail & Related papers (2022-04-07T08:53:18Z)
SpaceNet 6: Multi-Sensor All Weather Mapping Dataset [13.715388432549373]
We present an open Multi-Sensor All Weather Mapping (MSAW) dataset and challenge. MSAW covers 120 km2 over multiple overlapping collects and is annotated with over 48,000 unique building footprints labels. We present a baseline and benchmark for building footprint extraction with SAR data and find that state-of-the-art segmentation models pre-trained on optical data, and then trained on SAR.
arXiv Detail & Related papers (2020-04-14T13:43:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.