Related papers: GreenHyperSpectra: A multi-source hyperspectral dataset for global vegetation trait prediction

URL: http://arxiv.org/abs/2507.06806v1
Date: Wed, 09 Jul 2025 12:51:46 GMT
Title: GreenHyperSpectra: A multi-source hyperspectral dataset for global vegetation trait prediction
Authors: Eya Cherif, Arthur Ouaknine, Luke A. Brown, Phuong D. Dao, Kyle R. Kovach, Bing Lu, Daniel Mederer, Hannes Feilhauer, Teja Kattenborn, David Rolnick
Abstract summary: We present GreenHyperSpectra, a pretraining dataset encompassing real-world cross-sensor and cross-ecosystem samples. We successfully leverage GreenHyperSpectra to pretrain label-efficient multi-output regression models. Our empirical analyses demonstrate substantial improvements in learning spectral representations for trait prediction.
Score: 15.87410077173391
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Plant traits such as leaf carbon content and leaf mass are essential variables in the study of biodiversity and climate change. However, conventional field sampling cannot feasibly cover trait variation at ecologically meaningful spatial scales. Machine learning represents a valuable solution for plant trait prediction across ecosystems, leveraging hyperspectral data from remote sensing. Nevertheless, trait prediction from hyperspectral data is challenged by label scarcity and substantial domain shifts (e.g., across sensors, ecological distributions), requiring robust cross-domain methods. Here, we present GreenHyperSpectra, a pretraining dataset encompassing real-world cross-sensor and cross-ecosystem samples designed to benchmark trait prediction with semi- and self-supervised methods. We adopt an evaluation framework encompassing in-distribution and out-of-distribution scenarios. We successfully leverage GreenHyperSpectra to pretrain label-efficient multi-output regression models that outperform the state-of-the-art supervised baseline. Our empirical analyses demonstrate substantial improvements in learning spectral representations for trait prediction, establishing a comprehensive methodological framework to catalyze research at the intersection of representation learning and plant functional traits assessment. All code and data are available at: https://github.com/echerif18/HyspectraSSL.

Related papers

Temporal-Spectral-Spatial Unified Remote Sensing Dense Prediction [62.376936772702905]
Current deep learning architectures for remote sensing are fundamentally rigid. We introduce the Spatial-Temporal-Spectral Unified Network (STSUN) for unified modeling. STSUN can adapt to input and output data with arbitrary spatial sizes, temporal lengths, and spectral bands. It unifies disparate dense prediction tasks within a single architecture by conditioning the model on trainable task embeddings.
arXiv Detail & Related papers (2025-05-18T07:39:17Z)
SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology [3.743127390843568]
Self-supervised learning has enabled learning representations from unlabeled data. These models are often trained on datasets biased toward areas of high human activity. To better capture vegetation seasonality at a global scale, we propose a simple phenology-informed sampling strategy.
arXiv Detail & Related papers (2025-04-25T10:58:44Z)
SatelliteCalculator: A Multi-Task Vision Foundation Model for Quantitative Remote Sensing Inversion [4.824120664293887]
We introduce SatelliteCalculator, the first vision foundation model for quantitative remote sensing inversion. By leveraging physically defined index adapters, we automatically construct a large-scale dataset of over one million paired samples. Experiments demonstrate that SatelliteCalculator achieves competitive accuracy across all tasks while significantly reducing inference cost.
arXiv Detail & Related papers (2025-04-18T03:48:04Z)
Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels. Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z)
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models [25.047123247476016]
LITE is a large language model for environmental ecosystems modeling. It unifies different environmental variables by transforming them into natural language descriptions and line graph images. During this step, the incomplete features are imputed by a sparse Mixture-of-Experts framework.
arXiv Detail & Related papers (2024-04-01T15:14:07Z)
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues. We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space. A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
SSL-SoilNet: A Hybrid Transformer-based Framework with Self-Supervised Learning for Large-scale Soil Organic Carbon Prediction [2.554658234030785]
This study introduces a novel approach that aims to learn the geographical link between multimodal features via self-supervised contrastive learning. The proposed approach has undergone rigorous testing on two distinct large-scale datasets.
arXiv Detail & Related papers (2023-08-07T13:44:44Z)
Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders. Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency. We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data. We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
Generative models-based data labeling for deep networks regression: application to seed maturity estimation from UAV multispectral images [3.6868861317674524]
Monitoring seed maturity is an increasing challenge in agriculture due to climate change and more restrictive practices. Traditional methods are based on limited sampling in the field and analysis in laboratory. We propose a method for estimating parsley seed maturity using multispectral UAV imagery, with a new approach for automatic data labeling.
arXiv Detail & Related papers (2022-08-09T09:06:51Z)
Uncertainty Inspired RGB-D Saliency Detection [70.50583438784571]
We propose the first framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process. Inspired by the saliency data labeling process, we propose a generative architecture to achieve probabilistic RGB-D saliency detection. Results on six challenging RGB-D benchmark datasets show our approach's superior performance in learning the distribution of saliency maps.
arXiv Detail & Related papers (2020-09-07T13:01:45Z)
Semi-Automatic Data Annotation guided by Feature Space Projection [117.9296191012968]
We present a semi-automatic data annotation approach based on suitable feature space projection and semi-supervised label estimation. We validate our method on the popular MNIST dataset and on images of human intestinal parasites with and without fecal impurities. Our results demonstrate the added-value of visual analytics tools that combine complementary abilities of humans and machines for more effective machine learning.
arXiv Detail & Related papers (2020-07-27T17:03:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.