Climplicit: Climatic Implicit Embeddings for Global Ecological Tasks
- URL: http://arxiv.org/abs/2504.05089v2
- Date: Mon, 14 Apr 2025 12:47:23 GMT
- Title: Climplicit: Climatic Implicit Embeddings for Global Ecological Tasks
- Authors: Johannes Dollinger, Damien Robert, Elena Plekhanova, Lukas Drees, Jan Dirk Wegner,
- Abstract summary: We introduce Climplicit, a spatio-temporal geolocation encoder pretrained to generate implicit climatic representations anywhere on Earth. We find that single-layer probing of our Climplicit embeddings consistently performs better than or on par with training a model from scratch on downstream tasks.
- Score: 4.165168823227979
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning on climatic data holds potential for macroecological applications. However, its adoption remains limited among scientists outside the deep learning community due to storage, compute, and technical expertise barriers. To address this, we introduce Climplicit, a spatio-temporal geolocation encoder pretrained to generate implicit climatic representations anywhere on Earth. By bypassing the need to download raw climatic rasters and train feature extractors, our model uses 3500 times less disk space and significantly reduces computational needs for downstream tasks. We evaluate our Climplicit embeddings on biome classification, species distribution modeling, and plant trait regression. We find that single-layer probing of our Climplicit embeddings consistently performs better than or on par with training a model from scratch on downstream tasks, and overall better than alternative geolocation encoding models.
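The single-layer probing mentioned in the abstract can be illustrated with a minimal sketch: embeddings from a frozen encoder are fed to one trainable linear layer. Everything below is an assumption made for illustration. `placeholder_embed` is a hypothetical stand-in (a fixed random projection of cyclic longitude, latitude, and month features), not the Climplicit model or its API, and the hemisphere label is a toy task rather than one of the paper's benchmarks.

```python
import numpy as np


def placeholder_embed(lon, lat, month, dim=64, seed=0):
    """Hypothetical stand-in for a frozen, pretrained location encoder."""
    rng = np.random.default_rng(seed)
    lon_r, lat_r = np.radians(lon), np.radians(lat)
    t = 2 * np.pi * (month - 1) / 12  # month as an angle so December wraps to January
    base = np.stack(
        [np.sin(lon_r), np.cos(lon_r), np.sin(lat_r), np.cos(lat_r), np.sin(t), np.cos(t)],
        axis=1,
    )
    proj = rng.normal(size=(base.shape[1], dim))  # fixed weights, never trained
    return np.tanh(base @ proj)


def fit_linear_probe(x, y, n_classes, lr=0.1, steps=500):
    """Train a single linear layer (softmax regression) on frozen embeddings."""
    n, d = x.shape
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(x, w, b):
    return np.argmax(x @ w + b, axis=1)


# Toy data: random locations and months, labelled by hemisphere.
rng = np.random.default_rng(1)
lon = rng.uniform(-180, 180, 400)
lat = rng.uniform(-90, 90, 400)
month = rng.integers(1, 13, 400)
labels = (lat > 0).astype(int)

emb = placeholder_embed(lon, lat, month)  # the encoder stays frozen
w, b = fit_linear_probe(emb, labels, n_classes=2)  # only the probe is trained
train_acc = float((predict(emb, w, b) == labels).mean())
print(f"probe training accuracy: {train_acc:.2f}")
```

Only the probe's weight matrix and bias are updated; the encoder output is treated as a fixed feature vector, which is what keeps the downstream compute and storage cost low in this kind of setup.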
Related papers
- SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology [3.743127390843568]
Self-supervised learning has enabled learning representations from unlabeled data.
These models are often trained on datasets biased toward areas of high human activity.
To better capture vegetation seasonality at a global scale, we propose a simple phenology-informed sampling strategy.
arXiv Detail & Related papers (2025-04-25T10:58:44Z) - Efficient Mixture of Geographical Species for On Device Wildlife Monitoring [2.8718221966298754]
In this work, we explore the training of a single species detector which uses conditional computation to bias structured sub-networks in a geographically aware manner.
We propose a method for pruning the expert model per location and demonstrate conditional computation performance on two geographically distributed datasets: iNaturalist and iWildcam.
arXiv Detail & Related papers (2025-04-11T15:25:36Z) - Fourier Neural Operator based surrogates for $CO_2$ storage in realistic geologies [57.23978190717341]
We develop a Fourier Neural Operator (FNO) based model for real-time, high-resolution simulation of $CO_2$ plume migration. The model is trained on a comprehensive dataset generated from realistic subsurface parameters. We present various strategies for improving the reliability of predictions from the model, which is crucial while assessing actual geological sites.
arXiv Detail & Related papers (2025-03-14T02:58:24Z) - MiTREE: Multi-input Transformer Ecoregion Encoder for Species Distribution Modelling [2.3776390335270694]
We introduce MiTREE, a multi-input Vision-Transformer-based model with an ecoregion encoder. We evaluate our model on the SatBird Summer and Winter datasets, the goal of which is to predict bird species encounter rates.
arXiv Detail & Related papers (2024-12-25T22:20:47Z) - MambaDS: Near-Surface Meteorological Field Downscaling with Topography Constrained Selective State Space Modeling [68.69647625472464]
Downscaling, a crucial task in meteorological forecasting, enables the reconstruction of high-resolution meteorological states for target regions.
Previous downscaling methods lacked tailored designs for meteorology and encountered structural limitations.
We propose a novel model called MambaDS, which enhances the utilization of multivariable correlations and topography information.
arXiv Detail & Related papers (2024-08-20T13:45:49Z) - SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data [68.2366021016172]
We present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird.
We also provide a dataset in Kenya representing low-data regimes.
We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks.
arXiv Detail & Related papers (2023-11-02T02:00:27Z) - Efficient machine-learning surrogates for large-scale geological carbon and energy storage [0.276240219662896]
We propose a specialized machine-learning (ML) model to manage extensive reservoir models efficiently.
We've developed a method to reduce the training cost for deep neural operator models, using domain decomposition and a topology embedder.
This approach allows accurate predictions within the model's domain, even for untrained data, enhancing ML efficiency for large-scale geological storage applications.
arXiv Detail & Related papers (2023-10-11T13:05:03Z) - Climate Intervention Analysis using AI Model Guided by Statistical Physics Principles [6.824166358727082]
We propose a novel solution by utilizing a principle from statistical physics known as the Fluctuation-Dissipation Theorem (FDT).
By leveraging FDT, we are able to extract information encoded in a large dataset produced by Earth System Models.
Our model, AiBEDO, is capable of capturing the complex, multi-timescale effects of radiation perturbations on global and regional surface climate.
arXiv Detail & Related papers (2023-02-07T05:09:10Z) - ClimaX: A foundation model for weather and climate [51.208269971019504]
ClimaX is a deep learning model for weather and climate science.
It can be pre-trained with a self-supervised learning objective on climate datasets.
It can be fine-tuned to address a breadth of climate and weather tasks.
arXiv Detail & Related papers (2023-01-24T23:19:01Z) - Multi-scale Digital Twin: Developing a fast and physics-informed surrogate model for groundwater contamination with uncertain climate models [53.44486283038738]
Climate change exacerbates the long-term soil management problem of groundwater contamination.
We develop a physics-informed machine learning surrogate model using a U-Net enhanced Fourier Neural Operator (PDENO).
In parallel, we develop a convolutional autoencoder combined with climate data to reduce the dimensionality of climatic region similarities across the United States.
arXiv Detail & Related papers (2022-11-20T06:46:35Z) - Spatiotemporal modeling of European paleoclimate using doubly sparse Gaussian processes [61.31361524229248]
We build on recent sparse spatiotemporal GPs to reduce the computational burden.
We successfully employ such a doubly sparse GP to construct a probabilistic model of paleoclimate.
arXiv Detail & Related papers (2022-11-15T14:15:04Z) - Climate-Invariant Machine Learning [0.8831201550856289]
Current climate models require representations of processes that occur at scales smaller than model grid size.
Recent machine learning (ML) algorithms hold promise to improve such process representations, but tend to extrapolate poorly to climate regimes they were not trained on.
We propose a new framework - termed "climate-invariant" ML - incorporating knowledge of climate processes into ML algorithms.
arXiv Detail & Related papers (2021-12-14T07:02:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.