SSL-SoilNet: A Hybrid Transformer-based Framework with Self-Supervised Learning for Large-scale Soil Organic Carbon Prediction
- URL: http://arxiv.org/abs/2308.03586v3
- Date: Wed, 14 Aug 2024 08:29:55 GMT
- Title: SSL-SoilNet: A Hybrid Transformer-based Framework with Self-Supervised Learning for Large-scale Soil Organic Carbon Prediction
- Authors: Nafiseh Kakhani, Moien Rangzan, Ali Jamali, Sara Attarchi, Seyed Kazem Alavipanah, Michael Mommert, Nikolaos Tziolas, Thomas Scholten
- Abstract summary: This study introduces a novel approach that aims to learn the geographical link between multimodal features via self-supervised contrastive learning.
The proposed approach has undergone rigorous testing on two distinct large-scale datasets.
- Score: 2.554658234030785
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Soil Organic Carbon (SOC) constitutes a fundamental component of terrestrial ecosystem functionality, playing a pivotal role in nutrient cycling, hydrological balance, and erosion mitigation. Precise mapping of SOC distribution is imperative for the quantification of ecosystem services, notably carbon sequestration and soil fertility enhancement. Digital soil mapping (DSM) leverages statistical models and advanced technologies, including machine learning (ML), to accurately map soil properties, such as SOC, utilizing diverse data sources like satellite imagery, topography, remote sensing indices, and climate series. Within the domain of ML, self-supervised learning (SSL), which exploits unlabeled data, has gained prominence in recent years. This study introduces a novel approach that aims to learn the geographical link between multimodal features via self-supervised contrastive learning, employing pretrained Vision Transformers (ViT) for image inputs and Transformers for climate data, before fine-tuning the model with ground reference samples. The proposed approach has undergone rigorous testing on two distinct large-scale datasets, with results indicating its superiority over traditional supervised learning models, which depend solely on labeled data. Furthermore, across various evaluation metrics (e.g., RMSE, MAE, and CCC), the proposed model exhibits higher accuracy than conventional ML algorithms such as random forest and gradient boosting. This model is a robust tool for predicting SOC and contributes to the advancement of DSM techniques, thereby facilitating land management and decision-making processes based on accurate information.
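The abstract names RMSE, MAE, and CCC as evaluation metrics but does not define them. The sketch below shows one common way to compute them for a regression target such as SOC. It is an illustration only, not the authors' evaluation code: the function names are hypothetical, CCC is taken to mean Lin's concordance correlation coefficient, and population (divide-by-n) variances are assumed.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient (assumed meaning of CCC).

    Uses population (divide-by-n) variance and covariance. The value is 1
    only when predictions match observations exactly; a constant bias
    lowers it even if the two series are perfectly correlated.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the paper.
    observed = [1.0, 2.0, 3.0]
    predicted = [2.0, 3.0, 4.0]
    print(rmse(observed, predicted), mae(observed, predicted), ccc(observed, predicted))
```

In this toy example the predictions are perfectly correlated with the observations but shifted by a constant, so a plain correlation coefficient would be 1 while CCC is lower. That sensitivity to bias is a common reason for reporting CCC alongside RMSE and MAE.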
Related papers
- DOGMA: Weaving Structural Information into Data-centric Single-cell Transcriptomics Analysis [43.565183518761984]
We propose DOGMA, a data-centric framework designed for the structural reshaping and semantic enhancement of raw data.
In complex multi-species and multi-organ benchmarks, DOGMA achieves SOTA performance, exhibiting superior zero-shot robustness and sample efficiency.
arXiv Detail & Related papers (2026-02-02T09:10:09Z) - AgroFlux: A Spatial-Temporal Benchmark for Carbon and Nitrogen Flux Prediction in Agricultural Ecosystems [32.91715282741263]
We introduce a first-of-its-kind spatial-temporal agroecosystem GHG benchmark dataset.
We evaluate the performance of various sequential deep learning models on carbon and nitrogen flux prediction.
Our benchmark dataset and evaluation framework contribute to the development of more accurate and scalable AI-driven agroecosystem models.
arXiv Detail & Related papers (2026-02-02T04:04:07Z) - Towards Fine-Tuning-Based Site Calibration for Knowledge-Guided Machine Learning: A Summary of Results [8.556682505387199]
FTBSC-KGML is a pretraining- and fine-tuning-based, spatial-variability-aware, and knowledge-guided machine learning framework.
It estimates land emissions while leveraging transfer learning and spatial heterogeneity.
It achieves lower validation error and greater consistency in explanatory power than a purely global model.
arXiv Detail & Related papers (2025-12-17T22:40:54Z) - Predicting California Bearing Ratio with Ensemble and Neural Network Models: A Case Study from Turkiye [0.0]
The California Bearing Ratio (CBR) is a key geotechnical indicator used to assess the load-bearing capacity of subgrade soils.
Traditional tests are often time-consuming, costly, and can be impractical, particularly for large-scale or diverse soil profiles.
Recent progress in artificial intelligence, especially machine learning (ML), has enabled data-driven approaches for modeling complex soil behavior with greater speed and precision.
This study introduces a comprehensive ML framework for CBR prediction using a dataset of 382 soil samples collected from various geoclimatic regions in Türkiye.
arXiv Detail & Related papers (2025-12-09T08:09:55Z) - Modern Neural Networks for Small Tabular Datasets: The New Default for Field-Scale Digital Soil Mapping? [0.0937465283958018]
We introduce a benchmark that evaluates state-of-the-art artificial neural networks (ANNs) for predictive soil modeling.
Our evaluation encompasses 31 field- and farm-scale datasets containing 30 to 460 samples and three critical soil properties.
We recommend the adoption of modern ANNs for field-scale PSM and propose TabPFN as the new default choice in the toolkit of every pedometrician.
arXiv Detail & Related papers (2025-08-13T15:46:12Z) - GreenHyperSpectra: A multi-source hyperspectral dataset for global vegetation trait prediction [15.87410077173391]
We present GreenHyperSpectra, a pretraining dataset encompassing real-world cross-sensor and cross-ecosystem samples.
We successfully leverage GreenHyperSpectra to pretrain label-efficient multi-output regression models.
Our empirical analyses demonstrate substantial improvements in learning spectral representations for trait prediction.
arXiv Detail & Related papers (2025-07-09T12:51:46Z) - Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data.
We introduce a novel principled method for DP synthetic image embedding generation.
Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z) - Revisiting Automatic Data Curation for Vision Foundation Models in Digital Pathology [41.34847597178388]
Vision foundation models (FMs) learn to represent histological features in highly heterogeneous tiles extracted from whole-slide images.
We investigate the potential of unsupervised automatic data curation at the tile-level, taking into account 350 million tiles.
arXiv Detail & Related papers (2025-03-24T14:23:48Z) - FTA-FTL: A Fine-Tuned Aggregation Federated Transfer Learning Scheme for Lithology Microscopic Image Classification [4.245694283697248]
This study involves two phases; the first is to conduct Lithology microscopic image classification on a small dataset using transfer learning.
In the second phase, we formulated the classification task as a Federated Transfer Learning scheme and proposed a Fine-Tuned Aggregation strategy for Federated Learning (FTA-FTL).
The results are in excellent agreement and confirm the efficiency of the proposed scheme, showing that the proposed FTA-FTL algorithm achieves approximately the same results as the centralized implementation for the lithology microscopic image classification task.
arXiv Detail & Related papers (2025-01-06T19:32:14Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - Machine Learning for Methane Detection and Quantification from Space -- A survey [49.7996292123687]
Methane (CH_4) is a potent anthropogenic greenhouse gas, contributing 86 times more to global warming than Carbon Dioxide (CO_2) over 20 years.
This work expands existing information on operational methane point source detection sensors in the Short-Wave Infrared (SWIR) bands.
It reviews the state-of-the-art for traditional as well as Machine Learning (ML) approaches.
arXiv Detail & Related papers (2024-08-27T15:03:20Z) - Enhancing Ecological Monitoring with Multi-Objective Optimization: A Novel Dataset and Methodology for Segmentation Algorithms [17.802456388479616]
We introduce a unique semantic segmentation dataset of 6,096 high-resolution aerial images capturing indigenous and invasive grass species in Bega Valley, New South Wales, Australia.
This dataset presents a challenging task due to the overlap and distribution of grass species.
The dataset and code will be made publicly available, aiming to drive research in computer vision, machine learning, and ecological studies.
arXiv Detail & Related papers (2024-07-25T18:27:27Z) - Quanv4EO: Empowering Earth Observation by means of Quanvolutional Neural Networks [62.12107686529827]
This article highlights a significant shift towards leveraging quantum computing techniques in processing large volumes of remote sensing data.
The proposed Quanv4EO model introduces a quanvolution method for preprocessing multi-dimensional EO data.
Key findings suggest that the proposed model not only maintains high precision in image classification but also shows improvements of around 5% in EO use cases.
arXiv Detail & Related papers (2024-07-24T09:11:34Z) - Physics Informed Machine Learning (PIML) methods for estimating the remaining useful lifetime (RUL) of aircraft engines [0.0]
This paper aims to use the newly developing field of physics informed machine learning (PIML) to develop models for predicting the remaining useful lifetime (RUL) of aircraft engines.
We consider the well-known benchmark NASA Commercial Modular Aero-Propulsion System Simulation System (C-MAPSS) data as the main data for this paper.
C-MAPSS is a well-studied dataset with much existing work in the literature that addresses RUL prediction with classical and deep learning methods.
arXiv Detail & Related papers (2024-06-21T19:55:34Z) - GFM4MPM: Towards Geospatial Foundation Models for Mineral Prospectivity Mapping [2.7998963147546148]
We propose a self-supervised approach to train a backbone neural network using unlabeled geospatial data alone.
Our results demonstrate that self-supervision promotes robustness in learned features, improving prospectivity predictions.
We leverage explainable artificial intelligence techniques to demonstrate that individual predictions can be interpreted from a geological perspective.
arXiv Detail & Related papers (2024-06-18T16:24:28Z) - MT-HCCAR: Multi-Task Deep Learning with Hierarchical Classification and Attention-based Regression for Cloud Property Retrieval [4.24122904716917]
This paper introduces MT-HCCAR, an end-to-end deep learning model employing multi-task learning to tackle cloud masking, cloud phase retrieval, and COT prediction.
The MT-HCCAR integrates a hierarchical classification network (HC) and a classification-assisted attention-based regression network (CAR) to enhance precision and robustness in cloud labeling and COT prediction.
arXiv Detail & Related papers (2024-01-29T19:50:50Z) - Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data; a minimal number of available labeled data points are then assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data.
We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z) - Estimating Crop Primary Productivity with Sentinel-2 and Landsat 8 using Machine Learning Methods Trained with Radiative Transfer Simulations [58.17039841385472]
We take advantage of all parallel developments in mechanistic modeling and satellite data availability for advanced monitoring of crop productivity.
Our model successfully estimates gross primary productivity across a variety of C3 crop types and environmental conditions even though it does not use any local information from the corresponding sites.
This highlights its potential to map crop productivity from new satellite sensors at a global scale with the help of current Earth observation cloud computing platforms.
arXiv Detail & Related papers (2020-12-07T16:23:13Z) - From calibration to parameter learning: Harnessing the scaling effects of big data in geoscientific modeling [2.9897531698031403]
We propose a differentiable parameter learning framework that efficiently learns a global mapping between inputs and parameters.
As training data increases, dPL achieves better performance, more physical coherence, and better generalizability.
We demonstrate examples that learned from soil moisture and streamflow, where dPL drastically outperformed existing evolutionary and regionalization methods.
arXiv Detail & Related papers (2020-07-30T21:38:56Z) - Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
arXiv Detail & Related papers (2020-07-17T01:52:34Z)