Self-supervised and Multi-fidelity Learning for Extended Predictive Soil Spectroscopy
- URL: http://arxiv.org/abs/2511.15965v1
- Date: Thu, 20 Nov 2025 01:36:33 GMT
- Title: Self-supervised and Multi-fidelity Learning for Extended Predictive Soil Spectroscopy
- Authors: Luning Sun, José L. Safanelli, Jonathan Sanderman, Katerina Georgiou, Colby Brungard, Kanchan Grover, Bryan G. Hopkins, Shusen Liu, Timo Bremer,
- Abstract summary: We propose a framework for multi-fidelity learning and extended predictive soil spectroscopy based on latent space embeddings. A self-supervised representation was pretrained with the large MIR spectral library and the Variational Autoencoder algorithm. Predictions from the spectrum conversion (NIR to MIR) task did not match the performance of the original MIR spectra but were similar or superior to the predictive performance of NIR-only models.
- Score: 2.8830677829565894
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a self-supervised machine learning (SSML) framework for multi-fidelity learning and extended predictive soil spectroscopy based on latent space embeddings. A self-supervised representation was pretrained with the large MIR spectral library and the Variational Autoencoder algorithm to obtain a compressed latent space for generating spectral embeddings. At this stage, only unlabeled spectral data were used, allowing us to leverage the full spectral database and the availability of scan repeats for augmented training. We also froze the trained MIR decoder and plugged it into a NIR encoder for a spectrum conversion task, learning the mapping between NIR and MIR spectra in an attempt to leverage the predictive capabilities contained in the large MIR library with a low-cost portable NIR scanner. This was achieved by using a smaller subset of the KSSL library with paired NIR and MIR spectra. Downstream machine learning models were then trained to map between original spectra, predicted spectra, and latent space embeddings for nine soil properties. Performance was evaluated independently of the KSSL training data using a gold-standard test set, along with regression goodness-of-fit metrics. Compared to baseline models, the proposed SSML and its embeddings yielded similar or better accuracy in all soil property prediction tasks. Predictions derived from the spectrum conversion (NIR to MIR) task did not match the performance of the original MIR spectra but were similar or superior to the predictive performance of NIR-only models, suggesting the unified spectral latent space can effectively leverage the larger and more diverse MIR dataset for prediction of soil properties not well represented in current NIR libraries.
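The frozen-decoder idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: PCA stands in for the Variational Autoencoder, an ordinary least-squares map stands in for the NIR encoder, and the spectra are synthetic. All function names, band counts, and sample sizes below are hypothetical choices made for illustration only.

```python
import numpy as np


def synthetic_spectra(n_samples, seed=0, n_factors=3, n_mir=40, n_nir=15, noise=0.01):
    """Synthetic paired spectra driven by shared latent factors (illustrative only)."""
    rng = np.random.default_rng(seed)
    factors = rng.normal(size=(n_samples, n_factors))
    mix_mir = rng.normal(size=(n_factors, n_mir))
    mix_nir = rng.normal(size=(n_factors, n_nir))
    mir = factors @ mix_mir + noise * rng.normal(size=(n_samples, n_mir))
    nir = factors @ mix_nir + noise * rng.normal(size=(n_samples, n_nir))
    return mir, nir


def fit_mir_autoencoder(mir, n_latent):
    """Linear (PCA) stand-in for the pretrained MIR autoencoder; not a VAE."""
    mean = mir.mean(axis=0)
    # Rows of vt are orthonormal directions; keep the leading n_latent of them.
    _, _, vt = np.linalg.svd(mir - mean, full_matrices=False)
    return mean, vt[:n_latent]


def encode_mir(mir, mean, decoder):
    """Project MIR spectra onto the latent space."""
    return (mir - mean) @ decoder.T


def decode_mir(latent, mean, decoder):
    """Map latent embeddings back to MIR spectra."""
    return latent @ decoder + mean


def _with_bias(x):
    """Append a constant column so the linear encoder has an intercept."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_nir_encoder(nir, mir, mean, decoder):
    """Fit a linear NIR encoder on paired spectra; the MIR decoder stays frozen.

    Because the decoder rows are orthonormal, matching the MIR latent codes
    is equivalent to minimising the reconstruction error through the decoder.
    """
    targets = encode_mir(mir, mean, decoder)
    weights, *_ = np.linalg.lstsq(_with_bias(nir), targets, rcond=None)
    return weights


def nir_to_mir(nir, weights, mean, decoder):
    """Spectrum conversion: NIR -> shared latent space -> frozen MIR decoder."""
    return decode_mir(_with_bias(nir) @ weights, mean, decoder)


if __name__ == "__main__":
    mir, nir = synthetic_spectra(600)
    # Stage 1: pretrain on the large MIR-only set (no labels, no NIR).
    mean, decoder = fit_mir_autoencoder(mir[:500], n_latent=3)
    # Stage 2: train only the NIR encoder on a smaller paired subset.
    weights = fit_nir_encoder(nir[500:580], mir[500:580], mean, decoder)
    # Held-out check of the converted spectra.
    predicted = nir_to_mir(nir[580:], weights, mean, decoder)
    rel_err = np.mean((predicted - mir[580:]) ** 2) / np.var(mir[580:])
    print(f"relative conversion error: {rel_err:.4f}")
```

In this sketch the latent codes returned by `encode_mir` or by the NIR encoder would play the role of the embeddings passed to downstream soil-property models; a nonlinear encoder and decoder trained with a variational objective would replace the linear pieces in a setting closer to the one the abstract describes.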
Related papers
- A simulation-based training framework for machine-learning applications in ARPES [0.0]
We introduce an open-source synthetic ARPES spectra simulator - aurelia - for generating the large datasets necessary to train machine learning models. We benchmark the simulation-trained model against actual experimental data and find that it can assess the spectra quality more accurately than human analysis.
arXiv Detail & Related papers (2025-08-21T21:59:09Z)
- SpectrumFM: Redefining Spectrum Cognition via Foundation Modeling [65.65474629224558]
We propose a spectrum foundation model, termed SpectrumFM, which provides a new paradigm for spectrum cognition. An innovative spectrum encoder that exploits convolutional neural networks is proposed to effectively capture both fine-grained local signal structures and high-level global dependencies in the spectrum data. Two novel self-supervised learning tasks, namely masked reconstruction and next-slot signal prediction, are developed for pre-training SpectrumFM, enabling the model to learn rich and transferable representations.
arXiv Detail & Related papers (2025-08-02T14:40:50Z)
- LUMIR: an LLM-Driven Unified Agent Framework for Multi-task Infrared Spectroscopy Reasoning [12.138903544219724]
This study introduces LUMIR, a framework designed to achieve accurate infrared spectral analysis under low data conditions. LUMIR integrates a structured literature knowledge base, automated preprocessing, feature extraction, and predictive modeling into a unified pipeline. It was validated on diverse datasets, including the publicly available Milk near-infrared dataset, Chinese medicinal herbs, Citri Reticulatae Pericarpium (CRP) with different storage durations, an industrial wastewater COD dataset, Tecator and Corn.
arXiv Detail & Related papers (2025-07-29T03:20:51Z)
- SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars [6.314253302704276]
We present SpecCLIP, a foundation model framework that extends LLM-inspired methodologies to stellar spectral analysis. By training foundation models on large-scale spectral datasets, our goal is to learn robust and informative embeddings that support diverse downstream applications. We demonstrate that fine-tuning these models on moderate-sized labeled datasets improves adaptability to tasks such as stellar-parameter estimation and chemical-abundance determination.
arXiv Detail & Related papers (2025-07-02T17:49:52Z)
- CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis [69.02751635551724]
Spectral imaging offers promising applications across diverse domains, including medicine and urban scene understanding. Variability in channel dimensionality and captured wavelengths among spectral cameras impedes the development of AI-driven methodologies. We introduce CARL, a model for Camera-Agnostic Representation Learning across RGB, multispectral, and hyperspectral imaging modalities.
arXiv Detail & Related papers (2025-04-27T13:06:40Z)
- PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model [83.35198885088093]
PolSAR data presents unique challenges due to its rich and complex characteristics. Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used. Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively. We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy.
arXiv Detail & Related papers (2024-12-17T09:59:53Z)
- Holistic Physics Solver: Learning PDEs in a Unified Spectral-Physical Space [54.13671100638092]
Holistic Physics Mixer (HPM) is a framework for integrating spectral and physical information in a unified space. We show that HPM consistently outperforms state-of-the-art methods in both accuracy and computational efficiency.
arXiv Detail & Related papers (2024-10-15T08:19:39Z)
- Hodge-Aware Contrastive Learning [101.56637264703058]
Simplicial complexes prove effective in modeling data with multiway dependencies.
We develop a contrastive self-supervised learning approach for processing simplicial data.
arXiv Detail & Related papers (2023-09-14T00:40:07Z)
- ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z)
- Rapid detection of soil carbonates by means of NIR spectroscopy, deep learning methods and phase quantification by powder Xray diffraction [0.0]
We propose a rapid and efficient way to predict carbonates content in soil by means of FT NIR spectroscopy and by use of deep learning methods.
We exploited multiple machine learning methods, such as 1) a Regressor and 2) a CNN, and compared their performance with other traditional ML algorithms.
arXiv Detail & Related papers (2023-07-23T14:32:07Z)
- Closing the loop: Autonomous experiments enabled by machine-learning-based online data analysis in synchrotron beamline environments [80.49514665620008]
Machine learning can be used to enhance research involving large or rapidly generated datasets.
In this study, we describe the incorporation of ML into a closed-loop workflow for X-ray reflectometry (XRR).
We present solutions that provide an elementary data analysis in real time during the experiment without introducing the additional software dependencies in the beamline control software environment.
arXiv Detail & Related papers (2023-06-20T21:21:19Z)
- Electron energy loss spectroscopy database synthesis and automation of core-loss edge recognition by deep-learning neural networks [0.0]
A convolutional-bidirectional long short-term memory neural network (CNN-BiLSTM) is proposed to automate the detection and elemental identification of core-loss edges from raw spectra.
The high accuracy of the network, 94.9 %, proves that, without complicated preprocessing of the raw spectra, the proposed CNN-BiLSTM network achieves the automation of core-loss edge recognition for EELS spectra with high accuracy.
arXiv Detail & Related papers (2022-09-26T20:57:34Z)
- MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction [148.26195175240923]
We propose a novel Transformer-based method, Multi-stage Spectral-wise Transformer (MST++) for efficient spectral reconstruction.
In the NTIRE 2022 Spectral Reconstruction Challenge, our approach won first place.
arXiv Detail & Related papers (2022-04-17T02:39:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.