Product Quantization for Surface Soil Similarity
- URL: http://arxiv.org/abs/2506.03374v1
- Date: Tue, 03 Jun 2025 20:31:34 GMT
- Title: Product Quantization for Surface Soil Similarity
- Authors: Haley Dozier, Althea Henslee, Ashley Abraham, Andrew Strelzoff, Mark Chappell,
- Abstract summary: An ML-based taxonomy allows soil researchers to move beyond the limitations of human visualization and create classifications of high-dimension datasets. This pipeline allows for the possibility of producing both highly accurate and flexible soil taxonomies with classes built to fit a specific application. The machine learning pipeline outlined in this work combines product quantization with the systematic evaluation of parameters and output to get the best available results.
- Score: 0.44938884406455726
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The use of machine learning (ML) techniques has allowed rapid advancements in many scientific and engineering fields. One such field is surface soil taxonomy, a research area previously hindered by its reliance on human-derived classifications, which mostly divide a dataset based on historical understandings of that data rather than on data-driven, statistically observable similarities. Using an ML-based taxonomy allows soil researchers to move beyond the limitations of human visualization and create classifications of high-dimension datasets with a much higher level of specificity than is possible with hand-drawn taxonomies. Furthermore, this pipeline allows for the possibility of producing both highly accurate and flexible soil taxonomies with classes built to fit a specific application. The machine learning pipeline outlined in this work combines product quantization with the systematic evaluation of parameters and output to get the best available results, rather than accepting sub-optimal results by using either default settings or best-guess settings.
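The abstract names product quantization and a systematic parameter evaluation but gives no implementation details here. The block below is a minimal, generic sketch of both ideas, not the authors' pipeline. Each vector is cut into `m` sub-vectors, a small k-means codebook is learned per sub-space, and a grid of `(m, k)` settings is compared by mean reconstruction error. All function names, the synthetic Gaussian data standing in for soil features, and the choice of reconstruction error as the evaluation metric are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[nearest].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def split(vec, m):
    """Cut a vector into m contiguous sub-vectors (len(vec) must divide by m)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def train_pq(data, m, k, seed=0):
    """Learn one k-centroid codebook for each of the m sub-spaces."""
    return [kmeans([split(v, m)[i] for v in data], k, seed=seed + i)
            for i in range(m)]


def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(subs, codebooks)]


def decode(code, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    out = []
    for idx, cb in zip(code, codebooks):
        out.extend(cb[idx])
    return out


def mean_reconstruction_error(data, m, k):
    """Average squared error between each vector and its quantized version."""
    codebooks = train_pq(data, m, k)
    total = sum(sq_dist(v, decode(encode(v, codebooks), codebooks)) for v in data)
    return total / len(data)


# Hypothetical stand-in for soil feature vectors: 200 samples, 8 features.
rng = random.Random(42)
samples = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]

# Evaluate a small grid of settings instead of trusting a single default.
results = {(m, k): mean_reconstruction_error(samples, m, k)
           for m in (2, 4) for k in (4, 16)}
best = min(results, key=results.get)
print("best (m, k) by reconstruction error:", best)
```

Reconstruction error alone tends to favor the largest codebooks, so a real evaluation would weigh it against code size and against whatever application-specific criteria the resulting soil classes must meet.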
Related papers
- LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping [0.0]
Benchmarking studies on multiple datasets are needed to reveal strengths and limitations of commonly used methods. LimeSoDa consists of 31 field- and farm-scale datasets from various countries. We demonstrated the use of LimeSoDa for benchmarking by comparing the predictive performance of four learning algorithms across all datasets.
arXiv Detail & Related papers (2025-02-27T14:31:36Z) - LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science. Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z) - Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Notes on Applicability of Explainable AI Methods to Machine Learning Models Using Features Extracted by Persistent Homology [0.0]
Persistent homology (PH) has found wide-ranging applications in machine learning.
The ability to achieve satisfactory levels of accuracy with relatively simple downstream machine learning models, when processing these extracted features, underlines the pipeline's superior interpretability.
We explore the potential application of explainable AI methodologies to this PH-ML pipeline.
arXiv Detail & Related papers (2023-10-15T08:56:15Z) - SSL-SoilNet: A Hybrid Transformer-based Framework with Self-Supervised Learning for Large-scale Soil Organic Carbon Prediction [2.554658234030785]
This study introduces a novel approach that aims to learn the geographical link between multimodal features via self-supervised contrastive learning.
The proposed approach has undergone rigorous testing on two distinct large-scale datasets.
arXiv Detail & Related papers (2023-08-07T13:44:44Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z) - Deep Contrastive Graph Representation via Adaptive Homotopy Learning [76.22904270821778]
The homotopy model is an excellent tool exploited by diverse research works in the field of machine learning.
We propose a novel adaptive homotopy framework (AH) in which the Maclaurin duality is employed.
AH can be widely utilized to enhance the homotopy-based algorithm.
arXiv Detail & Related papers (2021-06-17T04:46:04Z) - Surface Warping Incorporating Machine Learning Assisted Domain Likelihood Estimation: A New Paradigm in Mine Geology Modelling and Automation [68.8204255655161]
A Bayesian warping technique has been proposed to reshape modeled surfaces based on geochemical and spatial constraints imposed by newly acquired blasthole data.
This paper focuses on incorporating machine learning in this warping framework to make the likelihood generalizable.
Its foundation is laid by a Bayesian computation in which the geological domain likelihood given the chemistry, p(g|c), plays a similar role to p(y(c)|g).
arXiv Detail & Related papers (2021-02-15T10:37:52Z) - Using vis-NIRS and Machine Learning methods to diagnose sugarcane soil chemical properties [0.0]
Knowing chemical soil properties might be determinant in crop management and total yield production.
Traditional property estimation approaches are time-consuming and require complex lab setups.
Property estimation from spectral signals (vis-NIRS) emerged as a low-cost, non-invasive, and non-destructive alternative.
arXiv Detail & Related papers (2020-12-23T21:46:41Z) - A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z) - Adaptive Discrete Smoothing for High-Dimensional and Nonlinear Panel Data [4.550919471480445]
We develop a data-driven smoothing technique for high-dimensional and non-linear panel data models.
The weights are determined in a data-driven way and depend on the similarity between the corresponding functions.
We conduct a simulation study which shows that the prediction can be greatly improved by using our estimator.
arXiv Detail & Related papers (2019-12-30T09:50:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.