Uncertainty-Guided Expert-AI Collaboration for Efficient Soil Horizon Annotation
- URL: http://arxiv.org/abs/2509.24873v1
- Date: Mon, 29 Sep 2025 14:54:23 GMT
- Title: Uncertainty-Guided Expert-AI Collaboration for Efficient Soil Horizon Annotation
- Authors: Teodor Chiaburu, Vipin Singh, Frank Haußer, Felix Bießmann
- Abstract summary: We apply conformal prediction to $\textit{SoilNet}$, a multimodal multitask model for describing soil profiles. We design a simulated human-in-the-loop (HIL) annotation pipeline, where a limited budget for obtaining ground truth annotations is available when model uncertainty is high. Experiments show that conformalizing SoilNet leads to more efficient annotation in regression tasks and comparable performance scores in classification tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Uncertainty quantification is essential in human-machine collaboration, as human agents tend to adjust their decisions based on the confidence of the machine counterpart. Reliably calibrated model uncertainties, hence, enable more effective collaboration, targeted expert intervention and more responsible usage of Machine Learning (ML) systems. Conformal prediction has become a well established model-agnostic framework for uncertainty calibration of ML models, offering statistically valid confidence estimates for both regression and classification tasks. In this work, we apply conformal prediction to $\textit{SoilNet}$, a multimodal multitask model for describing soil profiles. We design a simulated human-in-the-loop (HIL) annotation pipeline, where a limited budget for obtaining ground truth annotations from domain experts is available when model uncertainty is high. Our experiments show that conformalizing SoilNet leads to more efficient annotation in regression tasks and comparable performance scores in classification tasks under the same annotation budget when tested against its non-conformal counterpart. All code and experiments can be found in our repository: https://github.com/calgo-lab/BGR
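The conformal wrapper and budgeted expert-query rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's pipeline: a synthetic stand-in classifier replaces SoilNet, and the `simulate` helper, the 1 − softmax nonconformity score, and the `budget` value are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, k=3):
    # Hypothetical stand-in for SoilNet's classification head: draw class
    # probabilities from random logits, then sample the true label from them.
    logits = rng.normal(size=(n, k))
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    y = np.array([rng.choice(k, p=pi) for pi in p])
    return p, y

p_cal, y_cal = simulate(1000)
p_test, y_test = simulate(500)

# Split conformal for classification: nonconformity = 1 - prob. of true class.
alpha = 0.1
scores = 1.0 - p_cal[np.arange(len(y_cal)), y_cal]
n = len(scores)
qhat = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

pred_sets = p_test >= 1.0 - qhat                    # (n_test, k) set membership
covered = pred_sets[np.arange(len(y_test)), y_test]
print(f"empirical coverage: {covered.mean():.2f}")  # close to 1 - alpha

# Budgeted expert queries: the most ambiguous prediction sets (largest size)
# go to the expert first, up to the available annotation budget; everything
# else keeps the model's own annotation.
set_size = pred_sets.sum(axis=1)
budget = 50
uncertain = np.argsort(-set_size)[:budget]
labels = p_test.argmax(axis=1)
labels[uncertain] = y_test[uncertain]               # ground truth where queried
```

The same pattern applies to the regression tasks, with interval width taking the role of prediction-set size as the uncertainty signal.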
Related papers
- On Calibration of Large Language Models: From Response To Capability [66.59139960234326]
Large language models (LLMs) are widely deployed as general-purpose problem solvers. We introduce capability calibration, which targets the model's expected accuracy on a query. Our results demonstrate that capability-calibrated confidence improves pass@$k$ prediction and inference budget allocation.
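For background, the pass@$k$ quantity mentioned above is commonly computed with the standard unbiased estimator $1 - \binom{n-c}{k}/\binom{n}{k}$ from $n$ sampled solutions of which $c$ are correct; this snippet is general background, not code from the paper:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k),
    computed in a numerically stable product form."""
    if n - c < k:
        return 1.0  # too few incorrect samples to ever miss k times
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# For k = 1 the estimate reduces to the empirical accuracy c / n.
print(pass_at_k(10, 3, 1))
```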
arXiv Detail & Related papers (2026-02-14T01:07:45Z) - Fault-Tolerant Evaluation for Sample-Efficient Model Performance Estimators [13.227055178509524]
We propose a fault-tolerant evaluation framework that integrates bias and variance considerations within an adjustable tolerance level. We show that proper calibration of $\varepsilon$ ensures reliable evaluation across different variance regimes. Experiments on real-world datasets demonstrate that our framework provides comprehensive and actionable insights into estimator behavior.
arXiv Detail & Related papers (2026-02-06T22:14:46Z) - Uncertainty Quantification for Large Language Model Reward Learning under Heterogeneous Human Feedback [8.538830579425147]
We study estimation and statistical inference for reward models used in aligning large language models (LLMs). A key component of LLM alignment is reinforcement learning from human feedback.
arXiv Detail & Related papers (2025-12-02T20:22:25Z) - Principled Input-Output-Conditioned Post-Hoc Uncertainty Estimation for Regression Networks [1.4671424999873808]
Uncertainty is critical in safety-sensitive applications but is often omitted from off-the-shelf neural networks due to adverse effects on predictive performance. We propose a theoretically grounded framework for post-hoc uncertainty estimation in regression tasks by fitting an auxiliary model to both original inputs and frozen model outputs.
arXiv Detail & Related papers (2025-06-01T09:13:27Z) - Unveil Sources of Uncertainty: Feature Contribution to Conformal Prediction Intervals [0.3495246564946556]
We propose a novel, model-agnostic uncertainty attribution (UA) method grounded in conformal prediction (CP). By defining cooperative games in which CP interval properties, such as width and bounds, serve as value functions, we attribute predictive uncertainty to input features. Our experiments on synthetic benchmarks and real-world datasets demonstrate the practical utility and interpretative depth of our approach.
arXiv Detail & Related papers (2025-05-19T13:49:05Z) - Efficient distributional regression trees learning algorithms for calibrated non-parametric probabilistic forecasts [1.0108345815812638]
In the context of regression, calibrated probabilistic forecasts can be obtained by producing a predictive interval for the output instead of estimating only a conditional mean. This paper introduces novel algorithms for learning probabilistic regression trees for the WIS or CRPS loss functions.
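Both losses named above are built from quantile estimates: the CRPS can be written as an integral of the pinball (quantile) loss over quantile levels, and the WIS is a weighted sum over a finite set of levels. A minimal sketch of the pinball loss and a two-quantile central interval, using synthetic samples:

```python
import numpy as np

def pinball_loss(y, q_hat, tau):
    """Pinball (quantile) loss at level tau; minimized in expectation
    when q_hat is the true tau-quantile of y."""
    diff = y - q_hat
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

# A central 90% predictive interval corresponds to the 0.05 and 0.95 quantiles.
y = np.random.default_rng(0).normal(0.0, 1.0, 10_000)
lo, hi = np.quantile(y, [0.05, 0.95])
print(f"90% interval: [{lo:.2f}, {hi:.2f}]")
```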
arXiv Detail & Related papers (2025-02-07T18:39:35Z) - Boosted Control Functions: Distribution generalization and invariance in confounded models [10.503777692702952]
We introduce a strong notion of invariance that allows for distribution generalization even in the presence of nonlinear, non-identifiable structural functions. We propose the ControlTwicing algorithm to estimate the Boosted Control Function (BCF) using flexible machine-learning techniques.
arXiv Detail & Related papers (2023-10-09T15:43:46Z) - Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
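One simple way to use an auxiliary error signal in conformal prediction is to normalize the nonconformity score by it, so intervals widen where the estimated error is large. This is a simplification of the paper's approach (which feeds the self-supervised error as an extra feature); here `aux_err` is a hypothetical error proxy assumed to be available:

```python
import numpy as np

rng = np.random.default_rng(1)

def predict(x):
    return np.sin(x)  # stand-in point predictor

def aux_err(x):
    # Assumed auxiliary error estimate, e.g. from a self-supervised model;
    # in this toy setup it matches the true (heteroscedastic) noise scale.
    return 0.2 + 0.2 * x

x_cal = rng.uniform(0, 5, 400)
y_cal = np.sin(x_cal) + rng.normal(0, aux_err(x_cal))
x_test = rng.uniform(0, 5, 200)
y_test = np.sin(x_test) + rng.normal(0, aux_err(x_test))

# Normalized (locally weighted) split conformal regression.
alpha = 0.1
s = np.abs(y_cal - predict(x_cal)) / aux_err(x_cal)
n = len(s)
q = np.quantile(s, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

lo = predict(x_test) - q * aux_err(x_test)
hi = predict(x_test) + q * aux_err(x_test)
coverage = np.mean((y_test >= lo) & (y_test <= hi))
print(f"coverage: {coverage:.2f}, mean width: {np.mean(hi - lo):.2f}")
```

Compared to plain absolute-residual scores, the interval width now varies per sample, which is what makes the intervals informative about where the model is likely to err.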
arXiv Detail & Related papers (2023-02-23T18:57:14Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Compound Density Networks for Risk Prediction using Electronic Health Records [1.1786249372283562]
We propose an integrated end-to-end approach by utilizing a Compound Density Network (CDNet).
CDNet allows the imputation method and prediction model to be tuned together within a single framework.
We validate CDNet on the mortality prediction task on the MIMIC-III dataset.
arXiv Detail & Related papers (2022-08-02T09:04:20Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
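A minimal sketch of the ATC rule, assuming scalar confidences in [0, 1] and a synthetic, roughly calibrated source model (the paper derives confidences from, e.g., max-softmax scores):

```python
import numpy as np

rng = np.random.default_rng(0)

def atc_threshold(src_conf, src_correct):
    # ATC rule: choose t so that the fraction of labeled source examples with
    # confidence below t matches the observed source error rate.
    err = 1.0 - src_correct.mean()
    return np.quantile(src_conf, err)

def atc_estimate(tgt_conf, t):
    # Predicted target accuracy: fraction of unlabeled target examples whose
    # confidence exceeds the learned threshold.
    return (tgt_conf > t).mean()

# Synthetic sanity check: confidence equals the probability of being correct.
src_conf = rng.uniform(0.5, 1.0, 5000)
src_correct = rng.random(5000) < src_conf
t = atc_threshold(src_conf, src_correct)

tgt_conf = rng.uniform(0.4, 1.0, 5000)  # shifted, less confident "target"
print(f"threshold: {t:.2f}, predicted target accuracy: {atc_estimate(tgt_conf, t):.2f}")
```

No target labels are needed at estimation time, which is the point of the method; its accuracy depends on how the confidence distribution shifts between domains.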
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.