Related papers: When to Accept Automated Predictions and When to Defer to Human Judgment?

When to Accept Automated Predictions and When to Defer to Human Judgment?

URL: http://arxiv.org/abs/2407.07821v2
Date: Tue, 13 Aug 2024 09:06:08 GMT
Title: When to Accept Automated Predictions and When to Defer to Human Judgment?
Authors: Daniel Sikar, Artur Garcez, Tillman Weyde, Robin Bloomfield, Kaleem Peeroo,
Abstract summary: We analyze how the outputs of a trained neural network change using clustering to measure distances between outputs and class centroids. We propose this distance as a metric to evaluate the confidence of predictions under distribution shifts.
Score: 1.9922905420195367
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Ensuring the reliability and safety of automated decision-making is crucial. It is well-known that data distribution shifts in machine learning can produce unreliable outcomes. This paper proposes a new approach for measuring the reliability of predictions under distribution shifts. We analyze how the outputs of a trained neural network change using clustering to measure distances between outputs and class centroids. We propose this distance as a metric to evaluate the confidence of predictions under distribution shifts. We assign each prediction to a cluster with centroid representing the mean softmax output for all correct predictions of a given class. We then define a safety threshold for a class as the smallest distance from an incorrect prediction to the given class centroid. We evaluate the approach on the MNIST and CIFAR-10 datasets using a Convolutional Neural Network and a Vision Transformer, respectively. The results show that our approach is consistent across these data sets and network models, and indicate that the proposed metric can offer an efficient way of determining when automated predictions are acceptable and when they should be deferred to human operators given a distribution shift.

Related papers

Explorations of the Softmax Space: Knowing When the Neural Network Doesn't Know [2.6626950367610394]
This paper proposes a new approach for measuring confidence in the predictions of any neural network. We identify that a high-accuracy trained network may have certain outputs for which there should be low confidence. We show that a cluster with centroid calculated simply as the mean softmax output for all correct predictions can serve as a suitable proxy in the evaluation of confidence.
arXiv Detail & Related papers (2025-02-01T15:25:03Z)
Prediction-Powered Inference with Imputed Covariates and Nonuniform Sampling [20.078602767179355]
Failure to properly account for errors in machine learning predictions renders standard statistical procedures invalid. We introduce bootstrap confidence intervals that apply when the complete data is a nonuniform (i.e., weighted, stratified, or clustered) sample and to settings where an arbitrary subset of features is imputed. We prove that these confidence intervals are valid under no assumptions on the quality of the machine learning model and are no wider than the intervals obtained by methods that do not use machine learning predictions.
arXiv Detail & Related papers (2025-01-30T18:46:43Z)
Distributionally Robust Machine Learning with Multi-source Data [6.383451076043423]
We introduce a group distributionally robust prediction model to optimize an adversarial reward about explained variance with respect to a class of target distributions. Compared to classical empirical risk minimization, the proposed robust prediction model improves the prediction accuracy for target populations with distribution shifts. We demonstrate the performance of our proposed group distributionally robust method on simulated and real data with random forests and neural networks as base-learning algorithms.
arXiv Detail & Related papers (2023-09-05T13:19:40Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
Test-time Collective Prediction [73.74982509510961]
Multiple parties in machine learning want to jointly make predictions on future test points. Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters. We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
arXiv Detail & Related papers (2021-06-22T18:29:58Z)
Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions. We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
Estimating and Evaluating Regression Predictive Uncertainty in Deep Object Detectors [9.273998041238224]
We show that training variance networks with negative log likelihood (NLL) can lead to high entropy predictive distributions. We propose to use the energy score as a non-local proper scoring rule and find that when used for training, the energy score leads to better calibrated and lower entropy predictive distributions.
arXiv Detail & Related papers (2021-01-13T12:53:54Z)
Cross-Validation and Uncertainty Determination for Randomized Neural Networks with Applications to Mobile Sensors [0.0]
Extreme learning machines provide an attractive and efficient method for supervised learning under limited computing ressources and green machine learning. Results are discussed about supervised learning with such networks and regression methods in terms of consistency and bounds for the generalization and prediction error.
arXiv Detail & Related papers (2021-01-06T12:28:06Z)
Uncertainty Estimation and Sample Selection for Crowd Counting [87.29137075538213]
We present a method for image-based crowd counting that can predict a crowd density map together with the uncertainty values pertaining to the predicted density map. A key advantage of our method over existing crowd counting methods is its ability to quantify the uncertainty of its predictions. We show that our sample selection strategy drastically reduces the amount of labeled data needed to adapt a counting network trained on a source domain to the target domain.
arXiv Detail & Related papers (2020-09-30T03:40:07Z)
Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation. We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass. We scale training in these with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
Calibrated Prediction with Covariate Shift via Unsupervised Domain Adaptation [25.97333838935589]
Uncertainty estimates are an important tool for helping autonomous agents or human decision makers understand and leverage predictive models. Existing algorithms can overestimate certainty, possibly yielding a false sense of confidence in the predictive model.
arXiv Detail & Related papers (2020-02-29T20:31:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.