Evaluating AI systems under uncertain ground truth: a case study in
dermatology
- URL: http://arxiv.org/abs/2307.02191v1
- Date: Wed, 5 Jul 2023 10:33:45 GMT
- Title: Evaluating AI systems under uncertain ground truth: a case study in
dermatology
- Authors: David Stutz, Ali Taylan Cemgil, Abhijit Guha Roy, Tatiana
Matejovicova, Melih Barsbey, Patricia Strachan, Mike Schaekermann, Jan
Freyberg, Rajeev Rikhye, Beverly Freeman, Javier Perez Matos, Umesh Telang,
Dale R. Webster, Yuan Liu, Greg S. Corrado, Yossi Matias, Pushmeet Kohli, Yun
Liu, Arnaud Doucet, Alan Karthikesalingam
- Abstract summary: We propose a metric for measuring annotation uncertainty and provide uncertainty-adjusted metrics for performance evaluation.
We present a case study applying our framework to skin condition classification from images where annotations are provided in the form of differential diagnoses.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For safety, AI systems in health undergo thorough evaluations before
deployment, validating their predictions against a ground truth that is assumed
certain. In practice, however, the ground truth may itself be uncertain. This
uncertainty is largely ignored in standard evaluation of AI models, with
potentially severe consequences such as overestimating future performance. To
account for this, we measure the effects of ground truth uncertainty,
which we assume decomposes into two main components: annotation uncertainty
which stems from the lack of reliable annotations, and inherent uncertainty due
to limited observational information. This ground truth uncertainty is ignored
when estimating the ground truth by deterministically aggregating annotations,
e.g., by majority voting or averaging. In contrast, we propose a framework
where aggregation is done using a statistical model. Specifically, we frame
aggregation of annotations as posterior inference of so-called plausibilities,
representing distributions over classes in a classification setting, subject to
a hyper-parameter encoding annotator reliability. Based on this model, we
propose a metric for measuring annotation uncertainty and provide
uncertainty-adjusted metrics for performance evaluation. We present a case
study applying our framework to skin condition classification from images where
annotations are provided in the form of differential diagnoses. The
deterministic adjudication process called inverse rank normalization (IRN) from
previous work ignores ground truth uncertainty in evaluation. Instead, we
present two alternative statistical models: a probabilistic version of IRN and
a Plackett-Luce-based model. We find that a large portion of the dataset
exhibits significant ground truth uncertainty and standard IRN-based evaluation
severely over-estimates performance without providing uncertainty estimates.
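As a minimal illustration of the aggregation idea (a sketch, not the authors' exact model), the following Python snippet treats annotator votes as counts under a symmetric Dirichlet prior, samples posterior "plausibilities" (distributions over classes), and contrasts a deterministic majority vote with an uncertainty-adjusted accuracy that spreads credit according to the sampled plausibilities. The function names, the Dirichlet choice, and the entropy-based uncertainty metric are illustrative assumptions; the paper's model additionally encodes annotator reliability via a hyper-parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_plausibilities(votes, n_classes, alpha=1.0, n_samples=1000):
    """Posterior samples of class plausibilities given annotator votes.

    Assumes multinomial votes with a symmetric Dirichlet(alpha) prior;
    a stand-in for the paper's statistical aggregation model.
    """
    counts = np.bincount(votes, minlength=n_classes)
    return rng.dirichlet(alpha + counts, size=n_samples)

def annotation_uncertainty(plausibilities):
    """Mean entropy of sampled plausibility vectors (illustrative metric)."""
    p = np.clip(plausibilities, 1e-12, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=1)))

def uncertainty_adjusted_accuracy(pred, plausibilities):
    """Expected probability mass the plausibilities assign to the prediction."""
    return float(np.mean(plausibilities[:, pred]))

# Three annotators label one image over 4 hypothetical skin conditions.
votes = np.array([2, 2, 1])
P = sample_plausibilities(votes, n_classes=4)

majority = int(np.bincount(votes, minlength=4).argmax())  # deterministic vote
adj_acc = uncertainty_adjusted_accuracy(majority, P)       # < 1.0: credit spread
print(majority, round(adj_acc, 2), round(annotation_uncertainty(P), 2))
```

Deterministic adjudication would score the majority label as simply correct or incorrect; the adjusted metric instead credits it with the posterior mass it actually receives, which is how overestimation under disagreement becomes visible.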
Related papers
- Uncertainty-aware abstention in medical diagnosis based on medical texts [87.88110503208016]
This study addresses the critical issue of reliability for AI-assisted medical diagnosis.
We focus on the selection prediction approach that allows the diagnosis system to abstain from providing the decision if it is not confident in the diagnosis.
We introduce HUQ-2, a new state-of-the-art method for enhancing reliability in selective prediction tasks.
arXiv Detail & Related papers (2025-02-25T10:15:21Z) - Evaluation of uncertainty estimations for Gaussian process regression based machine learning interatomic potentials [0.0]
Uncertainty estimations for machine learning interatomic potentials are crucial to quantify the additional model error they introduce.
We consider GPR models with Coulomb and SOAP representations as inputs to predict potential energy surfaces and excitation energies of molecules.
We evaluate how the GPR variance and ensemble-based uncertainties relate to the error, and whether model performance improves by selecting the most uncertain samples from a fixed configuration space.
arXiv Detail & Related papers (2024-10-27T10:06:09Z) - SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing [67.8991481023825]
Sepsis is the leading cause of in-hospital mortality in the USA.
Existing predictive models are usually trained on high-quality data with little missing information.
For potential high-risk patients whose predictions have low confidence due to limited observations, we propose a robust active sensing algorithm.
arXiv Detail & Related papers (2024-07-24T04:47:36Z) - Diagnosis Uncertain Models For Medical Risk Prediction [80.07192791931533]
We consider a patient risk model which has access to vital signs, lab values, and prior history but does not have access to a patient's diagnosis.
We show that such 'all-cause' risk models have good generalization across diagnoses but have a predictable failure mode.
We propose a fix for this problem by explicitly modeling the uncertainty in risk prediction coming from uncertainty in patient diagnoses.
arXiv Detail & Related papers (2023-06-29T23:36:04Z) - Towards Reliable Medical Image Segmentation by utilizing Evidential Calibrated Uncertainty [52.03490691733464]
We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.
By leveraging subjective logic theory, we explicitly model probability and uncertainty for the problem of medical image segmentation.
DEviS incorporates an uncertainty-aware filtering module, which utilizes the metric of uncertainty-calibrated error to filter reliable data.
arXiv Detail & Related papers (2023-01-01T05:02:46Z) - The Implicit Delta Method [61.36121543728134]
In this paper, we propose an alternative, the implicit delta method, which works by infinitesimally regularizing the training loss of uncertainty.
We show that the change in the evaluation due to regularization is consistent for the variance of the evaluation estimator, even when the infinitesimal change is approximated by a finite difference.
arXiv Detail & Related papers (2022-11-11T19:34:17Z) - Reliability-Aware Prediction via Uncertainty Learning for Person Image
Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z) - Uncertainty Estimates of Predictions via a General Bias-Variance
Decomposition [7.811916700683125]
We introduce a bias-variance decomposition for proper scores, giving rise to the Bregman Information as the variance term.
We showcase the practical relevance of this decomposition on several downstream tasks, including model ensembles and confidence regions.
arXiv Detail & Related papers (2022-10-21T21:24:37Z) - Uncertainty estimations methods for a deep learning model to aid in
clinical decision-making -- a clinician's perspective [0.0]
There are several deep learning-inspired uncertainty estimation techniques, but few are implemented on medical datasets.
We compared dropout variational inference (DO), test-time augmentation (TTA), conformal predictions, and single deterministic methods for estimating uncertainty.
It may be important to evaluate multiple estimation techniques before incorporating a model into clinical practice.
arXiv Detail & Related papers (2022-10-02T17:54:54Z) - Can uncertainty boost the reliability of AI-based diagnostic methods in
digital pathology? [3.8424737607413157]
We evaluate whether adding uncertainty estimates for DL predictions in digital pathology could result in increased value for clinical applications.
We compare the effectiveness of model-integrated methods (MC dropout and Deep ensembles) with a model-agnostic approach.
Our results show that uncertainty estimates can add some reliability and reduce sensitivity to classification threshold selection.
arXiv Detail & Related papers (2021-12-17T10:10:00Z) - Dense Uncertainty Estimation via an Ensemble-based Conditional Latent
Variable Model [68.34559610536614]
We argue that the aleatoric uncertainty is an inherent attribute of the data and can only be correctly estimated with an unbiased oracle model.
We propose a new sampling and selection strategy at train time to approximate the oracle model for aleatoric uncertainty estimation.
Our results show that our solution achieves both accurate deterministic results and reliable uncertainty estimation.
arXiv Detail & Related papers (2021-11-22T08:54:10Z) - Identifying Incorrect Classifications with Balanced Uncertainty [21.130311978327196]
Uncertainty estimation is critical for cost-sensitive deep-learning applications.
We model the imbalance in uncertainty estimation as two kinds of distribution biases.
We then propose the Balanced True Class Probability framework, which learns an uncertainty estimator with a novel Distributional Focal Loss objective.
arXiv Detail & Related papers (2021-10-15T11:52:31Z) - DEUP: Direct Epistemic Uncertainty Prediction [56.087230230128185]
Epistemic uncertainty is the part of out-of-sample prediction error due to the learner's lack of knowledge.
We propose a principled approach for directly estimating epistemic uncertainty by learning to predict generalization error and subtracting an estimate of aleatoric uncertainty.
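The DEUP summary above describes a subtractive scheme: learn to predict total generalization error, then subtract an aleatoric estimate to obtain epistemic uncertainty. A minimal hedged sketch of that arithmetic (the two input arrays stand in for hypothetical outputs of auxiliary error and aleatoric regressors, which are assumptions here, not DEUP's actual implementation):

```python
import numpy as np

def deup_epistemic(total_error_pred, aleatoric_pred):
    """DEUP-style estimate: epistemic = predicted total error - aleatoric part.

    Clipped at zero since uncertainty cannot be negative. Inputs are
    hypothetical per-sample outputs of two auxiliary regressors.
    """
    return np.maximum(total_error_pred - aleatoric_pred, 0.0)

total = np.array([0.30, 0.10, 0.05])  # predicted generalization error per sample
alea = np.array([0.10, 0.08, 0.07])   # estimated irreducible (aleatoric) part
epi = deup_epistemic(total, alea)
print(epi)  # third sample: aleatoric exceeds total, so epistemic clips to 0
```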
arXiv Detail & Related papers (2021-02-16T23:50:35Z) - Approaching Neural Network Uncertainty Realism [53.308409014122816]
Quantifying or at least upper-bounding uncertainties is vital for safety-critical systems such as autonomous vehicles.
We evaluate uncertainty realism -- a strict quality criterion -- with a Mahalanobis distance-based statistical test.
We adapt it to the automotive domain and show that it significantly improves uncertainty realism compared to a plain encoder-decoder model.
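The uncertainty-realism idea above can be sketched with a simple check (an illustration under assumed Gaussian predictions, not the paper's exact test): if a model's predicted covariances are realistic, the squared Mahalanobis distances of its residuals should follow a chi-squared distribution with dimension-many degrees of freedom, so their empirical mean should be close to the dimension.

```python
import numpy as np

rng = np.random.default_rng(1)

def squared_mahalanobis(residuals, covs):
    """Per-sample squared Mahalanobis distance of prediction residuals
    under the model's predicted covariances."""
    return np.array([r @ np.linalg.solve(C, r) for r, C in zip(residuals, covs)])

# Hypothetical 2-D regression errors with well-calibrated covariances:
k, n = 2, 5000
covs = np.stack([np.eye(k)] * n)          # model predicts N(0, I) errors
residuals = rng.standard_normal((n, k))   # actual errors match that prediction

d2 = squared_mahalanobis(residuals, covs)
# If the uncertainties are realistic, d2 ~ chi^2_k, so mean(d2) is close to k.
print(round(float(d2.mean()), 1))
```

Overconfident covariances would inflate the distances well above k, which is the kind of deviation a Mahalanobis-based statistical test detects.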
arXiv Detail & Related papers (2021-01-08T11:56:12Z) - Trust Issues: Uncertainty Estimation Does Not Enable Reliable OOD
Detection On Medical Tabular Data [0.0]
We present a series of tests including a large variety of contemporary uncertainty estimation techniques.
In contrast to previous work, we design tests on realistic and clinically relevant OOD groups, and run experiments on real-world medical data.
arXiv Detail & Related papers (2020-11-06T10:41:39Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - Diagnostic Uncertainty Calibration: Towards Reliable Machine Predictions
in Medical Domain [20.237847764018138]
We propose an evaluation framework for class probability estimates (CPEs) in the presence of label uncertainty.
We also formalize evaluation metrics for higher-order statistics, including inter-rater disagreement.
We show that our approach significantly enhances the reliability of uncertainty estimates.
arXiv Detail & Related papers (2020-07-03T12:54:08Z) - Uncertainty estimation for classification and risk prediction on medical
tabular data [0.0]
This work advances the understanding of uncertainty estimation for classification and risk prediction on medical data.
In a data-scarce field such as healthcare, the ability to measure the uncertainty of a model's prediction could potentially lead to improved effectiveness of decision support tools.
arXiv Detail & Related papers (2020-04-13T08:46:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.