Legitimate ground-truth-free metrics for deep uncertainty classification scoring
- URL: http://arxiv.org/abs/2410.23046v1
- Date: Wed, 30 Oct 2024 14:14:32 GMT
- Title: Legitimate ground-truth-free metrics for deep uncertainty classification scoring
- Authors: Arthur Pignet, Chiara Regniez, John Klein
- Abstract summary: The use of Uncertainty Quantification (UQ) methods in production remains limited.
This limitation is exacerbated by the challenge of validating UQ methods in the absence of UQ ground truth.
This paper investigates metrics that can be computed from ordinary test data and proves that they are theoretically well-behaved and actually tied to an uncertainty ground truth.
- Score: 3.9599054392856483
- License:
- Abstract: Despite the increasing demand for safer machine learning practices, the use of Uncertainty Quantification (UQ) methods in production remains limited. This limitation is exacerbated by the challenge of validating UQ methods in the absence of UQ ground truth. In classification tasks, when only a usual set of test data is at hand, several authors have suggested metrics that can be computed from such test points to assess the quality of quantified uncertainties. This paper investigates such metrics and proves that they are theoretically well-behaved and actually tied to an uncertainty ground truth that is easily interpretable as a ranking of model prediction trustworthiness. Equipped with these new results, and given that the metrics apply in the usual supervised paradigm, we argue that our contributions will help promote a broader use of UQ in deep learning.
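To make the family of metrics discussed above concrete, here is a minimal sketch of two scores that need nothing beyond an ordinary labelled test set: the AUROC of confidence versus prediction correctness, and an accuracy-rejection curve. These are generic illustrations of ground-truth-free scoring, not the specific metrics analysed in the paper, and the helper names (`confidence_auroc`, `accuracy_rejection_curve`) are made up for this sketch.

```python
# Illustrative sketch (not the paper's own metrics): two scores computable from a
# standard labelled test set, checking whether low confidence flags likely errors.
import numpy as np

def confidence_auroc(confidence, correct):
    """AUROC of 'does confidence separate correct from incorrect predictions?'."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos, neg = confidence[correct], confidence[~correct]
    # Probability that a correct prediction receives higher confidence than a
    # wrong one (ties count 0.5), i.e. a normalised Mann-Whitney U statistic.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def accuracy_rejection_curve(confidence, correct, fractions=np.linspace(0.0, 0.9, 10)):
    """Accuracy on the examples kept after rejecting the least confident fraction."""
    order = np.argsort(confidence)[::-1]              # most confident first
    correct = np.asarray(correct, dtype=float)[order]
    accuracies = []
    for f in fractions:
        kept = correct[: max(1, int(round((1.0 - f) * len(correct))))]
        accuracies.append(kept.mean())
    return np.asarray(fractions), np.asarray(accuracies)

# Toy usage: confidence loosely predicts correctness, so both scores look healthy.
rng = np.random.default_rng(0)
conf = rng.uniform(size=1000)
corr = rng.uniform(size=1000) < conf
print(confidence_auroc(conf, corr))
print(accuracy_rejection_curve(conf, corr)[1])
```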
Related papers
- Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
arXiv Detail & Related papers (2023-08-28T03:03:03Z)
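For readers unfamiliar with the ERASER metrics examined above, the sketch below shows the usual way comprehensiveness and sufficiency are computed from a classifier's predicted probability of its own label; the `predict_proba` callable, the token-masking convention and the toy bag-of-words model are assumptions made for illustration, not taken from the paper.

```python
# Rough sketch of ERASER-style comprehensiveness and sufficiency (assumed masking
# convention; a real setup masks tokens the way the underlying NLP model expects).
import numpy as np

def comprehensiveness(predict_proba, tokens, rationale_mask, label):
    """Drop in predicted probability when the rationale tokens are removed."""
    full = predict_proba(tokens)[label]
    without_rationale = predict_proba(
        [t for t, keep in zip(tokens, rationale_mask) if not keep])[label]
    return full - without_rationale       # large value => the rationale mattered

def sufficiency(predict_proba, tokens, rationale_mask, label):
    """Drop in predicted probability when only the rationale tokens are kept."""
    full = predict_proba(tokens)[label]
    rationale_only = predict_proba(
        [t for t, keep in zip(tokens, rationale_mask) if keep])[label]
    return full - rationale_only          # small value => the rationale suffices

# Toy usage with a dummy bag-of-words "model" standing in for a real classifier.
vocab_weights = {"good": 2.0, "bad": -2.0}
def predict_proba(tokens):
    score = sum(vocab_weights.get(t, 0.0) for t in tokens)
    p_positive = 1.0 / (1.0 + np.exp(-score))
    return np.array([1.0 - p_positive, p_positive])

tokens = ["the", "movie", "was", "good"]
rationale = [False, False, False, True]   # "good" is the highlighted rationale
print(comprehensiveness(predict_proba, tokens, rationale, label=1))
print(sufficiency(predict_proba, tokens, rationale, label=1))
```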
- Comparing the quality of neural network uncertainty estimates for classification problems [0.0]
Uncertainty quantification (UQ) methods for deep learning (DL) models have received increased attention in the literature.
We use the frequentist statistics of interval coverage and interval width to evaluate the quality of credible intervals.
We apply these UQ methods for DL to a hyperspectral image target detection problem and show the inconsistency of the different methods' results.
arXiv Detail & Related papers (2023-08-11T01:55:14Z)
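The two statistics used in the entry above are simple to compute; here is a minimal sketch of empirical interval coverage and mean interval width for a batch of credible intervals, under an assumed Gaussian toy setup rather than the paper's hyperspectral experiments.

```python
# Minimal sketch: empirical coverage and mean width of predictive intervals.
import numpy as np

def coverage_and_width(lower, upper, y_true):
    """Fraction of targets inside [lower, upper] and the average interval width."""
    lower, upper, y_true = map(np.asarray, (lower, upper, y_true))
    covered = (y_true >= lower) & (y_true <= upper)
    return covered.mean(), (upper - lower).mean()

# Toy usage: nominal 90% Gaussian intervals around noisy predictions.
rng = np.random.default_rng(1)
y = rng.normal(size=500)
pred = y + rng.normal(scale=0.5, size=500)
half_width = 1.645 * 0.5                  # two-sided 90% interval, known noise scale
cov, width = coverage_and_width(pred - half_width, pred + half_width, y)
print(f"coverage={cov:.2f} (nominal 0.90), mean width={width:.2f}")
```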
- Conformal Methods for Quantifying Uncertainty in Spatiotemporal Data: A Survey [0.0]
In high-risk settings, it is important that a model produces uncertainty estimates that reflect its own confidence and help avoid failures.
In this paper we survey recent works on uncertainty quantification (UQ) for deep learning, in particular the distribution-free Conformal Prediction method, chosen for its mathematical properties and wide applicability.
arXiv Detail & Related papers (2022-09-08T06:08:48Z)
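Since the survey above centres on Conformal Prediction, a minimal split-conformal sketch for classification may help: calibrate a score threshold on held-out data, then emit prediction sets with distribution-free coverage. The softmax-based nonconformity score and all variable names are illustrative choices, not prescriptions from the survey.

```python
# Minimal split conformal prediction for classification (illustrative score choice).
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Finite-sample quantile of the scores 1 - prob(true class) on calibration data."""
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    k = int(np.ceil((n + 1) * (1.0 - alpha))) - 1    # order-statistic index
    return scores[min(k, n - 1)]

def prediction_sets(test_probs, threshold):
    """All classes whose score 1 - prob does not exceed the calibrated threshold."""
    return [np.where(1.0 - p <= threshold)[0] for p in test_probs]

# Toy usage with random 3-class "probabilities" playing the role of softmax outputs.
rng = np.random.default_rng(2)
cal_probs = rng.dirichlet(np.ones(3), size=200)
cal_labels = np.array([rng.choice(3, p=p) for p in cal_probs])
thr = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
sets = prediction_sets(rng.dirichlet(np.ones(3), size=5), thr)
print(thr, [s.tolist() for s in sets])
```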
- Uncertainty-Driven Action Quality Assessment [67.20617610820857]
We propose a novel probabilistic model, named Uncertainty-Driven AQA (UD-AQA), to capture the diversity among multiple judge scores.
We estimate the uncertainty of each prediction and use it to re-weight the AQA regression loss.
Our proposed method achieves competitive results on three benchmarks: the Olympic-event datasets MTL-AQA and FineDiving, and the surgical-skill dataset JIGSAWS.
arXiv Detail & Related papers (2022-07-29T07:21:15Z)
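The loss re-weighting described in the UD-AQA entry above is in the spirit of the standard learned-variance regression loss; the sketch below shows that generic form only to illustrate the idea. It is not UD-AQA's actual objective, and `log_var` is an assumed per-prediction network output.

```python
# Generic uncertainty-weighted regression loss, shown only to illustrate the
# re-weighting idea; this is not the UD-AQA objective itself.
import numpy as np

def uncertainty_weighted_mse(pred, target, log_var):
    """Residuals are down-weighted where predicted variance is high, with a
    log-variance penalty so the model cannot claim infinite uncertainty."""
    pred, target, log_var = map(np.asarray, (pred, target, log_var))
    return np.mean(0.5 * np.exp(-log_var) * (pred - target) ** 2 + 0.5 * log_var)

# Toy usage: a confident wrong score prediction costs more than an uncertain one.
print(uncertainty_weighted_mse([80.0], [90.0], log_var=[0.0]))   # confident and wrong
print(uncertainty_weighted_mse([80.0], [90.0], log_var=[4.0]))   # uncertain and wrong
```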
- Towards Clear Expectations for Uncertainty Estimation [64.20262246029286]
Uncertainty Quantification (UQ) is crucial to achieving trustworthy Machine Learning (ML).
Most UQ methods suffer from disparate and inconsistent evaluation protocols.
This opinion paper offers a new perspective by specifying requirements for UQ through five downstream tasks.
arXiv Detail & Related papers (2022-07-27T07:50:57Z)
- What is Flagged in Uncertainty Quantification? Latent Density Models for Uncertainty Categorization [68.15353480798244]
Uncertainty Quantification (UQ) is essential for creating trustworthy machine learning models.
Recent years have seen a steep rise in UQ methods that can flag suspicious examples.
We propose a framework for categorizing uncertain examples flagged by UQ methods in classification tasks.
arXiv Detail & Related papers (2022-07-11T19:47:00Z)
- Distribution-free uncertainty quantification for classification under label shift [105.27463615756733]
We focus on uncertainty quantification (UQ) for classification problems via two avenues.
We first argue that label shift hurts UQ by showing degradation in coverage and calibration.
We then examine techniques that address this theoretically in a distribution-free framework and demonstrate their excellent practical performance.
arXiv Detail & Related papers (2021-03-04T20:51:03Z)
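Calibration degradation, as measured in the label-shift entry above, is commonly summarised by the expected calibration error (ECE); the sketch below uses the usual equal-width binning, which is not necessarily the protocol of the paper.

```python
# Minimal expected calibration error (ECE) with equal-width confidence bins.
import numpy as np

def expected_calibration_error(confidence, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per confidence bin."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            gap = abs(confidence[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Toy usage: an overconfident classifier (accuracy ~20 points below confidence).
rng = np.random.default_rng(3)
conf = rng.uniform(0.7, 1.0, size=2000)
corr = rng.uniform(size=2000) < conf - 0.2
print(expected_calibration_error(conf, corr))
```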
- Empirical Frequentist Coverage of Deep Learning Uncertainty Quantification Procedures [13.890139530120164]
We provide the first large scale evaluation of the empirical frequentist coverage properties of uncertainty quantification techniques.
We find that, in general, some methods do achieve desirable coverage properties on in-distribution samples, but that coverage is not maintained on out-of-distribution data.
arXiv Detail & Related papers (2020-10-06T21:22:46Z)
- Uncertainty Quantification Using Neural Networks for Molecular Property Prediction [33.34534208450156]
We systematically evaluate several methods on five benchmark datasets using multiple complementary performance metrics.
None of the methods we tested is unequivocally superior to all others, and none produces a particularly reliable ranking of errors across multiple datasets.
We conclude with a practical recommendation as to which existing techniques seem to perform well relative to others.
arXiv Detail & Related papers (2020-05-20T13:31:20Z)
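The "ranking of errors" criterion in the entry above is often operationalised as a rank correlation between predicted uncertainty and observed absolute error; the sketch below uses Spearman correlation for that purpose, which is a common choice rather than the paper's exact metric.

```python
# Rank correlation between predicted uncertainty and absolute prediction error,
# a common proxy for "does the UQ method rank its own errors well?".
import numpy as np
from scipy.stats import spearmanr

def error_ranking_quality(pred, target, uncertainty):
    """Spearman correlation between |error| and predicted uncertainty (1.0 is ideal)."""
    abs_error = np.abs(np.asarray(pred) - np.asarray(target))
    rho, _ = spearmanr(np.asarray(uncertainty), abs_error)
    return rho

# Toy usage: predicted uncertainty that tracks the true per-sample noise level.
rng = np.random.default_rng(4)
noise_scale = rng.uniform(0.1, 2.0, size=300)
target = rng.normal(size=300)
pred = target + rng.normal(scale=noise_scale)
print(error_ranking_quality(pred, target, uncertainty=noise_scale))
```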
- Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning [70.72363097550483]
In this study, we focus on in-domain uncertainty for image classification.
To provide more insight, we introduce the deep ensemble equivalent score (DEE).
arXiv Detail & Related papers (2020-02-15T23:28:19Z)
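The deep ensemble equivalent (DEE) score introduced in the entry above can be read, roughly, as the size of a plain deep ensemble needed to match a method's test score; the sketch below interpolates that size from an ensemble-size versus score curve using made-up numbers, as a rough illustration rather than the paper's exact definition.

```python
# Hedged sketch of a "deep ensemble equivalent" style number: interpolate the
# ensemble size at which a plain ensemble matches a method's score.
import numpy as np

def ensemble_equivalent(method_score, ensemble_sizes, ensemble_scores):
    """Interpolated ensemble size achieving the same (higher-is-better) score."""
    # np.interp expects increasing x values, so interpolate size as a function of score.
    return float(np.interp(method_score, ensemble_scores, ensemble_sizes))

# Toy usage: made-up test log-likelihoods of deep ensembles of size 1..5.
sizes = np.array([1, 2, 3, 4, 5])
ens_scores = np.array([-0.95, -0.88, -0.84, -0.82, -0.81])   # improves with ensemble size
print(ensemble_equivalent(method_score=-0.86,
                          ensemble_sizes=sizes, ensemble_scores=ens_scores))
```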