Related papers: MACEst: The reliable and trustworthy Model Agnostic Confidence Estimator

URL: http://arxiv.org/abs/2109.01531v1
Date: Thu, 2 Sep 2021 14:34:06 GMT
Title: MACEst: The reliable and trustworthy Model Agnostic Confidence Estimator
Authors: Rhys Green, Matthew Rowe, Alberto Polleri
Abstract summary: We argue that any confidence estimates based upon standard machine learning point prediction algorithms are fundamentally flawed. We present MACEst, a Model Agnostic Confidence Estimator, which provides reliable and trustworthy confidence estimates.
Score: 0.17188280334580192
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reliable confidence estimates are hugely important for any machine learning model to be truly useful. In this paper, we argue that any confidence estimates based upon standard machine learning point prediction algorithms are fundamentally flawed and, in situations with a large amount of epistemic uncertainty, are likely to be untrustworthy. To address these issues, we present MACEst, a Model Agnostic Confidence Estimator, which provides reliable and trustworthy confidence estimates. The algorithm differs from current methods by estimating confidence independently as a local quantity which explicitly accounts for both aleatoric and epistemic uncertainty. This approach differs from standard calibration methods that use a global point prediction model as a starting point for the confidence estimate.
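
Illustration (not from the paper): the abstract describes confidence as a local quantity that accounts for both aleatoric and epistemic uncertainty. The sketch below is one simple way such an estimate could look, and it is not the MACEst algorithm itself. It assumes that the accuracy of the k nearest labelled points stands in for aleatoric uncertainty and that their mean distance from the query stands in for epistemic uncertainty. The function name local_confidence, the parameters length_scale and prior, and the toy data are all hypothetical.

```python
import math


def local_confidence(points, correct, query, k=3, length_scale=1.0, prior=0.5):
    """Toy local confidence estimate (illustrative only, not MACEst).

    points       -- feature tuples with known outcomes
    correct      -- booleans: was the point prediction right at each point?
    query        -- feature tuple to score
    length_scale -- assumed distance scale over which trust decays
    prior        -- assumed fallback confidence far away from any data
    """
    # Distance from the query to every labelled point, nearest first.
    nearest = sorted(
        (math.dist(p, query), c) for p, c in zip(points, correct)
    )[:k]

    # Aleatoric proxy: how often the model was right in this neighbourhood.
    local_acc = sum(c for _, c in nearest) / len(nearest)

    # Epistemic proxy: how far the query sits from the data it relies on.
    mean_dist = sum(d for d, _ in nearest) / len(nearest)

    # Shrink the local accuracy towards the prior as the distance grows.
    weight = math.exp(-mean_dist / length_scale)
    return prior + (local_acc - prior) * weight


# Hypothetical toy data: four points near the origin and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
correct = [True, True, True, False, False]

near = local_confidence(points, correct, (0.1, 0.1))
far = local_confidence(points, correct, (50.0, 50.0))
print(f"near the data: {near:.3f}")
print(f"far from the data: {far:.3f}")
```

With these assumptions, a query close to points where the model was right scores above the prior, while a query far from all data falls back to the prior regardless of how accurate the nearest points were.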

Related papers

Aurora: Are Android Malware Classifiers Reliable and Stable under Distribution Shift? [51.12297424766236]
AURORA is a framework to evaluate malware classifiers based on their confidence quality and operational resilience. AURORA is complemented by a set of metrics designed to go beyond point-in-time performance. The fragility in SOTA frameworks across datasets of varying drift suggests the need for a return to the whiteboard.
arXiv Detail & Related papers (2025-05-28T20:22:43Z)
Trust, or Don't Predict: Introducing the CWSA Family for Confidence-Aware Model Evaluation [0.0]
We introduce two new metrics: Confidence-Weighted Selective Accuracy (CWSA) and its normalized variant CWSA+. CWSA offers a principled and interpretable way to evaluate predictive models under confidence thresholds. We show that CWSA and CWSA+ both effectively detect nuanced failure modes and outperform classical metrics in trust-sensitive tests.
arXiv Detail & Related papers (2025-05-24T10:07:48Z)
Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences [62.52739672949452]
Language models (LMs) should provide reliable confidence estimates to help users detect mistakes in their outputs and defer to human experts when necessary. We propose relative confidence estimation, where we match up questions against each other and ask the model to make relative judgments of confidence. Treating each question as a "player" in a series of matchups against other questions and the model's preferences as match outcomes, we can use rank aggregation methods like Elo rating and Bradley-Terry to translate the model's confidence preferences into confidence scores.
arXiv Detail & Related papers (2025-02-03T07:43:27Z)
Confidence Aware Learning for Reliable Face Anti-spoofing [52.23271636362843]
We propose a Confidence Aware Face Anti-spoofing model, which is aware of its capability boundary. We estimate its confidence during the prediction of each sample. Experiments show that the proposed CA-FAS can effectively recognize samples with low prediction confidence.
arXiv Detail & Related papers (2024-11-02T14:29:02Z)
Automated Trustworthiness Testing for Machine Learning Classifiers [3.3423762257383207]
This paper proposes TOWER, the first technique to automatically create trustworthiness oracles that determine whether text classifier predictions are trustworthy. Our hypothesis is that a prediction is trustworthy if the words in its explanation are semantically related to the predicted class. The results show that TOWER can detect a decrease in trustworthiness as noise increases, but is not effective when evaluated against the human-labeled dataset.
arXiv Detail & Related papers (2024-06-07T20:25:05Z)
Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We find a general, widely existing but actually-neglected phenomenon that most confidence estimation methods are harmful for detecting misclassification errors. We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z)
Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness [29.320691367586004]
We introduce a new approach of self-supervised probing, which enables us to check and mitigate the overconfidence issue for a trained model. We provide a simple yet effective framework, which can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner.
arXiv Detail & Related papers (2023-02-06T08:57:20Z)
The Implicit Delta Method [61.36121543728134]
In this paper, we propose an alternative, the implicit delta method, which works by infinitesimally regularizing the training loss of uncertainty. We show that the change in the evaluation due to regularization is consistent for the variance of the evaluation estimator, even when the infinitesimal change is approximated by a finite difference.
arXiv Detail & Related papers (2022-11-11T19:34:17Z)
Confidence-Calibrated Face and Kinship Verification [8.570969129199467]
We introduce an effective confidence measure that allows verification models to convert a similarity score into a confidence score for any given face pair. We also propose a confidence-calibrated approach, termed Angular Scaling (ASC), which is easy to implement and can be readily applied to existing verification models. To the best of our knowledge, our work presents the first comprehensive confidence-calibrated solution for modern face and kinship verification tasks.
arXiv Detail & Related papers (2022-10-25T10:43:46Z)
Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously. Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z)
Learning Confidence for Transformer-based Neural Machine Translation [38.679505127679846]
We propose an unsupervised confidence estimate learning jointly with the training of the neural machine translation (NMT) model. We explain confidence as how many hints the NMT model needs to make a correct prediction, and more hints indicate low confidence. We demonstrate that our learned confidence estimate achieves high accuracy on extensive sentence/word-level quality estimation tasks.
arXiv Detail & Related papers (2022-03-22T01:51:58Z)
An evaluation of word-level confidence estimation for end-to-end automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR). We provide an extensive benchmark of popular confidence methods on four well-known speech datasets. Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
Binary Classification from Positive Data with Skewed Confidence [85.18941440826309]
Positive-confidence (Pconf) classification is a promising weakly-supervised learning method. In practice, the confidence may be skewed by bias arising in an annotation process. We introduce the parameterized model of the skewed confidence, and propose the method for selecting the hyper parameter.
arXiv Detail & Related papers (2020-01-29T00:04:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.