Revisiting Confidence Estimation: Towards Reliable Failure Prediction
- URL: http://arxiv.org/abs/2403.02886v1
- Date: Tue, 5 Mar 2024 11:44:14 GMT
- Title: Revisiting Confidence Estimation: Towards Reliable Failure Prediction
- Authors: Fei Zhu, Xu-Yao Zhang, Zhen Cheng, Cheng-Lin Liu
- Abstract summary: We find a general, widely existing yet largely neglected phenomenon: most confidence estimation methods are harmful for detecting misclassification errors.
We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
- Score: 53.79160907725975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reliable confidence estimation is a challenging yet fundamental requirement
in many risk-sensitive applications. However, modern deep neural networks are
often overconfident for their incorrect predictions, i.e., misclassified
samples from known classes, and out-of-distribution (OOD) samples from unknown
classes. In recent years, many confidence calibration and OOD detection methods
have been developed. In this paper, we find a general, widely existing yet
largely neglected phenomenon: most confidence estimation methods are
harmful for detecting misclassification errors. We investigate this problem and
reveal that popular calibration and OOD detection methods often lead to worse
confidence separation between correctly classified and misclassified examples,
making it difficult to decide whether to trust a prediction or not. Finally, we
propose to enlarge the confidence gap by finding flat minima, which yields
state-of-the-art failure prediction performance under various settings
including balanced, long-tailed, and covariate-shift classification scenarios.
Our study not only provides a strong baseline for reliable confidence
estimation but also acts as a bridge between understanding calibration, OOD
detection, and failure prediction. The code is available at
\url{https://github.com/Impression2805/FMFP}.
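As a rough illustration of the failure-prediction task described above (a hedged sketch, not the authors' FMFP code; the function names and toy data are assumptions), one common protocol scores each prediction with its maximum softmax probability and measures how well that confidence separates correct from misclassified samples, e.g., via AUROC. A larger confidence gap between the two groups, which the paper pursues by finding flat minima, directly improves this score.

```python
# Hedged sketch: evaluating failure prediction (misclassification detection)
# with the maximum softmax probability (MSP) as the confidence score.
# Illustrative only; not the authors' FMFP implementation.
import numpy as np
from sklearn.metrics import roc_auc_score

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def failure_prediction_auroc(logits, labels):
    """AUROC of confidence for separating correct vs. misclassified samples."""
    probs = softmax(logits)
    confidence = probs.max(axis=1)                 # MSP confidence
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(int)  # 1 = correct, 0 = failure
    # Mean confidence gap between correct and misclassified samples,
    # i.e., the quantity the paper aims to enlarge.
    gap = confidence[correct == 1].mean() - confidence[correct == 0].mean()
    return roc_auc_score(correct, confidence), gap

# Toy usage with random logits and labels (assumed data, for shapes only).
rng = np.random.default_rng(0)
auroc, gap = failure_prediction_auroc(
    rng.normal(size=(1000, 10)), rng.integers(0, 10, size=1000)
)
print(f"failure-prediction AUROC = {auroc:.3f}, confidence gap = {gap:.3f}")
```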
Related papers
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between the predicted confidence and actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- Calibrating Multimodal Learning [94.65232214643436]
We propose a novel regularization technique, Calibrating Multimodal Learning (CML) regularization, to calibrate the predictive confidence of previous methods.
This technique can be flexibly integrated into existing models and improves performance in terms of confidence calibration, classification accuracy, and model robustness.
arXiv Detail & Related papers (2023-06-02T04:29:57Z)
- Rethinking Confidence Calibration for Failure Prediction [37.43981354073841]
Modern deep neural networks are often overconfident for their incorrect predictions.
We find that most confidence calibration methods are useless or harmful for failure prediction.
We propose a simple hypothesis: flat minima are beneficial for failure prediction (see the sharpness-aware training sketch after this list).
arXiv Detail & Related papers (2023-03-06T08:54:18Z)
- Calibrating Deep Neural Networks using Explicit Regularisation and Dynamic Data Pruning [25.982037837953268]
Deep neural networks (DNNs) are prone to miscalibrated predictions, often exhibiting a mismatch between the predicted output and the associated confidence scores.
We propose a novel regularization technique that can be used with classification losses, leading to state-of-the-art calibrated predictions at test time.
arXiv Detail & Related papers (2022-12-20T05:34:58Z)
- Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims to provide reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z)
- Uncertainty-Aware Reliable Text Classification [21.517852608625127]
Deep neural networks have contributed significantly to the predictive accuracy achieved on classification tasks.
However, they tend to make over-confident predictions in real-world settings where domain shift and out-of-distribution examples exist.
We propose an inexpensive framework that adopts both auxiliary outliers and pseudo off-manifold samples to train the model with prior knowledge of a certain class.
arXiv Detail & Related papers (2021-07-15T04:39:55Z)
- Provably Robust Detection of Out-of-distribution Data (almost) for free [124.14121487542613]
Deep neural networks are known to produce highly overconfident predictions on out-of-distribution (OOD) data.
In this paper we propose a novel method that, from first principles, combines a certifiable OOD detector with a standard classifier into an OOD-aware classifier.
In this way we achieve the best of two worlds: certifiably adversarially robust OOD detection, even for OOD samples close to the in-distribution, without loss in prediction accuracy, and close to state-of-the-art OOD detection performance for non-manipulated OOD data.
arXiv Detail & Related papers (2021-06-08T11:40:49Z)
- Harnessing Adversarial Distances to Discover High-Confidence Errors [0.0]
We investigate the problem of finding errors at rates greater than expected given model confidence.
We propose a novel, query-efficient search technique guided by adversarial perturbations.
arXiv Detail & Related papers (2020-06-29T13:44:16Z)
- Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions [121.10450359856242]
We develop a frequentist procedure that utilizes influence functions of a model's loss functional to construct a jackknife (or leave-one-out) estimator of predictive confidence intervals.
The discriminative jackknife (DJ) satisfies both desiderata, is applicable to a wide range of deep learning models, is easy to implement, and can be applied in a post-hoc fashion without interfering with model training or compromising its accuracy.
arXiv Detail & Related papers (2020-06-29T13:36:52Z)
- An Empirical Evaluation on Robustness and Uncertainty of Regularization Methods [43.25086015530892]
Deep neural networks (DNNs) behave fundamentally differently from humans: they can easily change predictions when small corruptions such as blur are applied to the input, and they produce confident predictions on out-of-distribution samples (an improper uncertainty measure).
arXiv Detail & Related papers (2020-03-09T01:15:22Z)
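Both the main paper and the "Rethinking Confidence Calibration for Failure Prediction" entry above attribute better failure prediction to flat minima. The sketch below shows one standard way to bias training toward flat minima, sharpness-aware minimization (SAM): each update first perturbs the weights along the gradient (ascent toward a sharper point) and then descends using gradients computed at that perturbed point. This is a hedged illustration under that assumption, not the FMFP implementation; all function and argument names are hypothetical.

```python
# Hedged sketch of sharpness-aware minimization (SAM), one common technique for
# finding flat minima; illustrative only, not the authors' FMFP code.
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """One SAM update: ascend to a nearby perturbed point, then descend from it."""
    model.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()  # gradients at the current weights

    # Perturb weights by rho along the normalized gradient direction (ascent step).
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]))
        perturbations = []
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations.append(e)

    # Recompute gradients at the perturbed ("sharp") weights.
    model.zero_grad()
    loss_fn(model(x), y).backward()

    # Undo the perturbation, then step with the sharpness-aware gradients.
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)
    base_optimizer.step()
    model.zero_grad()
    return loss.item()
```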
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.