Rethinking Confidence Calibration for Failure Prediction
- URL: http://arxiv.org/abs/2303.02970v1
- Date: Mon, 6 Mar 2023 08:54:18 GMT
- Title: Rethinking Confidence Calibration for Failure Prediction
- Authors: Fei Zhu, Zhen Cheng, Xu-Yao Zhang, Cheng-Lin Liu
- Abstract summary: Modern deep neural networks are often overconfident in their incorrect predictions.
We find that most confidence calibration methods are useless or harmful for failure prediction.
We propose a simple hypothesis: flat minima are beneficial for failure prediction.
- Score: 37.43981354073841
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reliable confidence estimation for the predictions is important in many
safety-critical applications. However, modern deep neural networks are often
overconfident in their incorrect predictions. Recently, many calibration
methods have been proposed to alleviate the overconfidence problem. With
calibrated confidence, a primary and practical purpose is to detect
misclassification errors by filtering out low-confidence predictions (known as
failure prediction). In this paper, we find a general, widespread, but largely
neglected phenomenon: most confidence calibration methods are
useless or harmful for failure prediction. We investigate this problem and
reveal that popular confidence calibration methods often lead to worse
confidence separation between correct and incorrect samples, making it more
difficult to decide whether to trust a prediction or not. Finally, inspired by
the natural connection between flat minima and confidence separation, we
propose a simple hypothesis: flat minima are beneficial for failure prediction.
We verify this hypothesis via extensive experiments and further boost the
performance by combining two different flat minima techniques. Our code is
available at https://github.com/Impression2805/FMFP
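To make the failure-prediction setting concrete, the sketch below is an illustration under assumed inputs (not code from the FMFP repository): it scores a held-out set with the maximum softmax probability as the confidence estimate, then reports how well that confidence separates correct from incorrect predictions (AUROC) and the selective risk when low-confidence predictions are filtered out.

```python
# Minimal sketch of failure-prediction evaluation (illustrative only).
# Assumes `logits` is an (N, C) array and `labels` an (N,) integer array.
import numpy as np
from scipy.special import softmax
from sklearn.metrics import roc_auc_score

def failure_prediction_report(logits, labels, coverage=0.8):
    probs = softmax(logits, axis=1)
    confidence = probs.max(axis=1)            # maximum softmax probability
    correct = probs.argmax(axis=1) == labels  # per-sample correctness

    # AUROC of separating correct from incorrect predictions:
    # higher means errors receive systematically lower confidence.
    auroc = roc_auc_score(correct.astype(int), confidence)

    # Selective risk at fixed coverage: keep the most-confident fraction
    # of predictions and measure the error rate among those kept.
    n_keep = max(1, int(round(coverage * len(confidence))))
    kept = np.argsort(-confidence)[:n_keep]
    selective_risk = 1.0 - correct[kept].mean()

    return {"auroc": auroc, "selective_risk": selective_risk}
```

The abstract reports combining two flat-minima techniques; the exact recipe is in the linked repository. As a hedged, generic illustration of one flat-minima technique, the following PyTorch loop maintains a stochastic weight average (SWA) of the model during the later epochs of training. The hyperparameters (`epochs`, `swa_start`, learning rates) are placeholders, not the authors' settings.

```python
# Generic SWA training loop (illustrative, not the FMFP implementation).
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

def train_with_swa(model, loader, criterion, epochs=20, swa_start=10):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
    swa_model = AveragedModel(model)          # running average of weights
    swa_scheduler = SWALR(optimizer, swa_lr=0.01)

    for epoch in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
        if epoch >= swa_start:                # average weights after warm-up
            swa_model.update_parameters(model)
            swa_scheduler.step()

    update_bn(loader, swa_model)              # refresh BatchNorm statistics
    return swa_model                          # averaged model for inference
```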
Related papers
- Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We find a general, widespread but largely neglected phenomenon: most confidence estimation methods are harmful for detecting misclassification errors.
We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z)
- Multiclass Alignment of Confidence and Certainty for Network Calibration [10.15706847741555]
Recent studies reveal that deep neural networks (DNNs) are prone to making overconfident predictions.
We propose a new train-time calibration method, which features a simple, plug-and-play auxiliary loss known as multi-class alignment of predictive mean confidence and predictive certainty (MACC).
Our method achieves state-of-the-art calibration performance for both in-domain and out-of-domain predictions.
arXiv Detail & Related papers (2023-09-06T00:56:24Z)
- Two Sides of Miscalibration: Identifying Over and Under-Confidence Prediction for Network Calibration [1.192436948211501]
Proper confidence calibration of deep neural networks is essential for reliable predictions in safety-critical tasks.
Miscalibration can lead to model over-confidence and/or under-confidence.
We introduce a novel metric, a miscalibration score, to identify the overall and class-wise calibration status.
We use the class-wise miscalibration score as a proxy to design a calibration technique that can tackle both over- and under-confidence.
arXiv Detail & Related papers (2023-08-06T17:59:14Z)
- Calibrating Multimodal Learning [94.65232214643436]
We propose a novel regularization technique, i.e., Calibrating Multimodal Learning (CML) regularization, to calibrate the predictive confidence of previous methods.
This technique can be flexibly incorporated into existing models, improving performance in terms of confidence calibration, classification accuracy, and model robustness.
arXiv Detail & Related papers (2023-06-02T04:29:57Z)
- Calibrating Deep Neural Networks using Explicit Regularisation and Dynamic Data Pruning [25.982037837953268]
Deep neural networks (DNNs) are prone to miscalibrated predictions, often exhibiting a mismatch between the predicted output and the associated confidence scores.
We propose a novel regularization technique that can be used with classification losses, leading to state-of-the-art calibrated predictions at test time.
arXiv Detail & Related papers (2022-12-20T05:34:58Z)
- Beyond calibration: estimating the grouping loss of modern neural networks [68.8204255655161]
Proper scoring rule theory shows that given the calibration loss, the missing piece to characterize individual errors is the grouping loss.
We show that modern neural network architectures in vision and NLP exhibit grouping loss, notably in distribution shifts settings.
arXiv Detail & Related papers (2022-10-28T07:04:20Z)
- Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z)
- Learning to Predict Trustworthiness with Steep Slope Loss [69.40817968905495]
We study the problem of predicting trustworthiness on real-world large-scale datasets.
We observe that trustworthiness predictors trained with prior-art loss functions are prone to viewing both correct and incorrect predictions as trustworthy.
We propose a novel steep slope loss that separates the features of correct predictions from those of incorrect predictions using two slide-like curves that oppose each other.
arXiv Detail & Related papers (2021-09-30T19:19:09Z)
- How to Evaluate Uncertainty Estimates in Machine Learning for Regression? [1.4610038284393165]
We show that both approaches to evaluating the quality of uncertainty estimates have serious flaws.
Neither approach can disentangle the separate components that jointly create the predictive uncertainty; moreover, the current approach of testing prediction intervals directly has additional flaws.
arXiv Detail & Related papers (2021-06-07T07:47:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.