Beyond Confidence: Reliable Models Should Also Consider Atypicality
- URL: http://arxiv.org/abs/2305.18262v2
- Date: Mon, 30 Oct 2023 05:24:15 GMT
- Title: Beyond Confidence: Reliable Models Should Also Consider Atypicality
- Authors: Mert Yuksekgonul, Linjun Zhang, James Zou, Carlos Guestrin
- Abstract summary: We investigate the relationship between how atypical (rare) a sample or a class is and the reliability of a model's predictions.
We show that predictions for atypical inputs or atypical classes are more overconfident and have lower accuracy.
We propose that models should use not only confidence but also atypicality to improve uncertainty quantification and performance.
- Score: 43.012818086415514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While most machine learning models can provide confidence in their
predictions, confidence is insufficient to understand a prediction's
reliability. For instance, the model may have a low confidence prediction if
the input is not well-represented in the training dataset or if the input is
inherently ambiguous. In this work, we investigate the relationship between how
atypical (rare) a sample or a class is and the reliability of a model's
predictions. We first demonstrate that atypicality is strongly related to
miscalibration and accuracy. In particular, we empirically show that
predictions for atypical inputs or atypical classes are more overconfident and
have lower accuracy. Using these insights, we show incorporating atypicality
improves uncertainty quantification and model performance for discriminative
neural networks and large language models. In a case study, we show that using
atypicality improves the performance of a skin lesion classifier across
different skin tone groups without having access to the group attributes.
Overall, we propose that models should use not only confidence but also
atypicality to improve uncertainty quantification and performance. Our results
demonstrate that simple post-hoc atypicality estimators can provide significant
value.
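To make the proposed post-hoc atypicality estimator concrete, the following is a minimal sketch assuming access to penultimate-layer features and labels from a calibration set; the per-class Gaussian fit and the Mahalanobis-distance score are illustrative choices, not the authors' released implementation.

```python
import numpy as np

def fit_class_gaussians(features, labels):
    """Fit a mean and regularized covariance per class on calibration features.

    features: (n, d) array of penultimate-layer embeddings.
    labels:   (n,) integer class labels.
    Returns a dict mapping class -> (mean, inverse covariance).
    """
    stats = {}
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False) + 1e-3 * np.eye(x.shape[1])  # keep invertible
        stats[c] = (mean, np.linalg.inv(cov))
    return stats

def atypicality(feature, stats):
    """Atypicality of one input: Mahalanobis distance to the closest class."""
    distances = []
    for mean, inv_cov in stats.values():
        diff = feature - mean
        distances.append(float(diff @ inv_cov @ diff))
    return min(distances)
```

On top of such a score, confidence could then be recalibrated separately within atypicality quantiles (for example, temperature scaling per quantile), which is one simple way to act on the observation that atypical inputs tend to be more overconfident.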
Related papers
- Confidence-based Estimators for Predictive Performance in Model Monitoring [0.5399800035598186]
After a machine learning model has been deployed into production, its predictive performance needs to be monitored.
Recently, novel methods for estimating the predictive performance of a model when ground truth is unavailable have been developed.
We show that under certain general assumptions, the Average Confidence (AC) method is an unbiased and consistent estimator of model accuracy.
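As a minimal sketch of the Average Confidence (AC) idea described above, accuracy on unlabeled production data is estimated as the mean top-class probability; the function name and interface are illustrative.

```python
import numpy as np

def average_confidence(probs):
    """Label-free accuracy estimate: mean top-class probability.

    probs: (n, k) array of predicted class probabilities on unlabeled data.
    Under the paper's assumptions (roughly, well-calibrated confidences), this
    is an unbiased and consistent estimator of accuracy.
    """
    return float(probs.max(axis=1).mean())
```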
arXiv Detail & Related papers (2024-07-11T16:28:31Z) - Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
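One plausible reading of a sample difficulty-aware entropy regularizer is sketched below, assuming per-sample difficulty scores in [0, 1] derived from a large pre-trained model; the loss form and names are assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def difficulty_aware_loss(logits, targets, difficulty, lam=0.1):
    """Cross-entropy plus a difficulty-weighted entropy term.

    logits:     (n, k) downstream model outputs.
    targets:    (n,) ground-truth labels.
    difficulty: (n,) per-sample difficulty in [0, 1] from a pre-trained model.
    Minimizing this loss pushes predictive entropy up on difficult samples,
    discouraging overconfidence exactly where mistakes are likely.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return (ce - lam * difficulty * entropy).mean()
```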
arXiv Detail & Related papers (2023-04-20T07:29:23Z) - A roadmap to fair and trustworthy prediction model validation in healthcare [2.476158303361112]
A prediction model is most useful if it generalizes beyond the development data.
We propose a roadmap that facilitates the development and application of reliable, fair, and trustworthy artificial intelligence prediction models.
arXiv Detail & Related papers (2023-04-07T04:24:19Z) - Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
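A rough sketch of how an auxiliary self-supervised error could enter a split conformal procedure: scale the residual-based nonconformity score by a factor derived from that error, so intervals widen on inputs the pretext task also finds hard. The scaling form and names are assumptions rather than the paper's exact method.

```python
import numpy as np

def conformal_intervals(y_cal, yhat_cal, aux_cal, yhat_test, aux_test, alpha=0.1):
    """Split conformal intervals with an auxiliary difficulty signal.

    y_cal, yhat_cal:   labels and predictions on a held-out calibration set.
    aux_cal, aux_test: self-supervised (pretext-task) errors, nonnegative.
    """
    scale_cal = 1.0 + aux_cal                      # illustrative scaling choice
    scores = np.abs(y_cal - yhat_cal) / scale_cal  # normalized nonconformity
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    scale_test = 1.0 + aux_test
    return yhat_test - q * scale_test, yhat_test + q * scale_test
```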
arXiv Detail & Related papers (2023-02-23T18:57:14Z) - Confidence and Dispersity Speak: Characterising Prediction Matrix for Unsupervised Accuracy Estimation [51.809741427975105]
This work aims to assess how well a model performs under distribution shifts without using labels.
We use the nuclear norm of the prediction matrix, which has been shown to be effective in characterizing both confidence and dispersity.
We show that the nuclear norm is more accurate and robust for accuracy estimation than existing methods.
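As a concrete sketch, the signal can be computed as the nuclear norm (sum of singular values) of the softmax prediction matrix on the unlabeled test set; the normalization below is an illustrative choice, not necessarily the paper's exact formula.

```python
import numpy as np

def prediction_matrix_score(probs):
    """Nuclear norm of the (n, k) softmax prediction matrix.

    The norm grows when predictions are both confident (rows close to one-hot)
    and dispersed across classes, the two properties discussed above.
    """
    n, k = probs.shape
    score = np.linalg.norm(probs, ord="nuc")
    # Illustrative normalization so scores are comparable across test-set sizes.
    return float(score / np.sqrt(n * min(n, k)))
```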
arXiv Detail & Related papers (2023-02-02T13:30:48Z) - Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
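A generic sketch of separating the two uncertainty types at inference time, assuming a hypothetical model whose forward pass returns a prediction and a learned log-variance (data uncertainty) and that contains dropout layers (model uncertainty via Monte Carlo dropout); this is not the UAL architecture itself.

```python
import torch

@torch.no_grad()
def split_uncertainties(model, x, n_samples=20):
    """Estimate model vs. data uncertainty for a batch x.

    Assumes model(x) -> (prediction, log_var). Keeping dropout active across
    several stochastic forward passes gives the model-uncertainty estimate,
    while the averaged learned variance gives the data-uncertainty estimate.
    """
    model.train()  # keep dropout layers active at inference
    preds, data_vars = [], []
    for _ in range(n_samples):
        pred, log_var = model(x)
        preds.append(pred)
        data_vars.append(log_var.exp())
    model_uncertainty = torch.stack(preds).var(dim=0)
    data_uncertainty = torch.stack(data_vars).mean(dim=0)
    return model_uncertainty, data_uncertainty
```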
arXiv Detail & Related papers (2022-10-24T17:53:20Z) - Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties.
We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model.
We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
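To make selective calibration concrete, here is a small sketch of the quantity a selector would be trained to reduce: the calibration error computed only over the examples it keeps. The binning scheme and names are illustrative, and the selector training loop is omitted.

```python
import numpy as np

def selective_calibration_error(confidences, correct, keep, n_bins=10):
    """Expected calibration error restricted to selected (non-rejected) examples.

    confidences: (n,) top-class probabilities from the base model.
    correct:     (n,) 0/1 indicators of whether each prediction was right.
    keep:        (n,) boolean mask produced by the selector (True = keep).
    """
    conf, corr = confidences[keep], correct[keep]
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, n = 0.0, len(conf)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.sum() / n * abs(conf[in_bin].mean() - corr[in_bin].mean())
    return float(ece)
```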
arXiv Detail & Related papers (2022-08-25T13:31:09Z) - Predictive Multiplicity in Probabilistic Classification [25.111463701666864]
We present a framework for measuring predictive multiplicity in probabilistic classification.
We demonstrate the incidence and prevalence of predictive multiplicity in real-world tasks.
Our results emphasize the need to report predictive multiplicity more widely.
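A simplified proxy for one multiplicity measure, assuming the competing models are approximated by an ensemble retrained with different random seeds rather than by the paper's exact search over near-optimal classifiers:

```python
import numpy as np

def prediction_range(prob_matrix):
    """Per-example spread of predicted risk across competing models.

    prob_matrix: (n_models, n_examples) positive-class probabilities from
    models with near-identical validation performance. A large range for an
    individual means equally good models assign that person very different
    risks, i.e. the prediction exhibits multiplicity.
    """
    return prob_matrix.max(axis=0) - prob_matrix.min(axis=0)
```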
arXiv Detail & Related papers (2022-06-02T16:25:29Z) - Generalized Adversarial Distances to Efficiently Discover Classifier Errors [0.0]
High-confidence errors are rare events for which the model is highly confident in its prediction, but is wrong.
We propose a generalization to the Adversarial Distance search that leverages concepts from adversarial machine learning.
Experimental results show that the generalized method finds errors at rates greater than expected given the confidence of the sampled predictions.
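A rough single-input sketch of the adversarial-distance idea: nudge an input toward a decision flip and record how far it had to move. High-confidence predictions that flip after a very small perturbation are candidates for high-confidence errors worth sending for labeling. The step size, loop, and names are assumptions, not the paper's generalized search.

```python
import torch
import torch.nn.functional as F

def adversarial_distance(model, x, step=1e-3, max_steps=200):
    """Crude adversarial distance for a single input x of shape (1, ...)."""
    x = x.clone().requires_grad_(True)
    original = model(x).argmax(dim=1)
    for i in range(1, max_steps + 1):
        loss = F.cross_entropy(model(x), original)
        grad, = torch.autograd.grad(loss, x)
        # Move a small step in the direction that increases the loss (FGSM-style).
        x = (x + step * grad.sign()).detach().requires_grad_(True)
        if model(x).argmax(dim=1).item() != original.item():
            return i * step  # crude proxy for distance to the decision boundary
    return float("inf")      # prediction never flipped within the budget
```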
arXiv Detail & Related papers (2021-02-25T13:31:21Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
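A label-agnostic sketch of the idea, using a Gaussian mixture over features as a stand-in for the paper's generative model: fit the density model on training inputs, then rank test inputs by likelihood so that the least typical ones, where the summary says classifiers tend to fail, can be inspected first.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_density_model(train_features, n_components=10):
    """Fit a simple generative model on training features; no labels needed."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(train_features)
    return gmm

def rank_suspect_inputs(gmm, test_features):
    """Indices of test inputs from least to most likely under the model."""
    return np.argsort(gmm.score_samples(test_features))
```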
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.