Beyond Confidence: Reliable Models Should Also Consider Atypicality
- URL: http://arxiv.org/abs/2305.18262v2
- Date: Mon, 30 Oct 2023 05:24:15 GMT
- Title: Beyond Confidence: Reliable Models Should Also Consider Atypicality
- Authors: Mert Yuksekgonul, Linjun Zhang, James Zou, Carlos Guestrin
- Abstract summary: We investigate the relationship between how atypical (rare) a sample or a class is and the reliability of a model's predictions.
We show that predictions for atypical inputs or atypical classes are more overconfident and have lower accuracy.
We propose that models should use not only confidence but also atypicality to improve uncertainty quantification and performance.
- Score: 43.012818086415514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While most machine learning models can provide confidence in their
predictions, confidence is insufficient to understand a prediction's
reliability. For instance, the model may have a low confidence prediction if
the input is not well-represented in the training dataset or if the input is
inherently ambiguous. In this work, we investigate the relationship between how
atypical (rare) a sample or a class is and the reliability of a model's
predictions. We first demonstrate that atypicality is strongly related to
miscalibration and accuracy. In particular, we empirically show that
predictions for atypical inputs or atypical classes are more overconfident and
have lower accuracy. Using these insights, we show incorporating atypicality
improves uncertainty quantification and model performance for discriminative
neural networks and large language models. In a case study, we show that using
atypicality improves the performance of a skin lesion classifier across
different skin tone groups without having access to the group attributes.
Overall, we propose that models should use not only confidence but also
atypicality to improve uncertainty quantification and performance. Our results
demonstrate that simple post-hoc atypicality estimators can provide significant
value.
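As a rough illustration of what a simple post-hoc atypicality estimator could look like, the sketch below fits class-conditional Gaussians with a shared covariance on feature embeddings and scores an input by its distance to the nearest class mean; the function names and this Mahalanobis-style recipe are illustrative assumptions, not necessarily the estimator used in the paper.

```python
import numpy as np

def fit_atypicality_estimator(feats, labels, eps=1e-6):
    """Fit class-conditional Gaussians with a shared covariance on training features."""
    classes = np.unique(labels)
    means = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    centered = np.vstack([feats[labels == c] - means[i] for i, c in enumerate(classes)])
    cov = np.cov(centered, rowvar=False) + eps * np.eye(feats.shape[1])
    return means, np.linalg.inv(cov)

def atypicality(x, means, precision):
    """Score one embedding: distance to the nearest class mean (larger = rarer input)."""
    d = means - x                                   # (num_classes, dim)
    return float(np.min(np.einsum("cd,de,ce->c", d, precision, d)))
```

Predictions could then be binned by atypicality quantiles and recalibrated per bin (e.g., with temperature scaling), which is one plausible way to combine confidence with atypicality.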
Related papers
- On Arbitrary Predictions from Equally Valid Models [49.56463611078044]
Model multiplicity refers to multiple machine learning models that admit conflicting predictions for the same patient. We show that even small ensembles can mitigate/eliminate predictive multiplicity in practice.
arXiv Detail & Related papers (2025-07-25T16:15:59Z) - ShortcutProbe: Probing Prediction Shortcuts for Learning Robust Models [26.544938760265136]
Deep learning models inadvertently learn spurious correlations between targets and non-essential features. In this paper, we propose a novel post hoc spurious bias mitigation framework without requiring group labels. Our framework, termed ShortcutProbe, identifies prediction shortcuts in a given model's latent space that reflect potential non-robustness in its predictions.
arXiv Detail & Related papers (2025-05-20T04:21:17Z) - Are vision language models robust to uncertain inputs? [5.249651874118556]
We show that newer and larger vision-language models exhibit improved robustness compared to earlier models, but still suffer from a tendency to strictly follow instructions. For natural images such as those in ImageNet, this limitation can be overcome without pipeline modifications. We propose a novel mechanism based on caption diversity to reveal a model's internal uncertainty.
arXiv Detail & Related papers (2025-05-17T03:16:49Z) - Validation of Conformal Prediction in Cervical Atypia Classification [1.8988964758950546]
Deep-learning-based cervical cancer classification can potentially increase access to screening in low-resource regions. Deep learning models are often overconfident and do not reliably reflect diagnostic uncertainty. Conformal prediction is a model-agnostic framework for generating prediction sets that contain likely classes for trained deep-learning models.
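Since this entry and a later one concern conformal prediction sets, here is a minimal sketch of standard split conformal prediction for context; it uses the basic 1 - p(true class) nonconformity score and assumed inputs `probs_cal`, `labels_cal`, `probs_test`, and `alpha`, and is not the specific validation protocol of the paper above.

```python
import numpy as np

def conformal_prediction_sets(probs_cal, labels_cal, probs_test, alpha=0.1):
    """Split conformal prediction with the 1 - p(true class) nonconformity score."""
    n = len(labels_cal)
    scores = 1.0 - probs_cal[np.arange(n), labels_cal]      # calibration nonconformity
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample quantile level
    q_hat = np.quantile(scores, q_level, method="higher")
    return probs_test >= 1.0 - q_hat                        # per-class membership mask
```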
arXiv Detail & Related papers (2025-05-13T14:37:58Z) - Confidence-based Estimators for Predictive Performance in Model Monitoring [0.5399800035598186]
After a machine learning model has been deployed into production, its predictive performance needs to be monitored.
Recently, novel methods for estimating the predictive performance of a model when ground truth is unavailable have been developed.
We show that under certain general assumptions, the Average Confidence (AC) method is an unbiased and consistent estimator of model accuracy.
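A minimal sketch of the Average Confidence idea described above, assuming `probs` holds softmax outputs on unlabeled production data; this is the generic form of the estimator rather than the specific variants analyzed in the paper.

```python
import numpy as np

def average_confidence_accuracy_estimate(probs):
    """Estimate accuracy on unlabeled data as the mean top-class probability.

    For a perfectly calibrated model, the expected top-class probability
    equals the expected accuracy, which motivates this estimator.
    """
    return float(np.max(probs, axis=1).mean())
```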
arXiv Detail & Related papers (2024-07-11T16:28:31Z) - Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z) - A roadmap to fair and trustworthy prediction model validation in healthcare [2.476158303361112]
A prediction model is most useful if it generalizes beyond the development data.
We propose a roadmap that facilitates the development and application of reliable, fair, and trustworthy artificial intelligence prediction models.
arXiv Detail & Related papers (2023-04-07T04:24:19Z) - Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z) - Confidence and Dispersity Speak: Characterising Prediction Matrix for Unsupervised Accuracy Estimation [51.809741427975105]
This work aims to assess how well a model performs under distribution shifts without using labels.
We use the nuclear norm that has been shown to be effective in characterizing both properties.
We show that the nuclear norm yields more accurate and robust accuracy estimates than existing methods.
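As a rough sketch of the nuclear-norm score described above, the snippet below computes the nuclear norm of the softmax prediction matrix; the normalization term is an illustrative assumption to keep scores comparable across test-set sizes and may differ from the paper's exact formulation.

```python
import numpy as np

def nuclear_norm_score(probs):
    """Nuclear norm of the softmax prediction matrix (n_samples x n_classes).

    Higher values indicate predictions that are both confident and dispersed
    across classes, which tends to correlate with accuracy under shift.
    """
    n, k = probs.shape
    return float(np.linalg.norm(probs, ord="nuc") / np.sqrt(n * min(n, k)))
```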
arXiv Detail & Related papers (2023-02-02T13:30:48Z) - Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z) - Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties.
We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model.
We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
arXiv Detail & Related papers (2022-08-25T13:31:09Z) - Predictive Multiplicity in Probabilistic Classification [25.111463701666864]
We present a framework for measuring predictive multiplicity in probabilistic classification.
We demonstrate the incidence and prevalence of predictive multiplicity in real-world tasks.
Our results emphasize the need to report predictive multiplicity more widely.
arXiv Detail & Related papers (2022-06-02T16:25:29Z) - Generalized Adversarial Distances to Efficiently Discover Classifier Errors [0.0]
High-confidence errors are rare events for which the model is highly confident in its prediction, but is wrong.
We propose a generalization to the Adversarial Distance search that leverages concepts from adversarial machine learning.
Experimental results show that the generalized method finds errors at rates greater than expected given the confidence of the sampled predictions.
arXiv Detail & Related papers (2021-02-25T13:31:21Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)