Related papers: Bag of Coins: A Statistical Probe into Neural Confidence Structures

Bag of Coins: A Statistical Probe into Neural Confidence Structures

URL: http://arxiv.org/abs/2507.19774v1
Date: Sat, 26 Jul 2025 03:54:32 GMT
Title: Bag of Coins: A Statistical Probe into Neural Confidence Structures
Authors: Agnideep Aich, Ashit Baran Aich, Md Monzur Murshed, Sameera Hewage, Bruce Wade,
Abstract summary: Bag-of-Coins (BoC) test examines the internal consistency of a classifier's logits.<n>On Vision Transformers (ViTs), the BoC output serves as a state-of-the-art confidence score, achieving near-perfect calibration.<n>On Convolutional Neural Networks (CNNs) like ResNet, the probe reveals a deep inconsistency between the model's predictions and its internal logit structure.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern neural networks, despite their high accuracy, often produce poorly calibrated confidence scores, limiting their reliability in high-stakes applications. Existing calibration methods typically post-process model outputs without interrogating the internal consistency of the predictions themselves. In this work, we introduce a novel, non-parametric statistical probe, the Bag-of-Coins (BoC) test, that examines the internal consistency of a classifier's logits. The BoC test reframes confidence estimation as a frequentist hypothesis test: does the model's top-ranked class win 1-v-1 contests against random competitors at a rate consistent with its own stated softmax probability? When applied to modern deep learning architectures, this simple probe reveals a fundamental dichotomy. On Vision Transformers (ViTs), the BoC output serves as a state-of-the-art confidence score, achieving near-perfect calibration with an ECE of 0.0212, an 88% improvement over a temperature-scaled baseline. Conversely, on Convolutional Neural Networks (CNNs) like ResNet, the probe reveals a deep inconsistency between the model's predictions and its internal logit structure, a property missed by traditional metrics. We posit that BoC is not merely a calibration method, but a new diagnostic tool for understanding and exposing the differing ways that popular architectures represent uncertainty.

Related papers

Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We find a general, widely existing but actually-neglected phenomenon that most confidence estimation methods are harmful for detecting misclassification errors. We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z)
Towards Calibrated Deep Clustering Network [60.71776081164377]
In deep clustering, the estimated confidence for a sample belonging to a particular cluster greatly exceeds its actual prediction accuracy.<n>We propose a novel dual head (calibration head and clustering head) deep clustering model that can effectively calibrate the estimated confidence and the actual accuracy.<n>The proposed calibrated deep clustering model not only surpasses the state-of-the-art deep clustering methods by 5x on average in terms of expected calibration error, but also significantly outperforms them in terms of clustering accuracy.
arXiv Detail & Related papers (2024-03-04T11:23:40Z)
Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model. By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation. It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
Multiclass Alignment of Confidence and Certainty for Network Calibration [10.15706847741555]
Recent studies reveal that deep neural networks (DNNs) are prone to making overconfident predictions. We propose a new train-time calibration method, which features a simple, plug-and-play auxiliary loss known as multi-class alignment of predictive mean confidence and predictive certainty (MACC) Our method achieves state-of-the-art calibration performance for both in-domain and out-domain predictions.
arXiv Detail & Related papers (2023-09-06T00:56:24Z)
Calibrating Deep Neural Networks using Explicit Regularisation and Dynamic Data Pruning [25.982037837953268]
Deep neural networks (DNN) are prone to miscalibrated predictions, often exhibiting a mismatch between the predicted output and the associated confidence scores. We propose a novel regularization technique that can be used with classification losses, leading to state-of-the-art calibrated predictions at test time.
arXiv Detail & Related papers (2022-12-20T05:34:58Z)
Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously. Data uncertainty captures the noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z)
On double-descent in uncertainty quantification in overparametrized models [24.073221004661427]
Uncertainty quantification is a central challenge in reliable and trustworthy machine learning. We show a trade-off between classification accuracy and calibration, unveiling a double descent like behavior in the calibration curve of optimally regularized estimators. This is in contrast with the empirical Bayes method, which we show to be well calibrated in our setting despite the higher generalization error and overparametrization.
arXiv Detail & Related papers (2022-10-23T16:01:08Z)
Bayesian Confidence Calibration for Epistemic Uncertainty Modelling [4.358626952482686]
We introduce a framework to obtain confidence estimates in conjunction with an uncertainty of the calibration method. We achieve state-of-the-art calibration performance for object detection calibration.
arXiv Detail & Related papers (2021-09-21T10:53:16Z)
Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions. We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation. We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.