Estimating Classification Confidence Using Kernel Densities
- URL: http://arxiv.org/abs/2207.06529v2
- Date: Fri, 15 Jul 2022 02:39:02 GMT
- Title: Estimating Classification Confidence Using Kernel Densities
- Authors: Peter Salamon, David Salamon, V. Adrian Cantu, Michelle An, Tyler
Perry, Robert A. Edwards, Anca M. Segall
- Abstract summary: This paper investigates the post-hoc calibration of confidence for "exploratory" machine learning classification problems.
We introduce and test four new algorithms designed to handle the idiosyncrasies of category-specific confidence estimation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the post-hoc calibration of confidence for
"exploratory" machine learning classification problems. The difficulty in these
problems stems from the continuing desire to push the boundaries of which
categories have enough examples to generalize from when curating datasets, and
confusion regarding the validity of those categories. We argue that for such
problems the "one-versus-all" approach (top-label calibration) must be used
rather than the "calibrate-the-full-response-matrix" approach advocated
elsewhere in the literature. We introduce and test four new algorithms designed
to handle the idiosyncrasies of category-specific confidence estimation. Chief
among these methods is the use of kernel density ratios for confidence
calibration including a novel, bulletproof algorithm for choosing the
bandwidth. We test our claims and explore the limits of calibration on a
bioinformatics application (PhANNs) as well as the classic MNIST benchmark.
Finally, our analysis argues that post-hoc calibration should always be
performed, should be based only on the test dataset, and should be
sanity-checked visually.
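As a concrete illustration of the kernel-density idea in the abstract, here is a minimal Python sketch. It is not the authors' exact algorithm: the bandwidth comes from SciPy's default Scott's rule rather than the paper's bandwidth-selection procedure, and the function name and the held-out split into "correct"/"incorrect" scores per predicted category are assumptions for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_confidence(scores_correct, scores_incorrect, query_scores):
    """One-versus-all confidence for a single predicted category.

    scores_correct / scores_incorrect: held-out top-label scores for this
    category, split by whether the prediction was right or wrong.
    Bandwidth: SciPy's default (Scott's rule) -- an assumption, not the
    paper's bandwidth-selection algorithm.
    """
    kde_pos = gaussian_kde(scores_correct)    # density of scores given "correct"
    kde_neg = gaussian_kde(scores_incorrect)  # density of scores given "incorrect"
    n_pos, n_neg = len(scores_correct), len(scores_incorrect)
    prior_pos = n_pos / (n_pos + n_neg)       # empirical P(correct) for this category
    num = prior_pos * kde_pos(query_scores)
    den = num + (1.0 - prior_pos) * kde_neg(query_scores)
    return num / np.maximum(den, 1e-12)       # estimated P(correct | score)

# Toy usage: well-separated score distributions give high confidence at high scores.
rng = np.random.default_rng(0)
correct = rng.normal(0.8, 0.10, 500)
wrong = rng.normal(0.4, 0.15, 200)
print(kde_confidence(correct, wrong, np.array([0.3, 0.6, 0.9])))
```

The ratio of the two kernel density estimates, weighted by the empirical priors, is just Bayes' rule applied per category, which is what makes this a one-versus-all (top-label) calibrator.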
Related papers
- Confidence Calibration of Classifiers with Many Classes [5.018156030818883]
For classification models based on neural networks, the maximum predicted class probability is often used as a confidence score.
This score is often a poor estimate of the probability that the prediction is correct, so a post-processing calibration step is required.
arXiv Detail & Related papers (2024-11-05T10:51:01Z)
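For reference, the confidence score referred to in the entry above is simply the largest softmax probability. The sketch below shows it together with temperature scaling, used here only as one common example of a post-processing calibration step, not necessarily the method of the cited paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = np.asarray(logits, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 0.5, -1.0]])
probs = softmax(logits)                  # raw model probabilities
confidence = probs.max(axis=-1)          # max predicted class probability as "confidence"
probs_T = softmax(logits, T=2.0)         # temperature scaling: an assumed example of a
confidence_T = probs_T.max(axis=-1)      # post-hoc calibration step, not this paper's method
print(confidence, confidence_T)
```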
- Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
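A rough sketch of the entropy-regularisation idea mentioned above follows; the particular regulariser (an entropy term on the batch-mean prediction, weighted by `lam`) is an illustrative choice, not the paper's exact loss.

```python
import numpy as np

def entropy_regularised_loss(probs, labels, lam=0.1):
    """Cross-entropy on labelled samples plus an entropy term on the
    batch-mean prediction -- one common way to discourage collapse onto a
    few categories. An illustrative choice, not the cited paper's loss."""
    eps = 1e-12
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))
    mean_pred = probs.mean(axis=0)
    neg_entropy = np.sum(mean_pred * np.log(mean_pred + eps))  # minimising this maximises entropy
    return ce + lam * neg_entropy

probs = np.array([[0.7, 0.2, 0.1],
                  [0.2, 0.6, 0.2]])
labels = np.array([0, 1])
print(entropy_regularised_loss(probs, labels))
```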
- Investigation of Different Calibration Methods for Deep Speaker Embedding based Verification Systems [66.61691401921296]
This paper presents an investigation of several score calibration methods for deep speaker embedding extractors.
An additional focus of this research is to estimate the impact of score normalization on the calibration performance of the system.
arXiv Detail & Related papers (2022-03-28T21:22:22Z)
- Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
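For context, the kind of bound this PAC-Bayes literature builds on can be written as follows (a standard McAllester/Maurer-style statement under an i.i.d. assumption, not necessarily the exact certificate derived in the paper). Here R(Q) is the expected risk of the randomized (Gibbs) classifier drawn from posterior Q, \widehat{R}_S(Q) its empirical risk on a sample S of size n, and P a prior.

```latex
% With probability at least 1 - \delta over the draw of S, for all posteriors Q:
R(Q) \;\le\; \widehat{R}_S(Q)
      \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\frac{2\sqrt{n}}{\delta}}{2n}}
```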
- Least Square Calibration for Peer Review [18.063450032460047]
We propose a flexible framework, namely least square calibration (LSC), for selecting top candidates from peer ratings.
Our framework provably performs perfect calibration from noiseless linear scoring functions under mild assumptions.
Our algorithm consistently outperforms the baseline that selects top papers based on the highest average ratings.
arXiv Detail & Related papers (2021-10-25T02:40:33Z)
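As a toy illustration of least-squares score calibration (an assumption-laden sketch: the additive "quality + reviewer bias" model and all names below are illustrative, not the LSC formulation itself):

```python
import numpy as np

# Observed ratings modelled as: rating = paper quality + reviewer bias (illustrative model).
# Each observation is (reviewer_id, paper_id, rating).
obs = [(0, 0, 7.5), (0, 1, 6.0), (1, 1, 8.0), (1, 2, 7.0), (2, 0, 6.5), (2, 2, 6.0)]
n_rev, n_pap = 3, 3

A = np.zeros((len(obs), n_rev + n_pap))
y = np.zeros(len(obs))
for k, (i, j, r) in enumerate(obs):
    A[k, i] = 1.0            # reviewer-bias column
    A[k, n_rev + j] = 1.0    # paper-quality column
    y[k] = r

theta, *_ = np.linalg.lstsq(A, y, rcond=None)
quality = theta[n_rev:]                 # calibrated paper scores
print(np.argsort(-quality))             # rank papers by calibrated quality
```

The point of comparison in the quoted claim is selecting by raw average ratings instead of the calibrated scores.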
- Bayesian Confidence Calibration for Epistemic Uncertainty Modelling [4.358626952482686]
We introduce a framework that yields confidence estimates together with an uncertainty estimate for the calibration method itself.
We achieve state-of-the-art calibration performance for object detection.
arXiv Detail & Related papers (2021-09-21T10:53:16Z)
- Top-label calibration [3.3504365823045044]
We study the problem of post-hoc calibration for multiclass classification, with an emphasis on histogram binning.
We find that the popular notion of confidence calibration is not sufficiently strong -- there exist predictors that are not calibrated in any meaningful way but are perfectly confidence calibrated.
We propose a closely related (but subtly different) notion, top-label calibration, that accurately captures the intuition and simplicity of confidence calibration, but addresses its drawbacks.
arXiv Detail & Related papers (2021-07-18T03:27:50Z)
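To make the distinction in the entry above concrete, with c(X) the predicted top label and h(X) the reported confidence, the two notions can be written as:

```latex
\text{confidence calibration:} \quad \Pr\big(Y = c(X) \,\big|\, h(X)\big) = h(X)
\text{top-label calibration:}  \quad \Pr\big(Y = c(X) \,\big|\, h(X),\, c(X)\big) = h(X)
```

Top-label calibration additionally conditions on which label was predicted, which is what rules out the degenerate predictors described above and matches the one-versus-all stance taken in the main paper.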
- Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep-ensembles do not necessarily lead to improved calibration properties.
We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models.
This text examines the interplay between three of the simplest and most commonly used approaches to leveraging deep learning when data is scarce.
arXiv Detail & Related papers (2020-07-17T07:32:24Z)
- Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
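One common way to compute a binning-free, KS-style calibration error is sketched below; it follows the general idea of comparing two empirical cumulative distributions, though the paper's exact estimator may differ in details.

```python
import numpy as np

def ks_calibration_error(confidences, correct):
    """KS-style calibration error: the maximum gap between cumulative predicted
    confidence and cumulative observed accuracy, with samples sorted by
    confidence. A sketch of the general idea, not the paper's exact estimator."""
    order = np.argsort(confidences)
    s = np.asarray(confidences, dtype=float)[order]
    y = np.asarray(correct, dtype=float)[order]
    gap = np.cumsum(s - y) / len(s)      # difference of the two empirical CDFs
    return np.abs(gap).max()

# Perfectly calibrated toy predictions give a small KS error.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 10_000)
hits = rng.uniform(size=10_000) < conf
print(ks_calibration_error(conf, hits))
```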
- Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning [21.08664370117846]
We show how Mix-n-Match calibration strategies can help achieve remarkably better data-efficiency and expressive power.
We also reveal potential issues in standard evaluation practices.
Our approaches outperform state-of-the-art solutions on both the calibration and the evaluation tasks.
arXiv Detail & Related papers (2020-03-16T17:00:35Z)