On the Calibration of Probabilistic Classifier Sets
- URL: http://arxiv.org/abs/2205.10082v2
- Date: Wed, 19 Apr 2023 11:43:49 GMT
- Title: On the Calibration of Probabilistic Classifier Sets
- Authors: Thomas Mortier, Viktor Bengs, Eyke Hüllermeier, Stijn Luca, and Willem Waegeman
- Abstract summary: We extend the notion of calibration to evaluate the validity of an aleatoric uncertainty representation.
We show that ensembles of deep neural networks are often not well calibrated.
- Score: 6.759124697337311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-class classification methods that produce sets of probabilistic
classifiers, such as ensemble learning methods, are able to model aleatoric and
epistemic uncertainty. Aleatoric uncertainty is then typically quantified via
the Bayes error, and epistemic uncertainty via the size of the set. In this
paper, we extend the notion of calibration, which is commonly used to evaluate
the validity of the aleatoric uncertainty representation of a single
probabilistic classifier, to assess the validity of an epistemic uncertainty
representation obtained by sets of probabilistic classifiers. Broadly speaking,
we call a set of probabilistic classifiers calibrated if one can find a
calibrated convex combination of these classifiers. To evaluate this notion of
calibration, we propose a novel nonparametric calibration test that generalizes
an existing test for single probabilistic classifiers to the case of sets of
probabilistic classifiers. Making use of this test, we empirically show that
ensembles of deep neural networks are often not well calibrated.
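The notion above admits a simple illustration. The sketch below is not the paper's nonparametric test; it is a toy check of the definition that searches random convex combinations of ensemble members and reports the smallest expected calibration error (ECE) found. The function names, the Dirichlet sampling of weights, and the use of ECE as the calibration criterion are illustrative assumptions.
```python
# Toy illustration (not the paper's test): a set of probabilistic classifiers
# is treated as calibrated if some convex combination of its members is
# calibrated; ECE on the top-class confidence serves as a simple stand-in.
import numpy as np

def ece(probs, labels, n_bins=15):
    """Top-label expected calibration error of a single probabilistic classifier."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    err = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            err += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return err

def best_convex_combination_ece(member_probs, labels, n_samples=2000, seed=0):
    """Search random convex weights over the ensemble members (shape (k, n, c))
    and return the smallest ECE found; a small value suggests the *set* may be
    calibrated in the sense of the paper's definition."""
    rng = np.random.default_rng(seed)
    k = member_probs.shape[0]
    best = np.inf
    for _ in range(n_samples):
        w = rng.dirichlet(np.ones(k))                   # random point in the simplex
        mixed = np.tensordot(w, member_probs, axes=1)   # convex combination, (n, c)
        best = min(best, ece(mixed, labels))
    return best
```
The paper itself formalizes this idea as a nonparametric hypothesis test rather than a heuristic search with a plug-in ECE estimate, but the sketch conveys what "calibration of a classifier set" asks for.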
Related papers
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685] (2023-10-31)
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no-regret decisions.
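As a rough illustration of how a differentiable kernel calibration term can enter empirical risk minimization, the sketch below adds an MMCE-style penalty (Kumar et al., 2018) to a classification loss; the Laplacian kernel, bandwidth, and penalty weight are assumptions, and this is not necessarily the metric proposed in the cited paper.
```python
# Hedged sketch: a differentiable, kernel-based calibration penalty that can be
# added to the training objective. This is the MMCE-style form, shown only to
# illustrate the general recipe of trainable kernel calibration terms.
import torch

def mmce_penalty(logits, labels, kernel_width=0.4):
    probs = torch.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)                 # top-label confidence
    correct = (pred == labels).float()
    diff = (correct - conf).unsqueeze(1)          # (m, 1): accuracy minus confidence
    # Laplacian kernel on confidences: k(r, r') = exp(-|r - r'| / width)
    pairwise = torch.cdist(conf.unsqueeze(1), conf.unsqueeze(1), p=1)
    k = torch.exp(-pairwise / kernel_width)
    m = conf.shape[0]
    return (diff * diff.t() * k).sum() / (m ** 2)

# Example objective: cross-entropy plus a weighted calibration penalty.
# loss = torch.nn.functional.cross_entropy(logits, labels) + 0.5 * mmce_penalty(logits, labels)
```
Because the penalty is a differentiable function of the logits, it can be minimized jointly with the classification loss.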
- On the Role of Randomization in Adversarially Robust Classification [13.39932522722395] (2023-02-14)
We show that a randomized ensemble can outperform the underlying hypothesis set in terms of adversarial risk, yet is always matched by a deterministic classifier from a suitably enlarged hypothesis set, which we describe explicitly.
- Calibration tests beyond classification [30.616624345970973] (2022-10-21)
Most supervised machine learning tasks are subject to irreducible prediction errors.
Probabilistic predictive models address this limitation by providing probability distributions that represent a belief over plausible targets.
Calibrated models guarantee that the predictions are neither over- nor under-confident.
- Test-time Recalibration of Conformal Predictors Under Distribution Shift Based on Unlabeled Examples [30.61588337557343] (2022-10-09)
Conformal predictors provide uncertainty estimates by computing a set of classes with a user-specified probability.
We propose a method that provides excellent uncertainty estimates under natural distribution shifts.
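For context, the sketch below shows standard split-conformal prediction for classification, the kind of set-valued predictor such a recalibration method starts from; the 1 - p(y|x) nonconformity score and the coverage level alpha are illustrative choices, and this is not the paper's test-time recalibration procedure.
```python
# Hedged sketch: split-conformal prediction sets for classification, built from
# a held-out calibration set of predicted probabilities and true labels.
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold so that prediction sets cover the true class
    with probability at least 1 - alpha (marginally, under exchangeability)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]   # nonconformity scores
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(q_level, 1.0), method="higher")

def prediction_set(test_probs, threshold):
    """Boolean mask over classes: include every class y with 1 - p(y|x) <= threshold."""
    return (1.0 - test_probs) <= threshold
```
Under distribution shift the coverage guarantee behind this threshold breaks down, which is the failure mode the cited method targets using unlabeled test examples.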
- Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087] (2022-01-26)
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
- When in Doubt: Improving Classification Performance with Alternating Normalization [57.39356691967766] (2021-09-28)
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
- Understanding Classifier Mistakes with Generative Models [88.20470690631372] (2020-10-05)
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
- Calibration of Neural Networks using Splines [51.42640515410253] (2020-06-23)
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
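A minimal sketch of a binning-free, KS-style calibration error is given below: predictions are sorted by top-label confidence and the cumulative confidence is compared with the cumulative accuracy. This conveys the idea only and omits the spline-based parts of the paper.
```python
# Hedged sketch: KS-style calibration error without binning. Sort by top-label
# confidence and take the largest gap between cumulative accuracy and
# cumulative stated confidence.
import numpy as np

def ks_calibration_error(probs, labels):
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    order = np.argsort(conf)
    # Running (normalized) difference between empirical accuracy and confidence.
    gaps = np.cumsum(correct[order] - conf[order]) / len(labels)
    return np.abs(gaps).max()
```
Unlike binned ECE, this measure has no binning hyperparameter.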
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724] (2020-02-07)
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
- Temporal Probability Calibration [0.0] (2020-02-07)
We consider calibrating models that produce class probability estimates from sequences of data, focusing on the case where predictions are obtained from incomplete sequences.
We show that traditional calibration techniques are not sufficiently expressive for this task, and propose methods that adapt calibration schemes depending on the length of an input sequence.