Classifier uncertainty: evidence, potential impact, and probabilistic treatment
- URL: http://arxiv.org/abs/2006.11105v1
- Date: Fri, 19 Jun 2020 12:49:19 GMT
- Title: Classifier uncertainty: evidence, potential impact, and probabilistic treatment
- Authors: Niklas Tötsch, Daniel Hoffmann
- Abstract summary: We present an approach to quantify the uncertainty of classification performance metrics based on a probability model of the confusion matrix.
We show that uncertainties can be surprisingly large and limit performance evaluation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifiers are often tested on relatively small data sets, which should lead
to uncertain performance metrics. Nevertheless, these metrics are usually taken
at face value. We present an approach to quantify the uncertainty of
classification performance metrics, based on a probability model of the
confusion matrix. Application of our approach to classifiers from the
scientific literature and a classification competition shows that uncertainties
can be surprisingly large and limit performance evaluation. In fact, some
published classifiers are likely to be misleading. The application of our
approach is simple and requires only the confusion matrix. It is agnostic of
the underlying classifier. Our method can also be used for the estimation of
sample sizes that achieve a desired precision of a performance metric.
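The method itself is only sketched in the abstract; the following is a minimal illustration of the general idea, assuming a Dirichlet-multinomial model with a flat prior over the four cells of a binary confusion matrix (the hypothetical counts and the choice of prior are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical counts of a binary confusion matrix: TP, FN, FP, TN.
tp, fn, fp, tn = 45, 5, 10, 40

# Dirichlet-multinomial model over the four cells with a flat prior
# (a sketch in the spirit of the abstract; the authors' exact
# probability model may differ).
alpha = np.array([tp, fn, fp, tn]) + 1.0
cells = rng.dirichlet(alpha, size=100_000)  # posterior samples of cell probabilities
p_tp, p_fn, p_fp, p_tn = cells.T

# Posterior distributions of common performance metrics.
accuracy = p_tp + p_tn
precision = p_tp / (p_tp + p_fp)
recall = p_tp / (p_tp + p_fn)

for name, metric in [("accuracy", accuracy), ("precision", precision), ("recall", recall)]:
    lo, hi = np.percentile(metric, [2.5, 97.5])
    print(f"{name}: median={np.median(metric):.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

Scaling all four counts by the same factor and re-running shows how the intervals narrow with test-set size, which is the intuition behind using such a model to estimate the sample size needed for a desired metric precision.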
Related papers
- Trustworthy Classification through Rank-Based Conformal Prediction Sets [9.559062601251464]
We propose a novel conformal prediction method that employs a rank-based score function suitable for classification models.
Our approach constructs prediction sets that achieve the desired coverage rate while managing their size.
Our contributions include a novel conformal prediction method, theoretical analysis, and empirical evaluation.
arXiv Detail & Related papers (2024-07-05T10:43:41Z)
- $F_\beta$-plot -- a visual tool for evaluating imbalanced data classifiers [0.0]
The paper proposes a simple approach to analyzing the popular parametric metric $F_\beta$.
The plot makes it possible to indicate, for a given pool of analyzed classifiers, when a given model should be preferred depending on user requirements (an illustrative $\beta$-sweep appears after this list).
arXiv Detail & Related papers (2024-04-11T18:07:57Z)
- Class-Conditional Conformal Prediction with Many Classes [60.8189977620604]
We propose a method called clustered conformal prediction that clusters together classes having "similar" conformal scores.
We find that clustered conformal prediction typically outperforms existing methods in terms of class-conditional coverage and set size metrics.
arXiv Detail & Related papers (2023-06-15T17:59:02Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Fairness and Unfairness in Binary and Multiclass Classification: Quantifying, Calculating, and Bounding [22.449347663780767]
We propose a new interpretable measure of unfairness that allows a quantitative analysis of classifier fairness.
We show how this measure can be calculated when the classifier's conditional confusion matrices are known.
We report experiments on data sets representing diverse applications.
arXiv Detail & Related papers (2022-06-07T12:26:28Z)
- Determination of class-specific variables in nonparametric multiple-class classification [0.0]
We propose a probability-based nonparametric multiple-class classification method, and integrate it with the ability to identify high-impact variables for individual classes.
We report the properties of the proposed method, and use both synthesized and real data sets to illustrate its behaviour under different classification situations.
arXiv Detail & Related papers (2022-05-07T10:08:58Z)
- Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification [86.32752788233913]
In classification problems, the Bayes error can be used as a criterion to evaluate classifiers with state-of-the-art performance.
We propose a simple and direct Bayes error estimator, where we just take the mean of the labels that show uncertainty of the classes.
Our flexible approach enables us to perform Bayes error estimation even for weakly supervised data.
arXiv Detail & Related papers (2022-02-01T13:22:26Z)
- Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
- When in Doubt: Improving Classification Performance with Alternating Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
arXiv Detail & Related papers (2021-09-28T02:55:42Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
- Fairness Measures for Regression via Probabilistic Classification [0.0]
Algorithmic fairness involves expressing notions such as equity, or reasonable treatment, as quantifiable measures that a machine learning algorithm can optimise.
Most work to date has focused on classification, in part because classification fairness measures are easily computed by comparing the rates of outcomes, leading to behaviours such as ensuring that the same fraction of eligible men are selected as eligible women.
But such measures are computationally difficult to generalise to the continuous regression setting for problems such as pricing, or allocating payments.
For the regression setting we introduce tractable approximations of the independence, separation and sufficiency criteria by observing that they factorise as ratios of different conditional probabilities of the protected attributes.
arXiv Detail & Related papers (2020-01-16T21:53:26Z)
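The entry directly above reports that the independence, separation and sufficiency criteria factorise as ratios of conditional probabilities of the protected attributes, which can then be estimated by probabilistic classification. The sketch below illustrates only the independence criterion under that reading, using synthetic data and a logistic-regression probability model; it is not the paper's estimator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: a binary protected attribute A and regression predictions
# y_hat that are mildly correlated with A, so independence is violated.
n = 5_000
a = rng.integers(0, 2, size=n)
y_hat = rng.normal(loc=0.3 * a, scale=1.0, size=n)

# Independence requires y_hat to be independent of A, i.e. P(A=a | y_hat) = P(A=a).
# Estimate P(A=1 | y_hat) with a probabilistic classifier and compare it to the base rate.
clf = LogisticRegression().fit(y_hat.reshape(-1, 1), a)
p_a1_given_yhat = clf.predict_proba(y_hat.reshape(-1, 1))[:, 1]
base_rate = a.mean()

# Average deviation of the ratio P(A=1 | y_hat) / P(A=1) from 1;
# values near 0 indicate (approximate) independence.
deviation = np.mean(np.abs(p_a1_given_yhat / base_rate - 1.0))
print(f"estimated deviation from independence: {deviation:.3f}")
```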
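For the $F_\beta$-plot entry earlier in the list, the underlying idea can be illustrated by sweeping $\beta$ for a pool of classifiers and seeing where their $F_\beta$ curves cross; the snippet below is such a sweep on synthetic imbalanced data with two hypothetical models, not the paper's plotting tool:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data and two candidate classifiers.
X, y = make_classification(n_samples=2_000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1_000).fit(X_tr, y_tr),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr),
}

# Small beta weights precision more heavily, large beta weights recall more heavily.
betas = np.linspace(0.25, 4.0, 16)
for name, model in models.items():
    y_pred = model.predict(X_te)
    scores = [fbeta_score(y_te, y_pred, beta=b) for b in betas]
    print(name, [round(s, 3) for s in scores])

# Plotting these curves against beta shows the ranges of beta (i.e. user
# preferences for precision versus recall) in which each model is preferred.
```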