Estimating Multi-label Accuracy using Labelset Distributions
- URL: http://arxiv.org/abs/2209.04163v1
- Date: Fri, 9 Sep 2022 07:47:35 GMT
- Title: Estimating Multi-label Accuracy using Labelset Distributions
- Authors: Laurence A. F. Park, Jesse Read
- Abstract summary: A multi-label classifier estimates the binary label state for each of a set of concept labels, for any given instance.
We show that the expected accuracy can be estimated from the multi-label predictive distribution.
- Score: 1.5076964620370268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A multi-label classifier estimates the binary label state (relevant vs
irrelevant) for each of a set of concept labels, for any given instance.
Probabilistic multi-label classifiers provide a predictive posterior
distribution over all possible labelset combinations of such label states (the
powerset of labels) from which we can provide the best estimate, simply by
selecting the labelset corresponding to the largest expected accuracy, over
that distribution. For example, in maximizing exact match accuracy, we provide
the mode of the distribution. But how does this relate to the confidence we may
have in such an estimate? Confidence is an important element of real-world
applications of multi-label classifiers (as in machine learning in general) and
is an important ingredient in explainability and interpretability. However, it
is not obvious how to provide confidence in the multi-label context with respect
to a particular accuracy metric, nor is it clear how to provide a confidence
measure which correlates well with the expected accuracy, which would be
most valuable in real-world decision making. In this article we estimate the
expected accuracy as a surrogate for confidence, for a given accuracy metric.
We hypothesise that the expected accuracy can be estimated from the multi-label
predictive distribution. We examine seven candidate functions for their ability
to estimate expected accuracy from the predictive distribution. We found three
of these to correlate well with expected accuracy and to be robust. Further, we
determined that each candidate function can be used separately to estimate
Hamming similarity, but a combination of the candidates was best for expected
Jaccard index and exact match.
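As a concrete illustration of the decision rule sketched in the abstract, the following Python fragment (an illustrative sketch, not code from the paper; the dict-based representation of the labelset distribution, the toy probabilities, and all function names are assumptions) enumerates a small explicit distribution over the labelset powerset and selects the labelset with the largest expected accuracy under exact match, Hamming similarity, and Jaccard index. Maximizing expected exact match recovers the mode of the distribution, and the maximized expectation itself is the kind of confidence surrogate the article investigates.

```python
# Illustrative sketch (not from the paper): choosing a labelset by expected accuracy.
# The predictive distribution is assumed to be given explicitly over the labelset
# powerset as a dict mapping binary tuples to probabilities.
from itertools import product

def exact_match(y, z):
    return float(y == z)

def hamming_similarity(y, z):
    return sum(a == b for a, b in zip(y, z)) / len(y)

def jaccard(y, z):
    inter = sum(a and b for a, b in zip(y, z))
    union = sum(a or b for a, b in zip(y, z))
    return inter / union if union else 1.0  # two empty labelsets match exactly

def expected_accuracy(candidate, dist, metric):
    # Expectation of the chosen accuracy metric over the labelset distribution.
    return sum(p * metric(candidate, y) for y, p in dist.items())

def best_labelset(dist, metric, n_labels):
    # Search the powerset for the labelset with the largest expected accuracy.
    return max(product((0, 1), repeat=n_labels),
               key=lambda c: expected_accuracy(c, dist, metric))

# Toy predictive distribution over 3 labels (probabilities sum to 1).
dist = {
    (1, 0, 0): 0.40,
    (1, 1, 0): 0.35,
    (0, 1, 0): 0.15,
    (1, 1, 1): 0.10,
}

for name, metric in [("exact match", exact_match),
                     ("Hamming", hamming_similarity),
                     ("Jaccard", jaccard)]:
    y_hat = best_labelset(dist, metric, n_labels=3)
    print(f"{name}: prediction {y_hat}, expected accuracy "
          f"{expected_accuracy(y_hat, dist, metric):.3f}")
```

In this toy distribution, exact match selects the mode (1, 0, 0) with expected accuracy 0.40, whereas Hamming similarity follows the per-label marginals and selects (1, 1, 0); each selection comes with an expected-accuracy value that can be read as a confidence score for that prediction.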
Related papers
- Predicting generalization performance with correctness discriminators [64.00420578048855]
We present a novel model that establishes upper and lower bounds on the accuracy, without requiring gold labels for the unseen data.
We show across a variety of tagging, parsing, and semantic parsing tasks that the gold accuracy is reliably between the predicted upper and lower bounds.
arXiv Detail & Related papers (2023-11-15T22:43:42Z) - Leveraging Ensemble Diversity for Robust Self-Training in the Presence of Sample Selection Bias [5.698050337128548]
Self-training is a well-known approach for semi-supervised learning. It consists of iteratively assigning pseudo-labels to unlabeled data for which the model is confident and treating them as labeled examples.
For neural networks, softmax prediction probabilities are often used as a confidence measure, although they are known to be overconfident, even for wrong predictions.
We propose a novel confidence measure, called $\mathcal{T}$-similarity, built upon the prediction diversity of an ensemble of linear classifiers.
arXiv Detail & Related papers (2023-10-23T11:30:06Z) - PAC Prediction Sets Under Label Shift [52.30074177997787]
Prediction sets capture uncertainty by predicting sets of labels rather than individual labels.
We propose a novel algorithm for constructing prediction sets with PAC guarantees in the label shift setting.
We evaluate our approach on five datasets.
arXiv Detail & Related papers (2023-10-19T17:57:57Z) - Confidence and Dispersity Speak: Characterising Prediction Matrix for
Unsupervised Accuracy Estimation [51.809741427975105]
This work aims to assess how well a model performs under distribution shifts without using labels.
We use the nuclear norm, which has been shown to be effective in characterizing both confidence and dispersity.
We show that the nuclear norm is more accurate and robust for accuracy estimation than existing methods.
arXiv Detail & Related papers (2023-02-02T13:30:48Z) - From Classification Accuracy to Proper Scoring Rules: Elicitability of
Probabilistic Top List Predictions [0.0]
I propose a novel type of prediction in classification, which bridges the gap between single-class predictions and predictive distributions.
The proposed evaluation metrics are based on symmetric proper scoring rules and admit comparison of various types of predictions.
arXiv Detail & Related papers (2023-01-27T15:55:01Z) - Test-time Recalibration of Conformal Predictors Under Distribution Shift
Based on Unlabeled Examples [30.61588337557343]
Conformal predictors provide uncertainty estimates by computing a set of classes that contains the true class with a user-specified probability (a generic split-conformal sketch appears after this list).
We propose a method that provides excellent uncertainty estimates under natural distribution shifts.
arXiv Detail & Related papers (2022-10-09T04:46:00Z) - Distribution-free uncertainty quantification for classification under
label shift [105.27463615756733]
We focus on uncertainty quantification (UQ) for classification problems via two avenues.
We first argue that label shift hurts UQ, by showing degradation in coverage and calibration.
We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
arXiv Detail & Related papers (2021-03-04T20:51:03Z) - Selective Classification Can Magnify Disparities Across Groups [89.14499988774985]
We find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities.
Increasing abstentions can even decrease accuracies on some groups.
We train distributionally-robust models that achieve similar full-coverage accuracies across groups and show that selective classification uniformly improves each group.
arXiv Detail & Related papers (2020-10-27T08:51:30Z) - Class-Similarity Based Label Smoothing for Confidence Calibration [2.055949720959582]
We propose a novel form of label smoothing to improve confidence calibration.
Since different classes are of different intrinsic similarities, more similar classes should result in closer probability values in the final output.
This motivates the development of a new smooth label where the label values are based on similarities with the reference class.
arXiv Detail & Related papers (2020-06-24T20:26:22Z) - Distribution-free binary classification: prediction sets, confidence
intervals and calibration [106.50279469344937]
We study three notions of uncertainty quantification -- calibration, confidence intervals and prediction sets -- for binary classification in the distribution-free setting.
We derive confidence intervals for binned probabilities for both fixed-width and uniform-mass binning.
As a consequence of our 'tripod' theorems, these confidence intervals for binned probabilities lead to distribution-free calibration.
arXiv Detail & Related papers (2020-06-18T14:17:29Z) - Knowing what you know: valid and validated confidence sets in multiclass
and multilabel prediction [0.8594140167290097]
We develop conformal prediction methods for constructing valid confidence sets in multiclass and multilabel problems.
By leveraging ideas from quantile regression, we build methods that always guarantee correct coverage but additionally provide conditional coverage for both multiclass and multilabel prediction problems.
arXiv Detail & Related papers (2020-04-21T17:45:38Z)
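Several of the entries above (PAC Prediction Sets Under Label Shift; Test-time Recalibration of Conformal Predictors; Knowing what you know) build on conformal prediction. For orientation only, the sketch below shows the generic split-conformal construction for classification (a textbook baseline under stated assumptions, not the method of any listed paper; the function names and the choice of nonconformity score are assumptions): calibration scores are one minus the probability assigned to the true class, and a test point's prediction set keeps every class whose score falls within the calibrated quantile.

```python
# Generic split-conformal prediction sets (illustrative baseline only).
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    # Finite-sample-corrected quantile targeting >= 1 - alpha marginal coverage.
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_set(test_probs, threshold):
    # Keep every class whose nonconformity score is within the threshold.
    return [np.where(1.0 - p <= threshold)[0].tolist() for p in test_probs]

# Toy example: 5 calibration points and 2 test points over 3 classes.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(3), size=5)
cal_labels = np.array([0, 2, 1, 0, 2])
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
print(prediction_set(rng.dirichlet(np.ones(3), size=2), threshold))
```

The listed papers adapt constructions of this kind to label shift, to test-time distribution shift, and to multiclass and multilabel prediction, respectively.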
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.