Classification Under Ambiguity: When Is Average-K Better Than Top-K?
- URL: http://arxiv.org/abs/2112.08851v1
- Date: Thu, 16 Dec 2021 12:58:07 GMT
- Title: Classification Under Ambiguity: When Is Average-K Better Than Top-K?
- Authors: Titouan Lorieul, Alexis Joly and Dennis Shasha
- Abstract summary: A common alternative, referred to as top-$K$ classification, is to choose some number $K$ and to return the $K$ labels with the highest scores.
This paper formally characterizes the ambiguity profile when average-$K$ classification can achieve a lower error rate than a fixed top-$K$ classification.
- Score: 1.7156052308952854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When many labels are possible, choosing a single one can lead to low
precision. A common alternative, referred to as top-$K$ classification, is to
choose some number $K$ (commonly around 5) and to return the $K$ labels with
the highest scores. Unfortunately, for unambiguous cases, $K>1$ is too many
and, for very ambiguous cases, $K \leq 5$ (for example) can be too small. An
alternative sensible strategy is to use an adaptive approach in which the
number of labels returned varies as a function of the computed ambiguity, but
must average to some particular $K$ over all the samples. We denote this
alternative average-$K$ classification. This paper formally characterizes the
ambiguity profile when average-$K$ classification can achieve a lower error
rate than a fixed top-$K$ classification. Moreover, it provides natural
estimation procedures for both the fixed-size and the adaptive classifier and
proves their consistency. Finally, it reports experiments on real-world image
data sets revealing the benefit of average-$K$ classification over top-$K$ in
practice. Overall, when the ambiguity is known precisely, average-$K$ is never
worse than top-$K$, and, in our experiments, when it is estimated, this also
holds.
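To make the contrast concrete, here is a minimal sketch (an illustration, not the authors' estimation procedure): top-$K$ always returns exactly $K$ labels, while an average-$K$ rule returns every label whose score clears a global threshold, with the threshold calibrated so that the returned sets average $K$ labels over the samples.

```python
import numpy as np

def top_k_sets(probs, k):
    """Return, for each sample, the k labels with the highest scores."""
    return [set(np.argsort(-p)[:k]) for p in probs]

def average_k_sets(probs, k):
    """Return variable-size label sets whose mean size is k.

    A single global threshold is chosen so that, over these samples,
    exactly n*k scores clear it; ambiguous samples get larger sets.
    """
    flat = np.sort(probs.ravel())[::-1]   # all scores, descending
    n = probs.shape[0]
    threshold = flat[n * k - 1]           # keep n*k scores in total
    return [set(np.flatnonzero(p >= threshold)) for p in probs]

# Toy posteriors: sample 0 is unambiguous, sample 1 is ambiguous.
probs = np.array([
    [0.90, 0.05, 0.03, 0.02],
    [0.30, 0.28, 0.27, 0.15],
])

print(top_k_sets(probs, k=2))      # [{0, 1}, {0, 1}]: always two labels
print(average_k_sets(probs, k=2))  # [{0}, {0, 1, 2}]: one, then three
```

On the unambiguous sample the adaptive rule returns a single label, avoiding the precision loss of a forced second guess, and it spends the saved label budget on the ambiguous sample instead.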
Related papers
- Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z)
- Naive Bayes Classifiers and One-hot Encoding of Categorical Variables [4.5053219193867395]
We investigate the consequences of encoding a $K$-valued categorical variable incorrectly as $K$ bits via one-hot encoding.
This gives rise to a product-of-Bernoullis (PoB) assumption, rather than the correct categorical Naïve Bayes classifier.
arXiv Detail & Related papers (2024-04-28T14:04:58Z)
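As a quick illustration of the mismatch described above, the following sketch (hypothetical, not taken from the paper) fits scikit-learn's `BernoulliNB` on the one-hot bits, which imposes the product-of-Bernoullis assumption, and compares it with `CategoricalNB`, which models the $K$-valued variable directly (`sparse_output` requires scikit-learn >= 1.2):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, CategoricalNB
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

# One K-valued categorical feature (K = 4) with class-dependent frequencies.
K, n = 4, 2000
y = rng.integers(0, 2, size=n)
freq = np.where(y[:, None] == 0, [0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7])
x = np.array([rng.choice(K, p=f) for f in freq]).reshape(-1, 1)

# Correct model: categorical Naive Bayes on the raw category codes.
cat = CategoricalNB().fit(x, y)

# Mismatched model: product of Bernoullis on the one-hot encoding.
x_hot = OneHotEncoder(sparse_output=False).fit_transform(x)
pob = BernoulliNB().fit(x_hot, y)

# The two models generally disagree on posterior probabilities,
# even when their hard predictions happen to coincide.
print(cat.predict_proba(x[:3]))
print(pob.predict_proba(x_hot[:3]))
```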
- One-Bit Quantization and Sparsification for Multiclass Linear Classification with Strong Regularization [18.427215139020625]
We show that the best classification is achieved when $f(\cdot) = \|\cdot\|^2$ and $\lambda \to \infty$.
It is often possible to find sparse and one-bit solutions that perform almost as well as the one corresponding to $f(\cdot) = \|\cdot\|_\infty$ in the large $\lambda$ regime.
arXiv Detail & Related papers (2024-02-16T06:39:40Z)
- Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression [57.17120203327993]
The threshold-to-pseudo-label process (T2L) in classification uses confidence to determine the quality of a label.
By its nature, regression also requires unbiased methods to generate high-quality labels.
We propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality.
arXiv Detail & Related papers (2023-11-03T08:39:35Z)
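The inequality doing the work in such a constraint is classical: $P(|X - \mathbb{E}[X]| \geq t) \leq \mathrm{Var}(X)/t^2$ for any distribution with finite variance. Below is a generic sketch of using it to gate pseudo-labels (an illustration of the inequality, not the paper's exact constraint):

```python
import numpy as np

def chebyshev_bound(preds, tol):
    """P(|mean - expectation| >= tol) <= Var(mean) / tol^2.

    preds holds an ensemble's predictions for one unlabeled sample;
    the variance of the mean is Var(single)/n assuming independence.
    """
    var_mean = np.var(preds, ddof=1) / len(preds)
    return min(1.0, var_mean / tol**2)

# Hypothetical ensemble outputs for one unlabeled sample.
preds = np.array([0.62, 0.58, 0.65, 0.60, 0.59])

bound = chebyshev_bound(preds, tol=0.05)
print(f"P(deviation >= 0.05) <= {bound:.3f}")  # ~0.062 here
accept = bound < 0.2  # hypothetical acceptance threshold
```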
- Repeated Observations for Classification [0.2676349883103404]
We study the problem of nonparametric classification with repeated observations.
In the analysis, we investigate particular models such as robust detection by nominal densities, prototype classification, linear transformation, linear classification, and scaling.
arXiv Detail & Related papers (2023-07-19T10:50:36Z)
- Optimizing Partial Area Under the Top-k Curve: Theory and Practice [151.5072746015253]
We develop a novel metric named the partial Area Under the Top-k Curve (AUTKC).
AUTKC has better discrimination ability, and its Bayes optimal score function can give a correct top-K ranking with respect to the conditional probability.
We present an empirical surrogate risk minimization framework to optimize the proposed metric.
arXiv Detail & Related papers (2022-09-03T11:09:13Z)
- Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z)
- Prevalence Threshold and bounds in the Accuracy of Binary Classification Systems [0.0]
We show that, relative to a perfect accuracy of 1, the positive prevalence threshold ($\phi_e$) is a critical point of maximum curvature in the precision-prevalence curve.
Though applications are numerous, the ideas herein discussed may be used in computational complexity theory, artificial intelligence, and medical screening.
arXiv Detail & Related papers (2021-12-25T21:22:32Z)
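The curve in question follows directly from Bayes' rule: $\mathrm{PPV}(\phi) = a\phi / (a\phi + c(1-\phi))$ with sensitivity $a$ and false-positive rate $c = 1 - \mathrm{specificity}$. The sketch below uses the closed form for $\phi_e$ reported in the screening-curve literature; treat the formula as an assumption rather than a quotation from this paper:

```python
import numpy as np

def ppv(phi, sens, spec):
    """Positive predictive value at prevalence phi, via Bayes' rule."""
    fpr = 1.0 - spec
    return sens * phi / (sens * phi + fpr * (1.0 - phi))

def prevalence_threshold(sens, spec):
    """phi_e = (sqrt(sens * fpr) - fpr) / (sens - fpr) (assumed form)."""
    fpr = 1.0 - spec
    return (np.sqrt(sens * fpr) - fpr) / (sens - fpr)

sens, spec = 0.90, 0.95
phi_e = prevalence_threshold(sens, spec)
print(f"prevalence threshold: {phi_e:.3f}")   # ~0.191
print(f"PPV at the threshold: {ppv(phi_e, sens, spec):.3f}")
# Below phi_e, precision falls off steeply even for a good test.
```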
- Almost Tight L0-norm Certified Robustness of Top-k Predictions against Adversarial Perturbations [78.23408201652984]
Top-k predictions are used in many real-world applications such as machine learning as a service, recommender systems, and web searches.
Our work is based on randomized smoothing, which builds a provably robust classifier by randomizing the input.
For instance, our method can build a classifier that achieves a certified top-3 accuracy of 69.2% on ImageNet when an attacker can arbitrarily perturb 5 pixels of a testing image.
arXiv Detail & Related papers (2020-11-15T21:34:44Z)
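The prediction side of randomized smoothing is simple to sketch: classify many randomized copies of the input and return the k most-voted labels. The snippet below is a generic illustration with a toy `base_classifier`; Gaussian noise is used here only for concreteness, whereas the cited work randomizes inputs differently to certify $\ell_0$ (pixel) perturbations, and the certification analysis itself is omitted:

```python
import numpy as np

def smoothed_top_k(base_classifier, x, k, n_samples=1000, sigma=0.25, seed=0):
    """Top-k prediction of a smoothed classifier by majority vote."""
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_samples):
        label = base_classifier(x + rng.normal(0.0, sigma, size=x.shape))
        votes[label] = votes.get(label, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)[:k]

# Toy base classifier: nearest of three fixed class centroids.
centroids = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
classify = lambda z: int(np.argmin(np.linalg.norm(centroids - z, axis=1)))

print(smoothed_top_k(classify, np.array([0.9, 0.8]), k=2))
```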
- Binary classification with ambiguous training data [69.50862982117127]
In supervised learning, we often face ambiguous (A) samples that are difficult to label even for domain experts.
This problem is substantially different from semi-supervised learning, since unlabeled samples are not necessarily difficult samples.
arXiv Detail & Related papers (2020-11-05T00:53:58Z)
- A Multiclass Classification Approach to Label Ranking [2.6905021039717987]
In multiclass classification, the goal is to learn how to predict a random label $Y$, valued in $\mathcal{Y} = \{1, \ldots, K\}$ with $K \geq 3$.
This article is devoted to the analysis of this statistical learning problem, halfway between multiclass classification and posterior probability estimation.
arXiv Detail & Related papers (2020-02-21T17:12:43Z)