Top-$k$ Classification and Cardinality-Aware Prediction
- URL: http://arxiv.org/abs/2403.19625v1
- Date: Thu, 28 Mar 2024 17:45:03 GMT
- Title: Top-$k$ Classification and Cardinality-Aware Prediction
- Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong,
- Abstract summary: We show that comp-sum and constrained losses are supported by $H$-consistency bounds with respect to the top-$k$ loss.
We introduce cardinality-aware loss functions through instance-dependent cost-sensitive learning.
Minimizing these losses leads to new cardinality-aware algorithms for top-$k$ classification.
- Score: 30.389055604165222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a detailed study of top-$k$ classification, the task of predicting the $k$ most probable classes for an input, extending beyond single-class prediction. We demonstrate that several prevalent surrogate loss functions in multi-class classification, such as comp-sum and constrained losses, are supported by $H$-consistency bounds with respect to the top-$k$ loss. These bounds guarantee consistency in relation to the hypothesis set $H$, providing stronger guarantees than Bayes-consistency due to their non-asymptotic and hypothesis-set specific nature. To address the trade-off between accuracy and cardinality $k$, we further introduce cardinality-aware loss functions through instance-dependent cost-sensitive learning. For these functions, we derive cost-sensitive comp-sum and constrained surrogate losses, establishing their $H$-consistency bounds and Bayes-consistency. Minimizing these losses leads to new cardinality-aware algorithms for top-$k$ classification. We report the results of extensive experiments on CIFAR-100, ImageNet, CIFAR-10, and SVHN datasets demonstrating the effectiveness and benefit of these algorithms.
Related papers
- Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer [30.389055604165222]
We introduce a broad family of surrogate losses, parameterized by a non-increasing function $Psi$, and establish their realizable $H$-consistency under mild conditions.
For cost functions based on classification error, we show that these losses admit $H$-consistency bounds when the hypothesis set is symmetric and complete.
arXiv Detail & Related papers (2024-07-18T17:35:03Z) - Cardinality-Aware Set Prediction and Top-$k$ Classification [33.00927499966789]
We present a novel approach that aims to learn an accurate top-$k$ set predictor while maintaining a low cardinality.
We introduce a new target loss function tailored to this setting that accounts for both the classification error and the cardinality of the set predicted.
Minimizing these loss functions leads to new cardinality-aware algorithms that we describe in detail.
arXiv Detail & Related papers (2024-07-09T17:57:07Z) - Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z) - Predictor-Rejector Multi-Class Abstention: Theoretical Analysis and Algorithms [30.389055604165222]
We study the key framework of learning with abstention in the multi-class classification setting.
In this setting, the learner can choose to abstain from making a prediction with some pre-defined cost.
We introduce several new families of surrogate losses for which we prove strong non-asymptotic and hypothesis set-specific consistency guarantees.
arXiv Detail & Related papers (2023-10-23T10:16:27Z) - Theoretically Grounded Loss Functions and Algorithms for Score-Based Multi-Class Abstention [30.389055604165222]
We introduce new families of surrogate losses for the abstention loss function.
We prove strong non-asymptotic and hypothesis set-specific consistency guarantees for these surrogate losses.
Our results show that the relative performance of the state-of-the-art score-based surrogate losses can vary across datasets.
arXiv Detail & Related papers (2023-10-23T10:13:35Z) - Characterizing the Optimal 0-1 Loss for Multi-class Classification with
a Test-time Attacker [57.49330031751386]
We find achievable information-theoretic lower bounds on loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset.
We provide a general framework for finding the optimal 0-1 loss that revolves around the construction of a conflict hypergraph from the data and adversarial constraints.
arXiv Detail & Related papers (2023-02-21T15:17:13Z) - Label Distributionally Robust Losses for Multi-class Classification:
Consistency, Robustness and Adaptivity [55.29408396918968]
We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification.
Our contributions include both consistency and robustness by establishing top-$k$ consistency of LDR losses for multi-class classification.
We propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the noise degree of class label of each instance.
arXiv Detail & Related papers (2021-12-30T00:27:30Z) - Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we start an early trial to consider the problem of learning multiclass scoring functions via optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z) - Instance-optimality in optimal value estimation: Adaptivity via
variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy on any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
arXiv Detail & Related papers (2021-06-28T00:38:54Z) - Multi-group Agnostic PAC Learnability [7.9649015115693444]
We study "multi-group agnostic PAC learnability"
We provide a characterization of the loss functions for which such a predictor is guaranteed to exist.
Our results unify and extend previous positive and negative results from the multi-group fairness literature.
arXiv Detail & Related papers (2021-05-20T18:43:36Z) - Provable tradeoffs in adversarially robust classification [96.48180210364893]
We develop and leverage new tools, including recent breakthroughs from probability theory on robust isoperimetry.
Our results reveal fundamental tradeoffs between standard and robust accuracy that grow when data is imbalanced.
arXiv Detail & Related papers (2020-06-09T09:58:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.