Minimax risk classifiers with 0-1 loss
- URL: http://arxiv.org/abs/2201.06487v6
- Date: Thu, 17 Aug 2023 00:58:24 GMT
- Title: Minimax risk classifiers with 0-1 loss
- Authors: Santiago Mazuelas and Mauricio Romero and Peter Grünwald
- Abstract summary: This paper presents minimax risk classifiers (MRCs) that minimize the worst-case 0-1 loss with respect to uncertainty sets of distributions.
We show that MRCs can provide tight performance guarantees at learning and are strongly universally consistent using feature mappings given by characteristic kernels.
The paper also proposes efficient optimization techniques for MRC learning and shows that the methods presented can provide accurate classification together with tight performance guarantees in practice.
- Score: 7.650319416775203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised classification techniques use training samples to learn a
classification rule with small expected 0-1 loss (error probability).
Conventional methods enable tractable learning and provide out-of-sample
generalization by using surrogate losses instead of the 0-1 loss and
considering specific families of rules (hypothesis classes). This paper
presents minimax risk classifiers (MRCs) that minimize the worst-case 0-1 loss
with respect to uncertainty sets of distributions that can include the
underlying distribution, with a tunable confidence. We show that MRCs can
provide tight performance guarantees at learning and are strongly universally
consistent using feature mappings given by characteristic kernels. The paper
also proposes efficient optimization techniques for MRC learning and shows that
the methods presented can provide accurate classification together with tight
performance guarantees in practice.
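As a rough illustration of the worst-case objective MRCs address, the sketch below evaluates the worst-case expected 0-1 loss of a fixed probabilistic rule over an uncertainty set of distributions whose feature expectations lie in a box around empirical estimates. The finite instance space, the box-shaped constraints, the uniform rule in the example, and the use of cvxpy are simplifying assumptions for illustration; this is not the paper's learning algorithm (the MRC additionally minimizes this quantity over rules).

```python
# Illustrative sketch only: worst-case expected 0-1 loss of a fixed rule h(y|x)
# over { p on X x Y : |E_p[Phi] - tau| <= lam }, for FINITE X and Y.
import numpy as np
import cvxpy as cp

def worst_case_01_loss(h, phi, tau, lam):
    """h: (n_x, n_y) rule probabilities; phi: (n_x, n_y, d) feature map;
    tau: (d,) empirical feature expectations; lam: (d,) confidence slack."""
    n_x, n_y, d = phi.shape
    phi_flat = phi.reshape(n_x * n_y, d)
    loss = (1.0 - h).reshape(n_x * n_y)            # 0-1 loss of h at each (x, y)
    p = cp.Variable(n_x * n_y, nonneg=True)        # joint distribution, flattened
    constraints = [
        cp.sum(p) == 1,                            # p is a probability distribution
        cp.abs(phi_flat.T @ p - tau) <= lam,       # feature-expectation box constraint
    ]
    problem = cp.Problem(cp.Maximize(loss @ p), constraints)
    problem.solve()
    return problem.value

# Tiny synthetic usage example (hypothetical data).
rng = np.random.default_rng(0)
n_x, n_y, d = 5, 3, 4
phi = rng.normal(size=(n_x, n_y, d))
h = np.full((n_x, n_y), 1.0 / n_y)                 # uninformative uniform rule
tau = phi.reshape(-1, d).mean(axis=0)
lam = 0.1 * np.ones(d)
print(worst_case_01_loss(h, phi, tau, lam))
```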
Related papers
- Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning [59.44422468242455]
We propose a novel method dubbed ShrinkMatch to learn uncertain samples.
For each uncertain sample, it adaptively seeks a shrunk class space, which merely contains the original top-1 class.
We then impose a consistency regularization between a pair of strongly and weakly augmented samples in the shrunk space to strive for discriminative representations.
arXiv Detail & Related papers (2023-08-13T14:05:24Z) - A Generalized Unbiased Risk Estimator for Learning with Augmented Classes [70.20752731393938]
Given unlabeled data, an unbiased risk estimator (URE) can be derived, which can be minimized for learning with augmented classes (LAC) with theoretical guarantees.
We propose a generalized URE that can be equipped with arbitrary loss functions while maintaining the theoretical guarantees.
arXiv Detail & Related papers (2023-06-12T06:52:04Z) - Shift Happens: Adjusting Classifiers [2.8682942808330703]
Minimizing expected loss measured by a proper scoring rule, such as Brier score or log-loss (cross-entropy), is a common objective while training a probabilistic classifier.
We propose methods that transform all predictions to (re)equalize the average prediction and the class distribution.
We demonstrate experimentally that, when in practice the class distribution is known only approximately, there is often still a reduction in loss depending on the amount of shift and the precision to which the class distribution is known.
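As a point of reference for such adjustments, the sketch below applies a standard prior-shift correction: predicted class probabilities are reweighted by the ratio of target to source class priors and renormalized. This baseline is used only for illustration and is not necessarily one of the paper's proposed transformations (which aim to exactly re-equalize the average prediction with the class distribution).

```python
# Illustrative baseline, not the paper's method: prior-shift correction of
# probabilistic predictions toward a known target class distribution.
import numpy as np

def adjust_to_class_prior(probs, source_prior, target_prior):
    """probs: (n, k) predicted class probabilities; priors: (k,) arrays."""
    weights = np.asarray(target_prior) / np.asarray(source_prior)
    adjusted = probs * weights                              # reweight each class column
    return adjusted / adjusted.sum(axis=1, keepdims=True)   # renormalize rows
```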
arXiv Detail & Related papers (2021-11-03T21:27:27Z) - Constrained Classification and Policy Learning [0.0]
We study consistency of surrogate loss procedures under a constrained set of classifiers.
We show that hinge losses are the only surrogate losses that preserve consistency in second-best scenarios.
arXiv Detail & Related papers (2021-06-24T10:43:00Z) - Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
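The sketch below shows one simple instance of importance-sampling weighted ERM for logged, adaptively collected bandit data: each observed loss is reweighted by the inverse of the logging policy's propensity, and the candidate policy with the smallest weighted empirical risk is selected. The finite candidate set and known propensities are assumptions for illustration; this is not the paper's exact estimator or its regret analysis.

```python
# Illustrative sketch: importance-sampling weighted ERM over a finite policy class,
# assuming the propensities of the adaptive data-collection policy are known.
import numpy as np

def iw_erm_policy_selection(contexts, actions, losses, propensities, policies):
    """contexts: (n, d); actions, losses, propensities: (n,) arrays;
    policies: list of callables mapping a context batch to an action array."""
    best_policy, best_risk = None, np.inf
    for policy in policies:
        chosen = policy(contexts)                       # actions the candidate would take
        match = (chosen == actions).astype(float)       # agreement with logged actions
        risk = np.mean(match * losses / propensities)   # IS-weighted empirical risk
        if risk < best_risk:
            best_policy, best_risk = policy, risk
    return best_policy, best_risk
```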
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets [12.461503242570643]
Ensemble Distribution Distillation is an approach that allows a single model to efficiently capture both the predictive performance and uncertainty estimates of an ensemble.
For classification, this is achieved by training a Dirichlet distribution over the ensemble members' output distributions via the maximum likelihood criterion.
Although theoretically principled, this criterion exhibits poor convergence when applied to large-scale tasks where the number of classes is very high.
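The maximum-likelihood criterion described above can be written down directly: the student predicts Dirichlet concentration parameters and is trained to maximize the likelihood of the ensemble members' output distributions. The PyTorch sketch below shows this criterion only; the shapes, clamping, and whatever network produces `alpha` are illustrative assumptions, and the paper's proxy-target remedy for the convergence issue is not shown.

```python
# Illustrative sketch: Dirichlet negative log-likelihood of ensemble members'
# categorical output distributions, the criterion used in distribution distillation.
import torch

def dirichlet_nll(alpha, member_probs, eps=1e-8):
    """alpha: (batch, k) positive concentrations from the student;
    member_probs: (batch, m, k) probability vectors from m ensemble members."""
    log_probs = torch.log(member_probs.clamp_min(eps))                       # (batch, m, k)
    log_norm = torch.lgamma(alpha.sum(-1)) - torch.lgamma(alpha).sum(-1)     # (batch,)
    log_lik = log_norm.unsqueeze(1) + ((alpha.unsqueeze(1) - 1) * log_probs).sum(-1)
    return -log_lik.mean()                                                   # average NLL
```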
arXiv Detail & Related papers (2021-05-14T17:50:14Z) - RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
We introduce a method that leverages unlabeled data to produce generalization bounds.
We prove that our bound is valid for 0-1 empirical risk minimization.
This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
arXiv Detail & Related papers (2021-05-01T17:05:29Z) - Learning with risks based on M-location [6.903929927172917]
We study a new class of risks defined in terms of the location and deviation of the loss distribution.
The class is easily implemented as a wrapper around any smooth loss.
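As one rough illustration of a risk based on the location of the loss distribution, the sketch below replaces the empirical mean of per-example losses with a Huber M-estimate of their location, computed by iteratively reweighted averaging. The Huber choice, fixed scale, and iteration count are illustrative assumptions rather than the paper's exact construction.

```python
# Illustrative sketch: Huber M-estimate of the location of a loss distribution,
# used as a robust replacement for the empirical mean of per-example losses.
import numpy as np

def m_location_risk(losses, scale=1.0, delta=1.0, n_iter=50):
    """losses: (n,) per-example losses from any base loss."""
    theta = np.median(losses)                          # robust initialization
    for _ in range(n_iter):
        r = (losses - theta) / scale
        w = np.where(np.abs(r) <= delta, 1.0, delta / np.abs(r).clip(min=1e-12))
        theta = np.sum(w * losses) / np.sum(w)         # weighted-mean update
    return theta
```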
arXiv Detail & Related papers (2020-12-04T06:21:51Z) - Minimax Classification with 0-1 Loss and Performance Guarantees [4.812718493682455]
Supervised classification techniques use training samples to find classification rules with small expected 0-1 loss.
Conventional methods achieve efficient learning and out-of-sample generalization by minimizing surrogate losses over specific families of rules.
This paper presents minimax risk classifiers (MRCs) that do not rely on a choice of surrogate loss and family of rules.
arXiv Detail & Related papers (2020-10-15T18:11:28Z) - Selective Classification via One-Sided Prediction [54.05407231648068]
A one-sided prediction (OSP) based relaxation yields a selective classification (SC) scheme that attains near-optimal coverage in the practically relevant high target accuracy regime.
We theoretically derive generalization bounds for SC and OSP, and empirically show that our scheme strongly outperforms state-of-the-art methods in coverage at small error levels.
arXiv Detail & Related papers (2020-10-15T16:14:27Z) - Calibrated Surrogate Losses for Adversarially Robust Classification [92.37268323142307]
We show that no convex surrogate loss is calibrated with respect to the adversarial 0-1 loss when restricted to linear models.
We also show that if the underlying distribution satisfies Massart's noise condition, convex losses can be calibrated in the adversarial setting.
arXiv Detail & Related papers (2020-05-28T02:40:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.