Controlling Multiple Errors Simultaneously with a PAC-Bayes Bound
- URL: http://arxiv.org/abs/2202.05560v2
- Date: Thu, 22 Feb 2024 16:28:19 GMT
- Title: Controlling Multiple Errors Simultaneously with a PAC-Bayes Bound
- Authors: Reuben Adams and John Shawe-Taylor and Benjamin Guedj
- Abstract summary: We provide the first PAC-Bayes bound capable of providing rich information by bounding the Kullback-Leibler divergence between the empirical and true probabilities of a set of M error types.
Our bound is especially useful in cases where the severity of different misclassifications may change over time.
- Score: 21.273964864852612
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current PAC-Bayes generalisation bounds are restricted to scalar metrics of
performance, such as the loss or error rate. However, one ideally wants more
information-rich certificates that control the entire distribution of possible
outcomes, such as the distribution of the test loss in regression, or the
probabilities of different misclassifications. We provide the first PAC-Bayes
bound capable of providing such rich information by bounding the
Kullback-Leibler divergence between the empirical and true probabilities of a
set of M error types, which can either be discretized loss values for
regression, or the elements of the confusion matrix (or a partition thereof)
for classification. We transform our bound into a differentiable training
objective. Our bound is especially useful in cases where the severity of
different misclassifications may change over time; existing PAC-Bayes bounds
can only bound a particular pre-decided weighting of the error types. In
contrast, our bound implicitly controls all uncountably many weightings
simultaneously.
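To make the abstract's central object concrete: the bound controls the KL divergence kl(p̂ ‖ p) between the empirical distribution p̂ over M error types and the unknown true distribution p, and thereby every weighted risk ⟨w, p⟩ at once. The sketch below is a hypothetical illustration, not the paper's algorithm: the bound value B, the Monte-Carlo search over the simplex, and all numbers are stand-ins.

```python
import numpy as np
from scipy.special import rel_entr

def kl_categorical(p_hat, p):
    """kl(p_hat || p) between two categorical distributions over M error types."""
    return float(np.sum(rel_entr(p_hat, p)))

# Empirical frequencies of M = 4 outcome types on n samples, e.g. cells of a
# (partitioned) confusion matrix: [correct, mild, moderate, severe error].
n = 1000
p_hat = np.array([0.90, 0.06, 0.03, 0.01])

# Stand-in for the bound's right-hand side: with probability >= 1 - delta,
# kl(p_hat || p) <= B for the true outcome probabilities p.
B = 0.01

def worst_case_weighted_risk(p_hat, B, w, n_samples=200_000, seed=0):
    """Worst weighted risk <w, p> over the KL ball {p : kl(p_hat || p) <= B},
    approximated by Monte-Carlo sampling near p_hat (illustration only)."""
    rng = np.random.default_rng(seed)
    candidates = rng.dirichlet(p_hat * n, size=n_samples)
    candidates = np.vstack([p_hat, candidates])  # ensure the ball is non-empty
    feasible = candidates[[kl_categorical(p_hat, p) <= B for p in candidates]]
    return float((feasible @ w).max())

# Severities chosen *after* seeing the bound; any such weighting is covered,
# because the bound constrains the whole distribution p at once.
w = np.array([0.0, 1.0, 5.0, 10.0])
print(worst_case_weighted_risk(p_hat, B, w))
```

A scalar PAC-Bayes bound would have to fix w before seeing the data; here, changing w requires no new certificate, since the KL ball around p̂ already constrains every ⟨w, p⟩.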
Related papers
- Of Dice and Games: A Theory of Generalized Boosting [61.752303337418475]
We extend the celebrated theory of boosting to incorporate both cost-sensitive and multi-objective losses.
We develop a comprehensive theory of cost-sensitive and multi-objective boosting, providing a taxonomy of weak learning guarantees.
Our characterization relies on a geometric interpretation of boosting, revealing a surprising equivalence between cost-sensitive and multi-objective losses.
arXiv Detail & Related papers (2024-12-11T01:38:32Z) - Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $\theta$ with respect to any loss $\ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z) - Tighter Generalisation Bounds via Interpolation [16.74864438507713]
We present a recipe for deriving new PAC-Bayes generalisation bounds based on the $(f, \Gamma)$-divergence.
We also present PAC-Bayes generalisation bounds where we interpolate between a series of probability divergences.
arXiv Detail & Related papers (2024-02-07T18:55:22Z) - PAC-Bayes-Chernoff bounds for unbounded losses [9.987130158432755]
We introduce a new PAC-Bayes oracle bound for unbounded losses that extends Cramér-Chernoff bounds to the PAC-Bayesian setting.
Our approach naturally leverages properties of Cramér-Chernoff bounds, such as exact optimization of the free parameter in many PAC-Bayes bounds.
arXiv Detail & Related papers (2024-01-02T10:58:54Z) - Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z) - PACMAN: PAC-style bounds accounting for the Mismatch between Accuracy and Negative log-loss [28.166066663983674]
The ultimate performance of machine learning algorithms for classification tasks is usually measured in terms of the empirical error probability (or accuracy) based on a testing dataset.
However, training typically optimizes a different, surrogate loss function; for classification tasks this is often the negative log-loss, which leads to the well-known cross-entropy risk.
We introduce a point-wise PAC analysis of the generalization gap that accounts for the mismatch between testing on the accuracy metric and training on the negative log-loss.
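A toy binary-classification example of the mismatch the PACMAN entry above refers to (all numbers are made up for illustration): a model can win on accuracy while losing on negative log-loss, because a single confidently wrong prediction dominates the cross-entropy but costs only one error.

```python
import numpy as np

# Probability each model assigns to the true label on four binary test points.
# Model A is confidently wrong once; Model B is twice wrong but never confident.
p_A = np.array([0.99, 0.98, 0.97, 0.01])
p_B = np.array([0.55, 0.55, 0.45, 0.45])

accuracy = lambda p: float((p > 0.5).mean())       # 0-1 metric used at test time
neg_log_loss = lambda p: float(-np.log(p).mean())  # cross-entropy used in training

print(accuracy(p_A), neg_log_loss(p_A))  # 0.75, ~1.17: better accuracy, worse NLL
print(accuracy(p_B), neg_log_loss(p_B))  # 0.50, ~0.70: worse accuracy, better NLL
```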
arXiv Detail & Related papers (2021-12-10T14:00:22Z) - Unbiased Loss Functions for Multilabel Classification with Missing Labels [2.1549398927094874]
Missing labels are a ubiquitous phenomenon in extreme multi-label classification (XMC) tasks.
This paper derives the unique unbiased estimators for the different multilabel reductions.
arXiv Detail & Related papers (2021-09-23T10:39:02Z) - Universal Off-Policy Evaluation [64.02853483874334]
We take the first steps towards a universal off-policy estimator (UnO).
We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns.
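For intuition only, here are plain empirical (on-policy, i.i.d.) versions of the quantities the UnO entry above says can be bounded simultaneously; the paper's actual estimator additionally handles off-policy importance weighting and high-confidence bounds, which this sketch omits.

```python
import numpy as np

def return_statistics(returns, alpha=0.1):
    """Empirical counterparts of the quantities UnO controls (illustration only)."""
    r = np.sort(np.asarray(returns, dtype=float))
    n = len(r)
    quantile = lambda u: r[min(int(np.ceil(u * n)) - 1, n - 1)]
    var_alpha = quantile(alpha)  # value-at-risk at level alpha
    return {
        "mean": float(r.mean()),
        "variance": float(r.var(ddof=1)),
        "median": float(quantile(0.5)),
        "inter_quantile_range": float(quantile(0.75) - quantile(0.25)),
        "cvar": float(r[r <= var_alpha].mean()),  # mean of the worst alpha-tail
        "cdf": lambda x: float(np.searchsorted(r, x, side="right") / n),
    }

returns = np.random.default_rng(0).normal(loc=1.0, scale=2.0, size=5000)
stats = return_statistics(returns)
print(stats["mean"], stats["cvar"], stats["cdf"](0.0))
```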
arXiv Detail & Related papers (2021-04-26T18:54:31Z) - A Unified Joint Maximum Mean Discrepancy for Domain Adaptation [73.44809425486767]
This paper theoretically derives a unified form of JMMD that is easy to optimize.
From the revealed unified JMMD, we show that JMMD degrades the feature-label dependence that benefits classification.
We propose a novel MMD matrix to promote the dependence, and devise a novel label kernel that is robust to label distribution shift.
arXiv Detail & Related papers (2021-01-25T09:46:14Z) - Relative Deviation Margin Bounds [55.22251993239944]
We give two types of learning bounds, both distribution-dependent and valid for general families, in terms of the Rademacher complexity.
We derive distribution-dependent generalization bounds for unbounded loss functions under the assumption of a finite moment.
arXiv Detail & Related papers (2020-06-26T12:37:17Z) - Sample Complexity of Uniform Convergence for Multicalibration [43.10452387619829]
We address the multicalibration error and decouple it from the prediction error.
Our work gives sample complexity bounds for uniform convergence guarantees of multicalibration error.
arXiv Detail & Related papers (2020-05-04T18:01:38Z)