Controlling Multiple Errors Simultaneously with a PAC-Bayes Bound
- URL: http://arxiv.org/abs/2202.05560v2
- Date: Thu, 22 Feb 2024 16:28:19 GMT
- Title: Controlling Multiple Errors Simultaneously with a PAC-Bayes Bound
- Authors: Reuben Adams and John Shawe-Taylor and Benjamin Guedj
- Abstract summary: We provide the first PAC-Bayes bound capable of providing rich information by bounding the Kullback-Leibler divergence between the empirical and true probabilities of a set of M error types.
Our bound is especially useful in cases where the severity of different mis-classifications may change over time.
- Score: 21.273964864852612
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current PAC-Bayes generalisation bounds are restricted to scalar metrics of
performance, such as the loss or error rate. However, one ideally wants more
information-rich certificates that control the entire distribution of possible
outcomes, such as the distribution of the test loss in regression, or the
probabilities of different mis-classifications. We provide the first PAC-Bayes
bound capable of providing such rich information by bounding the
Kullback-Leibler divergence between the empirical and true probabilities of a
set of M error types, which can either be discretized loss values for
regression, or the elements of the confusion matrix (or a partition thereof)
for classification. We transform our bound into a differentiable training
objective. Our bound is especially useful in cases where the severity of
different mis-classifications may change over time; existing PAC-Bayes bounds
can only bound a particular pre-decided weighting of the error types. In
contrast, our bound implicitly controls all uncountably many weightings
simultaneously.
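To make this concrete, below is a minimal numerical sketch (not the paper's construction) of how a single bound on the KL divergence between the empirical and true distributions over M error types certifies every weighting of those error types at once. The counts, the bound level B, and the weight vectors are illustrative assumptions; the paper's actual bound expresses the level in terms of the prior/posterior KL divergence, the sample size, and the confidence parameter.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative assumption: M = 3 error types (correct, false positive, false
# negative) observed on m = 1000 test points; these counts are made up.
counts = np.array([900, 60, 40])
m = counts.sum()
p_hat = counts / m  # empirical distribution over the M error types

def kl_categorical(q, p):
    """KL divergence KL(q || p) between two categorical distributions."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Placeholder bound level: with high probability, KL(p_hat || p_true) <= B.
# The paper derives the exact value of B; 0.01 here is purely illustrative.
B = 0.01

def worst_case_weighted_risk(w, B):
    """Maximise the weighted risk w @ p over all p with KL(p_hat || p) <= B."""
    constraints = (
        {"type": "eq", "fun": lambda p: p.sum() - 1.0},
        {"type": "ineq", "fun": lambda p: B - kl_categorical(p_hat, p)},
    )
    res = minimize(lambda p: -(w @ p), p_hat,
                   bounds=[(1e-9, 1.0)] * len(w), constraints=constraints)
    return -res.fun

# The same certificate covers every weighting of the error types:
print(worst_case_weighted_risk(np.array([0.0, 1.0, 1.0]), B))   # plain error rate
print(worst_case_weighted_risk(np.array([0.0, 1.0, 10.0]), B))  # false negatives weighted 10x
```

Because the feasible set {p : KL(p_hat || p) <= B} does not depend on the weights, the same high-probability event bounds w @ p for every choice of w, which is the sense in which the bound controls all weightings simultaneously.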
Related papers
- Tighter Generalisation Bounds via Interpolation [16.74864438507713]
We present a recipe for deriving new PAC-Bayes generalisation bounds based on the $(f, \Gamma)$-divergence.
We also present PAC-Bayes generalisation bounds where we interpolate between a series of probability divergences.
arXiv Detail & Related papers (2024-02-07T18:55:22Z)
- Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z)
- Loss Minimization through the Lens of Outcome Indistinguishability [11.709566373491619]
We present a new perspective on convex loss and the recent notion of Omniprediction.
By design, Loss OI implies omniprediction in a direct and intuitive manner.
We show how to achieve Loss OI for the important set of losses arising from Generalized Linear Models, without requiring full multicalibration.
arXiv Detail & Related papers (2022-10-16T22:25:27Z)
- PACMAN: PAC-style bounds accounting for the Mismatch between Accuracy and Negative log-loss [28.166066663983674]
The ultimate performance of machine learning algorithms for classification tasks is usually measured in terms of the empirical error probability (or accuracy) based on a testing dataset.
The training phase, however, typically minimizes a surrogate loss function; for classification tasks this is often the negative log-loss, which leads to the well-known cross-entropy risk.
We introduce an analysis, based on a point-wise PAC approach, of the generalization gap that accounts for the mismatch between testing on the accuracy metric and training on the negative log-loss.
arXiv Detail & Related papers (2021-12-10T14:00:22Z)
- Unbiased Loss Functions for Multilabel Classification with Missing Labels [2.1549398927094874]
Missing labels are a ubiquitous phenomenon in extreme multi-label classification (XMC) tasks.
This paper derives the unique unbiased estimators for the different multilabel reductions.
arXiv Detail & Related papers (2021-09-23T10:39:02Z)
- Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization [77.24152933825238]
We show that for linear classification tasks we need stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible.
We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not.
arXiv Detail & Related papers (2021-06-11T20:42:27Z)
- Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)
- A Unified Joint Maximum Mean Discrepancy for Domain Adaptation [73.44809425486767]
This paper theoretically derives a unified form of JMMD that is easy to optimize.
From the revealed unified JMMD, we illustrate that JMMD degrades the feature-label dependence that benefits classification.
We propose a novel MMD matrix to promote the dependence, and devise a novel label kernel that is robust to label distribution shift.
arXiv Detail & Related papers (2021-01-25T09:46:14Z)
- Relative Deviation Margin Bounds [55.22251993239944]
We give two types of learning bounds, both distribution-dependent and valid for general families, in terms of the Rademacher complexity.
We derive distribution-dependent generalization bounds for unbounded loss functions under the assumption of a finite moment.
arXiv Detail & Related papers (2020-06-26T12:37:17Z)
- Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
- Sample Complexity of Uniform Convergence for Multicalibration [43.10452387619829]
We address the multicalibration error and decouple it from the prediction error.
Our work gives sample complexity bounds for uniform convergence guarantees of multicalibration error.
arXiv Detail & Related papers (2020-05-04T18:01:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.