(Almost) Provable Error Bounds Under Distribution Shift via Disagreement Discrepancy
- URL: http://arxiv.org/abs/2306.00312v1
- Date: Thu, 1 Jun 2023 03:22:15 GMT
- Title: (Almost) Provable Error Bounds Under Distribution Shift via Disagreement Discrepancy
- Authors: Elan Rosenfeld, Saurabh Garg
- Abstract summary: We derive an (almost) guaranteed upper bound on the error of deep neural networks under distribution shift using unlabeled test data.
In particular, our bound requires a simple, intuitive condition which is well justified by prior empirical works.
We also devise a "disagreement loss", which we expect can serve as a drop-in replacement for future methods which require maximizing multiclass disagreement.
- Score: 8.010528849585937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We derive an (almost) guaranteed upper bound on the error of deep neural
networks under distribution shift using unlabeled test data. Prior methods
either give bounds that are vacuous in practice or give estimates that are
accurate on average but heavily underestimate error for a sizeable fraction of
shifts. In particular, the latter only give guarantees based on complex
continuous measures such as test calibration -- which cannot be identified
without labels -- and are therefore unreliable. Instead, our bound requires a
simple, intuitive condition which is well justified by prior empirical works
and holds in practice effectively 100% of the time. The bound is inspired by
$\mathcal{H}\Delta\mathcal{H}$-divergence but is easier to evaluate and
substantially tighter, consistently providing non-vacuous guarantees.
Estimating the bound requires optimizing one multiclass classifier to disagree
with another, for which some prior works have used sub-optimal proxy losses; we
devise a "disagreement loss" which is theoretically justified and performs
better in practice. We expect this loss can serve as a drop-in replacement for
future methods which require maximizing multiclass disagreement. Across a wide
range of benchmarks, our method gives valid error bounds while achieving
average accuracy comparable to competitive estimation baselines. Code is
publicly available at https://github.com/erosenfeld/disagree_discrep .
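The abstract describes training one multiclass classifier to disagree with another and using the result in a bound inspired by $\mathcal{H}\Delta\mathcal{H}$-divergence. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the choice of a softplus-style surrogate, and the definition of the discrepancy as "target disagreement minus source disagreement" are all assumptions made for illustration. The exact loss and bound are given in the paper and the linked repository.

```python
import math


def disagreement_rate(preds_a, preds_b):
    """Fraction of examples on which two classifiers predict different labels."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and equal length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def disagreement_loss(logits, label):
    """Hypothetical surrogate loss for disagreeing with `label`.

    Computes softplus(logit[label] - mean(other logits)). The value shrinks
    as the critic scores `label` lower than the average of the other classes.
    This form is an assumption, not necessarily the loss used in the paper.
    """
    k = len(logits)
    others = [z for j, z in enumerate(logits) if j != label]
    margin = logits[label] - sum(others) / (k - 1)
    # Numerically stable softplus: log(1 + exp(margin)).
    return max(margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def disagreement_discrepancy(src_h, src_critic, tgt_h, tgt_critic):
    """Assumed discrepancy: disagreement on target minus disagreement on source."""
    return disagreement_rate(tgt_h, tgt_critic) - disagreement_rate(src_h, src_critic)
```

In this sketch, a critic trained to minimize `disagreement_loss` on unlabeled target data (using the base model's predictions as `label`) while agreeing with the base model on source data would yield a large discrepancy when the shift is severe. How that discrepancy enters the final error bound is specified in the paper, not here.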
Related papers
- Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers [13.823743787003787]
Recent research has generated hope that inference scaling could allow weaker language models to match or exceed the accuracy of stronger models.
We show that no amount of inference scaling of weaker models can enable them to match the single-sample accuracy of a sufficiently strong model.
We also show that beyond accuracy, false positives may have other undesirable qualities, such as poor adherence to coding style conventions.
arXiv Detail & Related papers (2024-11-26T15:13:06Z)
- Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning.
One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems.
We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z)
- Dirichlet-Based Prediction Calibration for Learning with Noisy Labels [40.78497779769083]
Learning with noisy labels can significantly hinder the generalization performance of deep neural networks (DNNs).
Existing approaches address this issue through loss correction or example selection methods.
We propose the Dirichlet-based Prediction Calibration (DPC) method as a solution.
arXiv Detail & Related papers (2024-01-13T12:33:04Z)
- Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning [59.44422468242455]
We propose a novel method dubbed ShrinkMatch to learn uncertain samples.
For each uncertain sample, it adaptively seeks a shrunk class space, which merely contains the original top-1 class.
We then impose a consistency regularization between a pair of strongly and weakly augmented samples in the shrunk space to strive for discriminative representations.
arXiv Detail & Related papers (2023-08-13T14:05:24Z)
- Distribution-Free Inference for the Regression Function of Binary Classification [0.0]
The paper presents a resampling framework to construct exact, distribution-free and non-asymptotically guaranteed confidence regions for the true regression function for any user-chosen confidence level.
It is proved that the constructed confidence regions are strongly consistent, that is, any false model is excluded in the long run with probability one.
arXiv Detail & Related papers (2023-08-03T15:52:27Z)
- Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification [86.32752788233913]
In classification problems, the Bayes error can be used as a criterion to evaluate classifiers with state-of-the-art performance.
We propose a simple and direct Bayes error estimator, where we just take the mean of the labels that show uncertainty of the classes.
Our flexible approach enables us to perform Bayes error estimation even for weakly supervised data.
arXiv Detail & Related papers (2022-02-01T13:22:26Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Shift Happens: Adjusting Classifiers [2.8682942808330703]
Minimizing expected loss measured by a proper scoring rule, such as Brier score or log-loss (cross-entropy), is a common objective while training a probabilistic classifier.
We propose methods that transform all predictions to (re)equalize the average prediction and the class distribution.
We demonstrate experimentally that, when in practice the class distribution is known only approximately, there is often still a reduction in loss depending on the amount of shift and the precision to which the class distribution is known.
arXiv Detail & Related papers (2021-11-03T21:27:27Z)
- Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density [125.64297244986552]
We propose an unsupervised validation criterion that measures the density of soft neighborhoods by computing the entropy of the similarity distribution between points.
Our criterion is simpler than competing validation methods, yet more effective.
arXiv Detail & Related papers (2021-08-24T17:41:45Z)
- Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm.
Results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z)
- Knowing what you know: valid and validated confidence sets in multiclass and multilabel prediction [0.8594140167290097]
We develop conformal prediction methods for constructing valid confidence sets in multiclass and multilabel problems.
By leveraging ideas from quantile regression, we build methods that always guarantee correct coverage but additionally provide conditional coverage for both multiclass and multilabel prediction problems.
arXiv Detail & Related papers (2020-04-21T17:45:38Z)
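
One of the related entries above, Average Thresholded Confidence (ATC), is described as learning a threshold on the model's confidence and predicting accuracy from unlabeled examples. The sketch below illustrates that thresholding idea under stated assumptions: "confidence" is taken to be a single score per example (for instance the maximum softmax probability), the threshold is chosen from the observed source scores, and all function names are hypothetical rather than taken from the ATC authors' code.

```python
def fraction_above(confidences, threshold):
    """Fraction of examples whose confidence strictly exceeds the threshold."""
    return sum(c > threshold for c in confidences) / len(confidences)


def fit_threshold(source_conf, source_correct):
    """Choose a threshold so that the fraction of labeled source examples
    above it matches the source accuracy as closely as possible.

    Candidates are the observed scores plus one value below the minimum,
    so that "every example is above the threshold" is also reachable.
    """
    accuracy = sum(source_correct) / len(source_correct)
    candidates = [min(source_conf) - 1.0] + sorted(set(source_conf))
    return min(candidates,
               key=lambda t: abs(fraction_above(source_conf, t) - accuracy))


def estimate_target_accuracy(target_conf, threshold):
    """Predicted accuracy: share of unlabeled target examples above the threshold."""
    return fraction_above(target_conf, threshold)


# Toy usage with made-up numbers (not results from any paper).
threshold = fit_threshold([0.9, 0.8, 0.6, 0.55], [1, 1, 1, 0])
predicted = estimate_target_accuracy([0.95, 0.5, 0.7, 0.4], threshold)
```

An estimate of this kind is a point prediction and can overestimate accuracy under some shifts, which is the gap the main paper's upper bound is meant to address.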