Counterfactually Comparing Abstaining Classifiers
- URL: http://arxiv.org/abs/2305.10564v2
- Date: Thu, 9 Nov 2023 06:47:08 GMT
- Title: Counterfactually Comparing Abstaining Classifiers
- Authors: Yo Joong Choe, Aditya Gangrade, Aaditya Ramdas
- Abstract summary: Abstaining classifiers have the option to abstain from making predictions on inputs that they are unsure about.
We introduce a novel approach to evaluating and comparing abstaining classifiers by treating abstentions as missing data.
- Score: 37.43975777164451
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Abstaining classifiers have the option to abstain from making predictions on
inputs that they are unsure about. These classifiers are becoming increasingly
popular in high-stakes decision-making problems, as they can withhold uncertain
predictions to improve their reliability and safety. When evaluating black-box
abstaining classifier(s), however, we lack a principled approach that accounts
for what the classifier would have predicted on its abstentions. These missing
predictions matter when they can eventually be utilized, either directly or as
a backup option in a failure mode. In this paper, we introduce a novel approach
and perspective to evaluating and comparing abstaining classifiers by treating
abstentions as missing data. Our evaluation approach centers on the
counterfactual score of an abstaining classifier, defined as the expected
performance of the classifier had it not been allowed to abstain. We specify
the conditions under which this score is identifiable: the abstentions must be
stochastic, and the evaluation data must be independent of the training data
(ensuring that the predictions are missing at random). If abstentions are
deterministic, the score is unidentifiable because the classifier can perform
arbitrarily poorly on its abstentions. Leveraging tools from observational
causal inference, we then develop nonparametric and doubly robust methods to
efficiently estimate this quantity under identification. Our approach is
examined in both simulated and real-data experiments.
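A concrete way to read the estimation step: write S_i = s(f(X_i), Y_i) for the (possibly missing) score of example i, R_i for the indicator that the classifier did not abstain, pi(X) = P(R = 1 | X) for the propensity to predict, and mu(X) = E[S | X, R = 1] for the outcome regression. Under the missing-at-random condition above, the classical doubly robust (AIPW) estimator of a mean, psi_hat = (1/n) * sum_i [ mu_hat(X_i) + (R_i / pi_hat(X_i)) * (S_i - mu_hat(X_i)) ], is the kind of estimator the abstract refers to. The sketch below is a minimal illustration of this standard form, not the authors' implementation; every function and variable name is an assumption made for the example.

```python
import numpy as np

def dr_counterfactual_score(scores, observed, pi_hat, mu_hat):
    """Doubly robust (AIPW-style) estimate of a mean score under
    missing-at-random abstentions."""
    scores = np.nan_to_num(np.asarray(scores, dtype=float))  # abstained entries may be NaN; set to 0 (zeroed out by `observed` below)
    observed = np.asarray(observed, dtype=float)              # 1 = predicted, 0 = abstained
    pi_hat = np.asarray(pi_hat, dtype=float)                  # estimated P(predict | X), assumed bounded away from 0
    mu_hat = np.asarray(mu_hat, dtype=float)                  # estimated E[score | X, predicted]
    correction = observed / pi_hat * (scores - mu_hat)        # zero whenever the classifier abstained
    return float(np.mean(mu_hat + correction))

# Illustrative simulated check (not data from the paper):
rng = np.random.default_rng(0)
n = 1000
pi = rng.uniform(0.3, 0.9, n)               # stochastic abstention, so predictions are MAR
observed = rng.binomial(1, pi)              # 1 = predicted, 0 = abstained
correct = rng.binomial(1, 0.8, n)           # 0/1 correctness of the would-be prediction
scores = np.where(observed == 1, correct, np.nan)
mu_hat = np.full(n, np.nanmean(scores))     # crude outcome-regression estimate
print(dr_counterfactual_score(scores, observed, pi, mu_hat))  # close to the true score, 0.8
```

Because the estimator is doubly robust, it stays consistent if either the propensity model or the outcome regression is estimated consistently; comparing two abstaining classifiers can then be framed as estimating the difference of their counterfactual scores.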
Related papers
- Mitigating LLM Hallucinations via Conformal Abstention [70.83870602967625]
We develop a principled procedure for determining when a large language model should abstain from responding in a general domain.
We leverage conformal prediction techniques to develop an abstention procedure that benefits from rigorous theoretical guarantees on the hallucination rate (error rate)
Experimentally, our resulting conformal abstention method reliably bounds the hallucination rate on various closed-book, open-domain generative question answering datasets.
arXiv Detail & Related papers (2024-04-04T11:32:03Z)
- Partial-Label Learning with a Reject Option [3.1201323892302444]
We propose a novel partial-label learning algorithm with a reject option, that is, the algorithm can reject unsure predictions.
Our method provides the best trade-off between the number and accuracy of non-rejected predictions when compared to our competitors.
arXiv Detail & Related papers (2024-02-01T13:41:44Z)
- When Does Confidence-Based Cascade Deferral Suffice? [69.28314307469381]
Cascades are a classical strategy to enable inference cost to vary adaptively across samples.
A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction.
Despite being oblivious to the structure of the cascade, confidence-based deferral often works remarkably well in practice (a minimal sketch of this deferral rule appears after this list).
arXiv Detail & Related papers (2023-07-06T04:13:57Z)
- Streaming algorithms for evaluating noisy judges on unlabeled data -- binary classification [0.0]
We search for nearly error-independent trios by using the algebraic failure modes to reject evaluation ensembles as too correlated.
The estimates produced by the surviving ensembles can sometimes be accurate to within 1%.
A Taylor expansion of the estimates produced when independence is assumed, but the classifiers are in fact slightly correlated, helps clarify how the independent evaluator has 'algebraic blind spots'.
arXiv Detail & Related papers (2023-06-02T17:52:59Z)
- How to Fix a Broken Confidence Estimator: Evaluating Post-hoc Methods for Selective Classification with Deep Neural Networks [1.4502611532302039]
We show that a simple $p$-norm normalization of the logits, followed by taking the maximum logit as the confidence estimator, can lead to considerable gains in selective classification performance (a minimal sketch of this estimator appears after this list).
Our results are shown to be consistent under distribution shift.
arXiv Detail & Related papers (2023-05-24T18:56:55Z)
- Bounding Counterfactuals under Selection Bias [60.55840896782637]
We propose a first algorithm to address both identifiable and unidentifiable queries.
We prove that, in spite of the missingness induced by the selection bias, the likelihood of the available data is unimodal.
arXiv Detail & Related papers (2022-07-26T10:33:10Z)
- Taming Adversarial Robustness via Abstaining [7.1975923901054575]
We consider a binary classification problem where the observations can be perturbed by an adversary.
We include an abstaining option, where the classifier abstains from taking a decision when it has low confidence about the prediction.
We show that there exists a tradeoff between the two metrics regardless of what method is used to choose the abstaining region.
arXiv Detail & Related papers (2021-04-06T07:36:48Z)
- Classification with abstention but without disparities [5.025654873456756]
We build a general purpose classification algorithm, which is able to abstain from prediction, while avoiding disparate impact.
We establish finite sample risk, fairness, and abstention guarantees for the proposed algorithm.
We show empirically that moderate abstention rates make it possible to bypass the risk-fairness trade-off.
arXiv Detail & Related papers (2021-02-24T12:43:55Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
- Selective Classification Can Magnify Disparities Across Groups [89.14499988774985]
We find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities.
Increasing abstentions can even decrease accuracies on some groups.
We train distributionally-robust models that achieve similar full-coverage accuracies across groups and show that selective classification uniformly improves each group.
arXiv Detail & Related papers (2020-10-27T08:51:30Z)
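The entry above on confidence-based cascade deferral describes a simple mechanism: each model in the cascade answers only if its confidence clears a threshold, and otherwise defers to the next (typically larger) model. Below is a minimal sketch of that rule using maximum softmax probability as the confidence score; the function name, the model interface, and the thresholds are illustrative assumptions, not details from the paper.

```python
import numpy as np

def cascade_predict(x, models, thresholds):
    """Confidence-based deferral through a cascade of probabilistic classifiers.

    models     : callables mapping an input to a vector of class probabilities,
                 ordered from cheapest to most expensive.
    thresholds : one deferral threshold per model except the last; if the model's
                 maximum probability is below its threshold, defer to the next model.
    """
    for model, tau in zip(models[:-1], thresholds):
        probs = model(x)
        if np.max(probs) >= tau:            # confident enough: answer here, skip the rest
            return int(np.argmax(probs))
    return int(np.argmax(models[-1](x)))    # the last model always answers
```

In practice the thresholds would be tuned on held-out data to trade accuracy against the cost of invoking the later models.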
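The entry above on fixing a broken confidence estimator describes a post-hoc confidence score: normalize the logit vector by its $p$-norm and take the maximum entry, abstaining when that value falls below a threshold. The sketch below illustrates this kind of estimator; the function names, the default choice p=2, and the thresholding interface are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def max_logit_p_norm(logits, p=2, eps=1e-12):
    """Post-hoc confidence: maximum logit after p-norm normalization."""
    norm = np.linalg.norm(logits, ord=p) + eps    # eps guards against division by zero
    return float(np.max(logits / norm))

def selective_predict(logits, threshold, p=2):
    """Predict the argmax class, or abstain (return None) when confidence is low."""
    if max_logit_p_norm(logits, p) < threshold:
        return None                                # abstain / reject
    return int(np.argmax(logits))
```

The abstention threshold would typically be chosen on held-out data to meet a target coverage, i.e., the fraction of inputs on which the classifier answers.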