Selective Classification Can Magnify Disparities Across Groups
- URL: http://arxiv.org/abs/2010.14134v3
- Date: Wed, 14 Apr 2021 15:56:59 GMT
- Title: Selective Classification Can Magnify Disparities Across Groups
- Authors: Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang
- Abstract summary: We find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities.
Increasing abstentions can even decrease accuracies on some groups.
We train distributionally-robust models that achieve similar full-coverage accuracies across groups and show that selective classification uniformly improves each group.
- Score: 89.14499988774985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Selective classification, in which models can abstain on uncertain
predictions, is a natural approach to improving accuracy in settings where
errors are costly but abstentions are manageable. In this paper, we find that
while selective classification can improve average accuracies, it can
simultaneously magnify existing accuracy disparities between various groups
within a population, especially in the presence of spurious correlations. We
observe this behavior consistently across five vision and NLP datasets.
Surprisingly, increasing abstentions can even decrease accuracies on some
groups. To better understand this phenomenon, we study the margin distribution,
which captures the model's confidences over all predictions. For symmetric
margin distributions, we prove that whether selective classification
monotonically improves or worsens accuracy is fully determined by the accuracy
at full coverage (i.e., without any abstentions) and whether the distribution
satisfies a property we call left-log-concavity. Our analysis also shows that
selective classification tends to magnify full-coverage accuracy disparities.
Motivated by our analysis, we train distributionally-robust models that achieve
similar full-coverage accuracies across groups and show that selective
classification uniformly improves each group on these models. Altogether, our
results suggest that selective classification should be used with care and
underscore the importance of training models to perform equally well across
groups at full coverage.
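The core mechanism is easy to illustrate on synthetic data. The sketch below (a minimal illustration with made-up Gaussian margins, not the paper's code) treats a prediction as correct when its signed margin is positive, uses the absolute margin as confidence, and abstains on the least-confident examples; a group that starts out more accurate at full coverage pulls further ahead as coverage drops.

```python
import numpy as np

def selective_accuracy(margins, coverage):
    """Accuracy on the `coverage` fraction of examples with the largest
    absolute margin (confidence); the rest are abstained on. A prediction
    counts as correct iff its signed margin is positive."""
    n = len(margins)
    keep = int(np.ceil(coverage * n))
    order = np.argsort(-np.abs(margins))   # most confident first
    kept = margins[order[:keep]]
    return (kept > 0).mean()

rng = np.random.default_rng(0)
# Hypothetical signed margins for two groups; group A is more accurate
# than group B at full coverage (margin means 1.0 vs 0.2).
margins_a = rng.normal(1.0, 1.0, 10_000)
margins_b = rng.normal(0.2, 1.0, 10_000)

for cov in (1.0, 0.5, 0.2):
    acc_a = selective_accuracy(margins_a, cov)
    acc_b = selective_accuracy(margins_b, cov)
    print(f"coverage {cov:.0%}: A={acc_a:.3f}  B={acc_b:.3f}  gap={acc_a - acc_b:.3f}")
```

Since both groups here exceed 50% full-coverage accuracy and the Gaussian margin distribution is symmetric and log-concave, both groups improve as coverage falls, but the initially stronger group improves faster, so the accuracy gap widens.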
Related papers
- How to Fix a Broken Confidence Estimator: Evaluating Post-hoc Methods for Selective Classification with Deep Neural Networks [1.4502611532302039]
We show that a simple $p$-norm normalization of the logits, followed by taking the maximum logit as the confidence estimator, can lead to considerable gains in selective classification performance.
Our results are shown to be consistent under distribution shift.
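A minimal sketch of the estimator as described above (our own reading, with hypothetical logits; the paper evaluates several variants): normalize each logit vector by its $p$-norm and take the largest normalized entry as the confidence used to rank abstentions.

```python
import numpy as np

def max_logit_pnorm(logits, p=2):
    """Largest logit after p-norm normalization of each logit vector,
    used as a confidence score for selective classification."""
    norms = np.linalg.norm(logits, ord=p, axis=-1, keepdims=True)
    return (logits / norms).max(axis=-1)

logits = np.array([[4.0, 1.0, 0.5],    # peaked  -> high confidence
                   [2.1, 2.0, 1.9]])   # flat    -> low confidence
print(max_logit_pnorm(logits))
```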
arXiv Detail & Related papers (2023-05-24T18:56:55Z)
- Variational Classification [51.2541371924591]
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- On the Richness of Calibration [10.482805367361818]
We make explicit the choices involved in designing calibration scores.
We organise these into three grouping choices and a choice concerning the agglomeration of group errors.
In particular, we explore the possibility of grouping datapoints based on their input features rather than on predictions.
We demonstrate that with appropriate choices of grouping, these novel global fairness scores can provide notions of (sub-)group or individual fairness.
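As a concrete, hypothetical example of these choices (not the paper's scores): group datapoints by an input feature rather than by prediction-based bins, measure the confidence-accuracy gap within each group, and agglomerate by averaging.

```python
import numpy as np

def group_calibration_error(confidences, correct, groups):
    """Absolute confidence-vs-accuracy gap within each group, averaged
    over groups (one simple agglomeration of group errors)."""
    gaps = []
    for g in np.unique(groups):
        m = groups == g
        gaps.append(abs(confidences[m].mean() - correct[m].mean()))
    return float(np.mean(gaps))

rng = np.random.default_rng(1)
conf = rng.uniform(0.5, 1.0, 1_000)
correct = rng.random(1_000) < conf            # simulated, well-calibrated model
feature_groups = rng.integers(0, 3, 1_000)    # hypothetical input-feature groups
print(group_calibration_error(conf, correct, feature_groups))
```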
arXiv Detail & Related papers (2023-02-08T15:19:46Z)
- Confidence and Dispersity Speak: Characterising Prediction Matrix for Unsupervised Accuracy Estimation [51.809741427975105]
This work aims to assess how well a model performs under distribution shifts without using labels.
We use the nuclear norm of the prediction matrix, which has been shown to be effective in characterizing both confidence and dispersity.
We show that the nuclear norm is more accurate and robust for accuracy estimation than existing methods.
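For intuition, a small self-contained sketch (our own illustration; the paper's exact score and normalization may differ): a confident, class-dispersed prediction matrix has a much larger nuclear norm than a maximally uncertain one.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nuclear_norm_score(probs):
    """Nuclear norm (sum of singular values) of the n x k prediction
    matrix; large when predictions are confident and spread over classes."""
    return np.linalg.norm(probs, ord='nuc')

rng = np.random.default_rng(2)
confident = softmax(rng.normal(0.0, 5.0, (1_000, 10)))   # near one-hot rows
uniform = np.full((1_000, 10), 0.1)                       # maximally unsure
print(nuclear_norm_score(confident), nuclear_norm_score(uniform))
```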
arXiv Detail & Related papers (2023-02-02T13:30:48Z)
- Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach to anomaly detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
Uncertainty is then used to detect anomalies.
arXiv Detail & Related papers (2022-12-23T00:50:41Z)
- Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties.
We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model.
We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
arXiv Detail & Related papers (2022-08-25T13:31:09Z)
- Selective Ensembles for Consistent Predictions [19.154189897847804]
Models that differ only in arbitrary training choices, such as the random seed, can disagree on individual predictions; such inconsistency is undesirable in high-stakes contexts.
We show that this inconsistency extends beyond predictions to feature attributions.
We prove that selective ensembles achieve consistent predictions and feature attributions while maintaining low abstention rates.
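A simplified sketch of the idea (the paper's construction is more careful, using statistical tests to bound abstention rates; this is plain majority voting with an agreement threshold):

```python
import numpy as np

def selective_ensemble_predict(member_preds, min_agreement=0.8):
    """Majority vote over ensemble members that abstains (returns -1)
    when too few members agree on the top class."""
    member_preds = np.asarray(member_preds)     # shape: (models, examples)
    n_models, n_examples = member_preds.shape
    out = np.full(n_examples, -1)
    for i in range(n_examples):
        votes = np.bincount(member_preds[:, i])
        top = votes.argmax()
        if votes[top] / n_models >= min_agreement:
            out[i] = top
    return out

preds = np.array([[0, 1, 2],
                  [0, 1, 1],
                  [0, 2, 2],
                  [0, 1, 1],
                  [0, 1, 1]])
print(selective_ensemble_predict(preds))   # [0, 1, -1]: unanimous, 4/5 agree, only 3/5 agree
```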
arXiv Detail & Related papers (2021-11-16T05:03:56Z)
- Selective Regression Under Fairness Criteria [30.672082160544996]
In some cases, performance on a minority group can decrease as we reduce coverage.
We show that such unwanted behavior can be avoided by constructing features that satisfy the sufficiency criterion.
arXiv Detail & Related papers (2021-10-28T19:05:12Z)
- PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which utilizes the ratio between positive and negative labels for each class during training.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
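A hedged sketch of the masking idea (a simplification of our own; PLM's actual rule adapts the ratio during training): randomly drop positive labels of over-represented classes from a per-label loss so each class's positive ratio moves toward a target.

```python
import numpy as np

def partial_label_mask(labels, target_pos_ratio, rng):
    """Build a 0/1 mask over a binary label matrix that randomly drops
    positive labels of over-represented classes, nudging each class's
    positive ratio toward its target. Multiply it into a per-label loss."""
    n, k = labels.shape
    mask = np.ones((n, k))
    pos_ratio = labels.mean(axis=0)             # observed positives per class
    for c in range(k):
        if pos_ratio[c] > target_pos_ratio[c] > 0:
            p_drop = 1.0 - target_pos_ratio[c] / pos_ratio[c]
            drop = (labels[:, c] == 1) & (rng.random(n) < p_drop)
            mask[drop, c] = 0.0                 # masked out of the loss
    return mask

rng = np.random.default_rng(3)
labels = (rng.random((8, 4)) < [0.9, 0.5, 0.1, 0.7]).astype(int)
print(partial_label_mask(labels, np.full(4, 0.3), rng))
```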
arXiv Detail & Related papers (2021-05-22T18:07:56Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.