Equity-Directed Bootstrapping: Examples and Analysis
- URL: http://arxiv.org/abs/2108.06624v1
- Date: Sat, 14 Aug 2021 22:09:27 GMT
- Title: Equity-Directed Bootstrapping: Examples and Analysis
- Authors: Harish S. Bhat and Majerle E. Reeves and Sidra Goldman-Mellor
- Abstract summary: We show how an equity-directed bootstrap can bring test set sensitivities and specificities closer to satisfying the equal odds criterion.
In the context of na"ive Bayes and logistic regression, we analyze the equity-directed bootstrap, demonstrating that it works by bringing odds ratios close to one.
- Score: 3.007949058551534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When faced with severely imbalanced binary classification problems, we often
train models on bootstrapped data in which the numbers of instances of the two
classes occur in a more favorable ratio, e.g., one. We view algorithmic inequity
through the lens of imbalanced classification: in order to balance the
performance of a classifier across groups, we can bootstrap to achieve training
sets that are balanced with respect to both labels and group identity. For an
example problem with severe class imbalance (prediction of suicide death from
administrative patient records), we illustrate how an equity-directed bootstrap
can bring test set sensitivities and specificities much closer to satisfying
the equal odds criterion. In the context of naïve Bayes and logistic
regression, we analyze the equity-directed bootstrap, demonstrating that it
works by bringing odds ratios close to one, and linking it to methods involving
intercept adjustment, thresholding, and weighting.
Related papers
- Twice Class Bias Correction for Imbalanced Semi-Supervised Learning [59.90429949214134]
We introduce a novel approach called Twice Class Bias Correction (TCBC).
We estimate the class bias of the model parameters during the training process.
We apply a secondary correction to the model's pseudo-labels for unlabeled samples.
arXiv Detail & Related papers (2023-12-27T15:06:36Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Deep Imbalanced Regression via Hierarchical Classification Adjustment [50.19438850112964]
Regression tasks in computer vision are often formulated into classification by quantizing the target space into classes.
The majority of training samples lie in a head range of target values, while a minority of samples span a usually larger tail range.
We propose to construct hierarchical classifiers for solving imbalanced regression tasks.
Our novel hierarchical classification adjustment (HCA) for imbalanced regression shows superior results on three diverse tasks.
arXiv Detail & Related papers (2023-10-26T04:54:39Z) - Learning to Adapt Classifier for Imbalanced Semi-supervised Learning [38.434729550279116]
Pseudo-labeling has proven to be a promising semi-supervised learning (SSL) paradigm.
Existing pseudo-labeling methods commonly assume that the class distributions of training data are balanced.
In this work, we investigate pseudo-labeling under imbalanced semi-supervised setups.
arXiv Detail & Related papers (2022-07-28T02:15:47Z) - Relieving Long-tailed Instance Segmentation via Pairwise Class Balance [85.53585498649252]
Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes.
It causes severe bias toward the head classes (those with the majority of samples) at the expense of the tail classes.
We propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix that is updated during training to accumulate the model's ongoing prediction preferences; a generic sketch of such a running confusion matrix appears after this list.
arXiv Detail & Related papers (2022-01-08T07:48:36Z) - Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, which does not require fitting additional parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on the CIFAR-10LT, CIFAR-100LT and WebVision datasets, observing that Prototypical obtains substantial improvements compared with the state of the art.
arXiv Detail & Related papers (2021-10-22T01:55:01Z) - Statistical Theory for Imbalanced Binary Classification [8.93993657323783]
We show that optimal classification performance depends on certain properties of class imbalance that have not previously been formalized.
Specifically, we propose a novel sub-type of class imbalance, which we call Uniform Class Imbalance.
These results provide some of the first meaningful finite-sample statistical theory for imbalanced binary classification.
arXiv Detail & Related papers (2021-07-05T03:55:43Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - Binary Classification: Counterbalancing Class Imbalance by Applying
Regression Models in Combination with One-Sided Label Shifts [0.4970364068620607]
We introduce a novel method that addresses the issue of class imbalance.
We generate a set of negative and positive target labels, such that the corresponding regression task becomes balanced.
We evaluate our approach on a number of publicly available data sets and compare our proposed method to one of the most popular oversampling techniques.
arXiv Detail & Related papers (2020-11-30T13:24:47Z) - Statistical and Algorithmic Insights for Semi-supervised Learning with
Self-training [30.866440916522826]
Self-training is a classical approach in semi-supervised learning.
We show that self-training iterations gracefully improve the model accuracy even if they do get stuck in sub-optimal fixed points.
We then establish a connection between self-training based semi-supervision and the more general problem of learning with heterogeneous data.
arXiv Detail & Related papers (2020-06-19T08:09:07Z) - VaB-AL: Incorporating Class Imbalance and Difficulty with Variational
Bayes for Active Learning [38.33920705605981]
We propose a method that can naturally incorporate class imbalance into the Active Learning framework.
We show that our method can be applied to classification tasks on multiple different datasets.
arXiv Detail & Related papers (2020-03-25T07:34:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.