Bias Mimicking: A Simple Sampling Approach for Bias Mitigation
- URL: http://arxiv.org/abs/2209.15605v8
- Date: Thu, 27 Apr 2023 17:29:44 GMT
- Title: Bias Mimicking: A Simple Sampling Approach for Bias Mitigation
- Authors: Maan Qraitem, Kate Saenko, Bryan A. Plummer
- Abstract summary: We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves underrepresented groups' accuracy over other sampling methods by 3% across four benchmarks.
- Score: 57.17709477668213
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior work has shown that Visual Recognition datasets frequently
underrepresent bias groups $B$ (e.g. Female) within class labels $Y$ (e.g.
Programmers). This dataset bias can lead to models that learn spurious
correlations between class labels and bias groups such as age, gender, or race.
Most recent methods that address this problem require significant architectural
changes or additional loss functions requiring more hyper-parameter tuning.
Alternatively, data sampling baselines from the class imbalance literature (e.g.
Undersampling, Upweighting), which can often be implemented in a single line of
code and typically have no hyperparameters, offer a cheaper and more efficient
solution. However, these methods suffer from significant shortcomings. For
example, Undersampling drops a significant part of the input distribution per
epoch while Oversampling repeats samples, causing overfitting. To address these
shortcomings, we introduce a new class-conditioned sampling method: Bias
Mimicking (BM). The method is based on the observation that if a class $c$'s bias
distribution, i.e. $P_D(B|Y=c)$, is mimicked across every class $c^{\prime}\neq c$,
then $Y$ and $B$ are statistically independent. Using this notion, BM, through
a novel training procedure, ensures that the model is exposed to the entire
distribution per epoch without repeating samples. Consequently, Bias Mimicking
improves underrepresented groups' accuracy over other sampling baselines by 3%
across four benchmarks while maintaining, and sometimes improving, performance
relative to non-sampling methods. Code: https://github.com/mqraitem/Bias-Mimicking
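As a rough sketch of the mimicking idea, the snippet below subsamples every other class so that its bias-group distribution matches $P_D(B|Y=c)$ for a chosen class $c$, without repeating any sample. It assumes plain `labels` and `groups` arrays and is a minimal reading of the abstract, not the authors' released implementation (see the repository linked above).

```python
import numpy as np

def mimic_bias(labels, groups, target_class, seed=0):
    """Subsample every class c' != target_class so that its empirical bias-group
    distribution matches P_D(B | Y = target_class); the target class is kept whole.
    Returns the indices of retained samples. Simplified sketch, not the paper's code."""
    rng = np.random.default_rng(seed)
    labels, groups = np.asarray(labels), np.asarray(groups)
    group_ids = np.unique(groups)

    tgt_mask = labels == target_class
    target_dist = np.array([(groups[tgt_mask] == g).mean() for g in group_ids])

    keep = list(np.flatnonzero(tgt_mask))
    for c in np.unique(labels):
        if c == target_class:
            continue
        cls_idx = np.flatnonzero(labels == c)
        counts = np.array([(groups[cls_idx] == g).sum() for g in group_ids])
        # Largest subset of class c that can realize target_dist without repeats.
        feasible = counts[target_dist > 0] / target_dist[target_dist > 0]
        n_total = int(np.floor(feasible.min()))
        for g, p in zip(group_ids, target_dist):
            g_idx = cls_idx[groups[cls_idx] == g]
            n_keep = min(int(round(n_total * p)), len(g_idx))
            keep.extend(rng.choice(g_idx, size=n_keep, replace=False).tolist())
    return np.sort(np.array(keep))
```

In the paper, BM additionally arranges the training procedure so that the model still sees the full distribution each epoch; that part is beyond this sketch.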
Related papers
- Data-Efficient Learning via Clustering-Based Sensitivity Sampling:
Foundation Models and Beyond [28.651041302245538]
We present a new data selection approach based on $k$-means clustering and sensitivity sampling.
We show how it can be applied to linear regression, leading to a new sampling strategy that surprisingly matches the performance of leverage score sampling.
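As an illustration of the general recipe (cluster, then sample by a sensitivity proxy), a minimal sketch follows; the distance-to-center proxy and the inverse-probability weights are assumptions for illustration, not the paper's construction.

```python
import numpy as np
from sklearn.cluster import KMeans

def sensitivity_sample(X, n_clusters, n_samples, seed=0):
    """Cluster with k-means, then draw a subset with probability proportional to a
    simple sensitivity proxy: distance to the assigned center plus a uniform floor.
    Returns selected indices and approximate inverse-probability importance weights."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(X)
    dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    scores = dists + dists.mean() + 1e-12        # floor keeps every probability > 0
    probs = scores / scores.sum()
    idx = rng.choice(len(X), size=n_samples, replace=False, p=probs)
    return idx, 1.0 / (n_samples * probs[idx])
```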
arXiv Detail & Related papers (2024-02-27T09:03:43Z)
- Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute $u$ and a non-class attribute $b$.
We propose to mitigate dataset bias via either weighting the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n|b_n)}$.
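A minimal sketch of this reweighting rule, estimating $p(u_n|b_n)$ from empirical counts over hypothetical per-sample attribute arrays `u` and `b`:

```python
import numpy as np

def inverse_conditional_weights(u, b):
    """Per-sample weights proportional to 1 / p(u_n | b_n), with the conditional
    estimated from empirical counts. `u` holds the class attribute per sample and
    `b` the non-class (bias) attribute. Weights are normalised to mean 1."""
    u, b = np.asarray(u), np.asarray(b)
    weights = np.empty(len(u), dtype=float)
    for b_val in np.unique(b):
        mask = b == b_val
        u_vals, counts = np.unique(u[mask], return_counts=True)
        p_u_given_b = dict(zip(u_vals, counts / mask.sum()))
        weights[mask] = [1.0 / p_u_given_b[x] for x in u[mask]]
    return weights / weights.mean()
```

Per the summary, such weights can either multiply each sample's loss term or be normalised into sampling probabilities for the data loader.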
arXiv Detail & Related papers (2024-02-05T22:58:06Z)
- IBADR: an Iterative Bias-Aware Dataset Refinement Framework for Debiasing NLU models [52.03761198830643]
We propose IBADR, an Iterative Bias-Aware dataset Refinement framework.
We first train a shallow model to quantify the bias degree of samples in the pool.
Then, we pair each sample with a bias indicator representing its bias degree, and use these extended samples to train a sample generator.
In this way, the generator can effectively learn the correspondence between bias indicators and samples.
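The first stage (scoring each sample's bias degree with a shallow model) might look like the sketch below, where the shallow model is a bag-of-words logistic regression and the bias degree is its confidence in the gold label; both choices are illustrative assumptions, and the generator stage is omitted.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def bias_degrees(texts, labels):
    """Score each sample's bias degree with a shallow model: a bag-of-words logistic
    regression whose confidence in the gold label serves as the score (a shallow
    model that is confident is likely relying on surface cues). Illustrative only."""
    X = TfidfVectorizer(max_features=20000).fit_transform(texts)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    probs = clf.predict_proba(X)
    col = {c: i for i, c in enumerate(clf.classes_)}
    return np.array([probs[i, col[y]] for i, y in enumerate(labels)])
```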
arXiv Detail & Related papers (2023-11-01T04:50:38Z)
- Efficient Hybrid Oversampling and Intelligent Undersampling for Imbalanced Big Data Classification [1.03590082373586]
We present a novel resampling method called SMOTENN that combines intelligent undersampling and oversampling using a MapReduce framework.
Our experimental results show the virtues of this approach, outperforming alternative resampling techniques for small- and medium-sized datasets.
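For intuition about what such a hybrid combines, here is a generic single-machine sketch for the binary case: SMOTE-style interpolation grows the minority class and a neighbour-based rule prunes majority points. It is not the paper's MapReduce-based SMOTENN.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hybrid_resample(X, y, minority_label, k=5, seed=0):
    """Generic hybrid resampling sketch for a binary, majority-heavy dataset:
    (1) SMOTE-style interpolation between minority neighbours until classes balance,
    (2) removal of majority points whose k nearest neighbours are mostly minority."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    minority = X[minority_idx]

    # (1) Oversample: interpolate each seed point toward a random minority neighbour.
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(minority)
    neighbour_ids = nbrs.kneighbors(minority, return_distance=False)[:, 1:]
    n_new = len(majority_idx) - len(minority_idx)
    seeds = rng.integers(0, len(minority), n_new)
    partners = neighbour_ids[seeds, rng.integers(0, k, n_new)]
    synthetic = minority[seeds] + rng.random((n_new, 1)) * (minority[partners] - minority[seeds])

    # (2) Undersample: drop majority points surrounded mostly by minority points.
    all_nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    majo_neighbours = all_nbrs.kneighbors(X[majority_idx], return_distance=False)[:, 1:]
    keep_majority = majority_idx[(y[majo_neighbours] == minority_label).mean(axis=1) < 0.5]

    X_out = np.vstack([X[keep_majority], minority, synthetic])
    y_out = np.concatenate([y[keep_majority], y[minority_idx], np.full(n_new, minority_label)])
    return X_out, y_out
```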
arXiv Detail & Related papers (2023-10-09T15:22:13Z)
- Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases [62.54519787811138]
We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues.
We rank images within their classes based on spuriosity, proxied via deep neural features of an interpretable network.
Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than how it is trained.
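One way to realise a within-class ranking of this kind, assuming the spurious feature dimensions have already been identified (passed in here as a hypothetical `spurious_dims` mapping from class to feature indices):

```python
import numpy as np

def rank_by_spuriosity(features, labels, spurious_dims):
    """Rank images within each class by a 'spuriosity' score, proxied as the mean
    activation over feature dimensions flagged as spurious for that class.
    `spurious_dims` maps class -> list of feature indices (assumed given; finding
    them, e.g. with an interpretable network, is outside this sketch)."""
    features, labels = np.asarray(features), np.asarray(labels)
    ranking = {}
    for c, dims in spurious_dims.items():
        idx = np.flatnonzero(labels == c)
        scores = features[idx][:, dims].mean(axis=1)
        ranking[c] = idx[np.argsort(-scores)]   # most spurious first
    return ranking
```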
arXiv Detail & Related papers (2022-12-05T23:15:43Z)
- BiasEnsemble: Revisiting the Importance of Amplifying Bias for Debiasing [31.665352191081357]
"Debiasing" aims to train a classifier to be less susceptible to dataset bias.
The bias-amplified model $f_B$ is trained to focus on bias-aligned samples, while the debiased model $f_D$ is mainly trained with bias-conflicting samples.
We propose a novel biased-sample selection method, BiasEnsemble, which removes bias-conflicting samples when constructing the training set for $f_B$.
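A compact sketch of such a selection step, assuming an ensemble of intentionally biased auxiliary models has already been trained; the unanimity rule below is an illustrative reading of the summary, not necessarily the paper's exact criterion.

```python
import numpy as np

def select_bias_aligned(biased_preds, labels):
    """Keep only samples that every model in an ensemble of intentionally biased
    (e.g. small or early-stopped) auxiliary models classifies correctly, treating
    them as bias-aligned; the rest are dropped before training the biased model f_B.
    `biased_preds` is an (n_models, n_samples) array of predicted labels."""
    biased_preds, labels = np.asarray(biased_preds), np.asarray(labels)
    aligned = (biased_preds == labels[None, :]).all(axis=0)
    return np.flatnonzero(aligned)
```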
arXiv Detail & Related papers (2022-05-29T07:55:06Z)
- Stop Oversampling for Class Imbalance Learning: A Critical Review [0.9208007322096533]
Oversampling has been employed to overcome the challenge of learning from imbalanced datasets.
The fundamental difficulty with oversampling approaches is that, given a real-life population, the synthesized samples may not truly belong to the minority class.
We devised a new oversampling evaluation system based on hiding a number of majority examples and comparing them to those generated by the oversampling process.
arXiv Detail & Related papers (2022-02-04T15:11:11Z)
- Relieving Long-tailed Instance Segmentation via Pairwise Class Balance [85.53585498649252]
Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes.
This imbalance biases predictions severely toward the head classes (those with the majority of samples) at the expense of the tail classes.
We propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix which is updated during training to accumulate the ongoing prediction preferences.
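The confusion-matrix bookkeeping described here can be sketched as a small running accumulator; the pairwise preference ratio it exposes is an illustrative use, not PCB's exact formulation.

```python
import numpy as np

class OnlineConfusion:
    """Running confusion matrix accumulated over training-time predictions with an
    exponential moving average, in the spirit of the summary above."""

    def __init__(self, n_classes, momentum=0.99):
        self.mat = np.full((n_classes, n_classes), 1.0 / n_classes)
        self.momentum = momentum

    def update(self, targets, preds):
        targets, preds = np.asarray(targets), np.asarray(preds)
        batch = np.zeros_like(self.mat)
        np.add.at(batch, (targets, preds), 1.0)          # count (true, predicted) pairs
        seen = batch.sum(axis=1) > 0
        batch[seen] /= batch[seen].sum(axis=1, keepdims=True)
        self.mat[seen] = self.momentum * self.mat[seen] + (1 - self.momentum) * batch[seen]

    def pairwise_preference(self, i, j):
        # >1 means class i is mistaken for class j more often than the reverse.
        return self.mat[i, j] / max(self.mat[j, i], 1e-12)
```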
arXiv Detail & Related papers (2022-01-08T07:48:36Z)
- Does Adversarial Oversampling Help us? [10.210871872870737]
We propose a three-player adversarial game-based end-to-end method to handle class imbalance in datasets.
Rather than adversarial minority oversampling, we propose an adversarial oversampling (AO) and a data-space oversampling (DO) approach.
The effectiveness of our proposed method has been validated with high-dimensional, highly imbalanced and large-scale multi-class datasets.
arXiv Detail & Related papers (2021-08-20T05:43:17Z)
- Coping with Label Shift via Distributionally Robust Optimisation [72.80971421083937]
We propose a model that minimises an objective based on distributionally robust optimisation (DRO).
We then design and analyse a gradient descent-proximal mirror ascent algorithm tailored for large-scale problems to optimise the proposed objective.
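The adversarial half of such an objective can be sketched as an exponentiated-gradient (mirror ascent) step on class weights constrained to the probability simplex, alternated with ordinary gradient descent on the model; this is a generic sketch of the pattern, not the paper's exact algorithm (which also uses a proximal term).

```python
import numpy as np

def mirror_ascent_step(class_weights, class_losses, step_size=0.1):
    """One mirror-ascent (exponentiated-gradient) update of the adversarial class
    weights w in a DRO objective sum_y w_y * loss_y, keeping w on the probability
    simplex. Alternate this with a gradient-descent step on the model parameters."""
    w = np.asarray(class_weights) * np.exp(step_size * np.asarray(class_losses))
    return w / w.sum()
```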
arXiv Detail & Related papers (2020-10-23T08:33:04Z)