Bias Mimicking: A Simple Sampling Approach for Bias Mitigation
- URL: http://arxiv.org/abs/2209.15605v8
- Date: Thu, 27 Apr 2023 17:29:44 GMT
- Title: Bias Mimicking: A Simple Sampling Approach for Bias Mitigation
- Authors: Maan Qraitem, Kate Saenko, Bryan A. Plummer
- Abstract summary: We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves underrepresented groups' accuracy over other sampling methods by 3% across four benchmarks.
- Score: 57.17709477668213
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior work has shown that Visual Recognition datasets frequently
underrepresent bias groups $B$ (e.g. Female) within class labels $Y$ (e.g.
Programmers). This dataset bias can lead to models that learn spurious
correlations between class labels and bias groups such as age, gender, or race.
Most recent methods that address this problem require significant architectural
changes or additional loss functions requiring more hyper-parameter tuning.
Alternatively, data sampling baselines from the class imbalance literature (e.g.
Undersampling, Upweighting), which can often be implemented in a single line of
code and typically have no hyperparameters, offer a cheaper and more efficient
solution. However, these methods suffer from significant shortcomings. For
example, Undersampling drops a significant part of the input distribution per
epoch while Oversampling repeats samples, causing overfitting. To address these
shortcomings, we introduce a new class-conditioned sampling method: Bias
Mimicking (BM). The method is based on the observation that if a class $c$'s bias
distribution, i.e. $P_D(B|Y=c)$, is mimicked across every class $c^{\prime}\neq c$,
then $Y$ and $B$ are statistically independent. Using this notion, BM, through
a novel training procedure, ensures that the model is exposed to the entire
distribution per epoch without repeating samples. Consequently, Bias Mimicking
improves underrepresented groups' accuracy over other sampling baselines by 3%
across four benchmarks while maintaining, and sometimes improving, performance
relative to non-sampling methods. Code: https://github.com/mqraitem/Bias-Mimicking
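As a rough sketch of the mimicking idea, the snippet below subsamples every other class so that its bias-group distribution matches $P_D(B|Y=c)$ for a chosen class $c$, without repeating any sample. It assumes plain `labels` and `groups` arrays and is a minimal reading of the abstract, not the authors' released implementation (see the repository linked above).

```python
import numpy as np

def mimic_bias(labels, groups, target_class, seed=0):
    """Subsample every class c' != target_class so that its empirical bias-group
    distribution matches P_D(B | Y = target_class); the target class is kept whole.
    Returns the indices of retained samples. Simplified sketch, not the paper's code."""
    rng = np.random.default_rng(seed)
    labels, groups = np.asarray(labels), np.asarray(groups)
    group_ids = np.unique(groups)

    tgt_mask = labels == target_class
    target_dist = np.array([(groups[tgt_mask] == g).mean() for g in group_ids])

    keep = list(np.flatnonzero(tgt_mask))
    for c in np.unique(labels):
        if c == target_class:
            continue
        cls_idx = np.flatnonzero(labels == c)
        counts = np.array([(groups[cls_idx] == g).sum() for g in group_ids])
        # Largest subset of class c that can realize target_dist without repeats.
        feasible = counts[target_dist > 0] / target_dist[target_dist > 0]
        n_total = int(np.floor(feasible.min()))
        for g, p in zip(group_ids, target_dist):
            g_idx = cls_idx[groups[cls_idx] == g]
            n_keep = min(int(round(n_total * p)), len(g_idx))
            keep.extend(rng.choice(g_idx, size=n_keep, replace=False).tolist())
    return np.sort(np.array(keep))
```

In the paper, BM additionally arranges the training procedure so that the model still sees the full distribution each epoch; that part is beyond this sketch.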
Related papers
- Data-Efficient Learning via Clustering-Based Sensitivity Sampling:
Foundation Models and Beyond [28.651041302245538]
We present a new data selection approach based on $k$-means clustering and sensitivity sampling.
We show how it can be applied to linear regression, leading to a new sampling strategy that surprisingly matches the performance of leverage score sampling.
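As an illustration of the general recipe (cluster, then sample by a sensitivity proxy), a minimal sketch follows; the distance-to-center proxy and the inverse-probability weights are assumptions for illustration, not the paper's construction.

```python
import numpy as np
from sklearn.cluster import KMeans

def sensitivity_sample(X, n_clusters, n_samples, seed=0):
    """Cluster with k-means, then draw a subset with probability proportional to a
    simple sensitivity proxy: distance to the assigned center plus a uniform floor.
    Returns selected indices and approximate inverse-probability importance weights."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(X)
    dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    scores = dists + dists.mean() + 1e-12        # floor keeps every probability > 0
    probs = scores / scores.sum()
    idx = rng.choice(len(X), size=n_samples, replace=False, p=probs)
    return idx, 1.0 / (n_samples * probs[idx])
```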
arXiv Detail & Related papers (2024-02-27T09:03:43Z)
- Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute $u$ and a non-class attribute $b$.
We propose to mitigate dataset bias via either weighting the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n|b_n)}$.
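A minimal sketch of this reweighting rule, estimating $p(u_n|b_n)$ from empirical counts over hypothetical per-sample attribute arrays `u` and `b`:

```python
import numpy as np

def inverse_conditional_weights(u, b):
    """Per-sample weights proportional to 1 / p(u_n | b_n), with the conditional
    estimated from empirical counts. `u` holds the class attribute per sample and
    `b` the non-class (bias) attribute. Weights are normalised to mean 1."""
    u, b = np.asarray(u), np.asarray(b)
    weights = np.empty(len(u), dtype=float)
    for b_val in np.unique(b):
        mask = b == b_val
        u_vals, counts = np.unique(u[mask], return_counts=True)
        p_u_given_b = dict(zip(u_vals, counts / mask.sum()))
        weights[mask] = [1.0 / p_u_given_b[x] for x in u[mask]]
    return weights / weights.mean()
```

Per the summary, such weights can either multiply each sample's loss term or be normalised into sampling probabilities for the data loader.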
arXiv Detail & Related papers (2024-02-05T22:58:06Z)
- IBADR: an Iterative Bias-Aware Dataset Refinement Framework for Debiasing NLU models [52.03761198830643]
We propose IBADR, an Iterative Bias-Aware dataset Refinement framework.
We first train a shallow model to quantify the bias degree of samples in the pool.
Then, we pair each sample with a bias indicator representing its bias degree, and use these extended samples to train a sample generator.
In this way, the generator can effectively learn the correspondence between bias indicators and samples.
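The first stage (scoring each sample's bias degree with a shallow model) might look like the sketch below, where the shallow model is a bag-of-words logistic regression and the bias degree is its confidence in the gold label; both choices are illustrative assumptions, and the generator stage is omitted.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def bias_degrees(texts, labels):
    """Score each sample's bias degree with a shallow model: a bag-of-words logistic
    regression whose confidence in the gold label serves as the score (a shallow
    model that is confident is likely relying on surface cues). Illustrative only."""
    X = TfidfVectorizer(max_features=20000).fit_transform(texts)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    probs = clf.predict_proba(X)
    col = {c: i for i, c in enumerate(clf.classes_)}
    return np.array([probs[i, col[y]] for i, y in enumerate(labels)])
```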
arXiv Detail & Related papers (2023-11-01T04:50:38Z)
- Efficient Hybrid Oversampling and Intelligent Undersampling for Imbalanced Big Data Classification [1.03590082373586]
We present a novel resampling method called SMOTENN that combines intelligent undersampling and oversampling using a MapReduce framework.
Our experimental results show the virtues of this approach, outperforming alternative resampling techniques for small- and medium-sized datasets.
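For intuition about what such a hybrid combines, here is a generic single-machine sketch for the binary case: SMOTE-style interpolation grows the minority class and a neighbour-based rule prunes majority points. It is not the paper's MapReduce-based SMOTENN.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hybrid_resample(X, y, minority_label, k=5, seed=0):
    """Generic hybrid resampling sketch for a binary, majority-heavy dataset:
    (1) SMOTE-style interpolation between minority neighbours until classes balance,
    (2) removal of majority points whose k nearest neighbours are mostly minority."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    minority = X[minority_idx]

    # (1) Oversample: interpolate each seed point toward a random minority neighbour.
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(minority)
    neighbour_ids = nbrs.kneighbors(minority, return_distance=False)[:, 1:]
    n_new = len(majority_idx) - len(minority_idx)
    seeds = rng.integers(0, len(minority), n_new)
    partners = neighbour_ids[seeds, rng.integers(0, k, n_new)]
    synthetic = minority[seeds] + rng.random((n_new, 1)) * (minority[partners] - minority[seeds])

    # (2) Undersample: drop majority points surrounded mostly by minority points.
    all_nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    majo_neighbours = all_nbrs.kneighbors(X[majority_idx], return_distance=False)[:, 1:]
    keep_majority = majority_idx[(y[majo_neighbours] == minority_label).mean(axis=1) < 0.5]

    X_out = np.vstack([X[keep_majority], minority, synthetic])
    y_out = np.concatenate([y[keep_majority], y[minority_idx], np.full(n_new, minority_label)])
    return X_out, y_out
```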
arXiv Detail & Related papers (2023-10-09T15:22:13Z)
- Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases [62.54519787811138]
We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues.
We rank images within their classes based on spuriosity, proxied via deep neural features of an interpretable network.
Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than how it is trained.
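One way to realise a within-class ranking of this kind, assuming the spurious feature dimensions have already been identified (passed in here as a hypothetical `spurious_dims` mapping from class to feature indices):

```python
import numpy as np

def rank_by_spuriosity(features, labels, spurious_dims):
    """Rank images within each class by a 'spuriosity' score, proxied as the mean
    activation over feature dimensions flagged as spurious for that class.
    `spurious_dims` maps class -> list of feature indices (assumed given; finding
    them, e.g. with an interpretable network, is outside this sketch)."""
    features, labels = np.asarray(features), np.asarray(labels)
    ranking = {}
    for c, dims in spurious_dims.items():
        idx = np.flatnonzero(labels == c)
        scores = features[idx][:, dims].mean(axis=1)
        ranking[c] = idx[np.argsort(-scores)]   # most spurious first
    return ranking
```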
arXiv Detail & Related papers (2022-12-05T23:15:43Z)
- BiasEnsemble: Revisiting the Importance of Amplifying Bias for Debiasing [31.665352191081357]
"Debiasing" aims to train a classifier to be less susceptible to dataset bias.
The bias-amplified model $f_B$ is trained to focus on bias-aligned samples, while the debiased model $f_D$ is mainly trained with bias-conflicting samples.
We propose a novel biased-sample selection method, BiasEnsemble, which removes bias-conflicting samples when constructing the training set for $f_B$.
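A compact sketch of such a selection step, assuming an ensemble of intentionally biased auxiliary models has already been trained; the unanimity rule below is an illustrative reading of the summary, not necessarily the paper's exact criterion.

```python
import numpy as np

def select_bias_aligned(biased_preds, labels):
    """Keep only samples that every model in an ensemble of intentionally biased
    (e.g. small or early-stopped) auxiliary models classifies correctly, treating
    them as bias-aligned; the rest are dropped before training the biased model f_B.
    `biased_preds` is an (n_models, n_samples) array of predicted labels."""
    biased_preds, labels = np.asarray(biased_preds), np.asarray(labels)
    aligned = (biased_preds == labels[None, :]).all(axis=0)
    return np.flatnonzero(aligned)
```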
arXiv Detail & Related papers (2022-05-29T07:55:06Z)
- Stop Oversampling for Class Imbalance Learning: A Critical Review [0.9208007322096533]
Oversampling has been employed to overcome the challenge of learning from imbalanced datasets.
The fundamental difficulty with oversampling approaches is that, given a real-life population, the synthesized samples may not truly belong to the minority class.
We devised a new oversampling evaluation system based on hiding a number of majority examples and comparing them to those generated by the oversampling process.
arXiv Detail & Related papers (2022-02-04T15:11:11Z)
- Relieving Long-tailed Instance Segmentation via Pairwise Class Balance [85.53585498649252]
Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes.
This imbalance biases predictions severely toward the head classes (those with the majority of samples) at the expense of the tail classes.
We propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix which is updated during training to accumulate the ongoing prediction preferences.
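The confusion-matrix bookkeeping described here can be sketched as a small running accumulator; the pairwise preference ratio it exposes is an illustrative use, not PCB's exact formulation.

```python
import numpy as np

class OnlineConfusion:
    """Running confusion matrix accumulated over training-time predictions with an
    exponential moving average, in the spirit of the summary above."""

    def __init__(self, n_classes, momentum=0.99):
        self.mat = np.full((n_classes, n_classes), 1.0 / n_classes)
        self.momentum = momentum

    def update(self, targets, preds):
        targets, preds = np.asarray(targets), np.asarray(preds)
        batch = np.zeros_like(self.mat)
        np.add.at(batch, (targets, preds), 1.0)          # count (true, predicted) pairs
        seen = batch.sum(axis=1) > 0
        batch[seen] /= batch[seen].sum(axis=1, keepdims=True)
        self.mat[seen] = self.momentum * self.mat[seen] + (1 - self.momentum) * batch[seen]

    def pairwise_preference(self, i, j):
        # >1 means class i is mistaken for class j more often than the reverse.
        return self.mat[i, j] / max(self.mat[j, i], 1e-12)
```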
arXiv Detail & Related papers (2022-01-08T07:48:36Z)
- Does Adversarial Oversampling Help us? [10.210871872870737]
We propose a three-player adversarial game-based end-to-end method to handle class imbalance in datasets.
Rather than adversarial minority oversampling, we propose an adversarial oversampling (AO) and a data-space oversampling (DO) approach.
The effectiveness of our proposed method has been validated with high-dimensional, highly imbalanced and large-scale multi-class datasets.
arXiv Detail & Related papers (2021-08-20T05:43:17Z)
- Coping with Label Shift via Distributionally Robust Optimisation [72.80971421083937]
We propose a model that minimises an objective based on distributionally robust optimisation (DRO).
We then design and analyse a gradient descent-proximal mirror ascent algorithm tailored for large-scale problems to optimise the proposed objective.
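The adversarial half of such an objective can be sketched as an exponentiated-gradient (mirror ascent) step on class weights constrained to the probability simplex, alternated with ordinary gradient descent on the model; this is a generic sketch of the pattern, not the paper's exact algorithm (which also uses a proximal term).

```python
import numpy as np

def mirror_ascent_step(class_weights, class_losses, step_size=0.1):
    """One mirror-ascent (exponentiated-gradient) update of the adversarial class
    weights w in a DRO objective sum_y w_y * loss_y, keeping w on the probability
    simplex. Alternate this with a gradient-descent step on the model parameters."""
    w = np.asarray(class_weights) * np.exp(step_size * np.asarray(class_losses))
    return w / w.sum()
```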
arXiv Detail & Related papers (2020-10-23T08:33:04Z)