Shedding light on underrepresentation and Sampling Bias in machine learning
- URL: http://arxiv.org/abs/2306.05068v1
- Date: Thu, 8 Jun 2023 09:34:20 GMT
- Title: Shedding light on underrepresentation and Sampling Bias in machine learning
- Authors: Sami Zhioua, Rūta Binkytė
- Abstract summary: We show how discrimination can be decomposed into variance, bias, and noise.
We challenge the commonly accepted mitigation approach that discrimination can be addressed by collecting more samples of the underrepresented group.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurately measuring discrimination is crucial to faithfully assessing
fairness of trained machine learning (ML) models. Any bias in measuring
discrimination leads to either amplification or underestimation of the existing
disparity. Several sources of bias exist, and it is assumed that bias resulting
from machine learning is borne equally by different groups (e.g., females vs.
males, whites vs. blacks, etc.). If, however, bias is borne differently by
different groups, it may exacerbate discrimination against specific
sub-populations. The term sampling bias is used inconsistently in the
literature to describe bias due to the sampling procedure. In this paper, we
attempt to disambiguate the term by introducing clearly defined variants of
sampling bias, namely, sample size bias (SSB) and underrepresentation bias
(URB). We also show how discrimination can be decomposed into variance, bias,
and noise. Finally, we challenge the commonly accepted mitigation approach that
discrimination can be addressed by collecting more samples of the
underrepresented group.
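The following is a minimal, self-contained simulation (not code from the paper) of the effect the abstract points at: how the size of the underrepresented group affects the measured disparity. The two groups, their assumed positive-decision rates p_a and p_b, and the sample sizes are all made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" positive-decision rates for two groups; the gap between
# them is the ground-truth disparity we are trying to measure.
p_a, p_b = 0.60, 0.50   # illustrative values, not from the paper
true_gap = p_a - p_b

def estimated_gap(n_a, n_b):
    """Estimate the statistical parity gap from finite samples of each group."""
    y_a = rng.binomial(1, p_a, size=n_a)
    y_b = rng.binomial(1, p_b, size=n_b)
    return y_a.mean() - y_b.mean()

# Keep group A's sample fixed and shrink group B to mimic underrepresentation.
for n_b in (5000, 500, 50):
    gaps = [estimated_gap(5000, n_b) for _ in range(2000)]
    print(f"n_b={n_b:5d}  mean estimate={np.mean(gaps):+.3f}  "
          f"std={np.std(gaps):.3f}  (true gap={true_gap:+.3f})")
```

With only 50 samples from the smaller group, the estimated gap fluctuates widely around the true value even though its mean stays roughly unbiased, which is the kind of sampling-related distortion of measured discrimination the abstract is concerned with.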
Related papers
- AIM: Attributing, Interpreting, Mitigating Data Unfairness [40.351282126410545]
Existing fair machine learning (FairML) research has predominantly focused on mitigating discriminative bias in the model prediction.
We investigate a novel research problem: discovering samples that reflect biases/prejudices from the training data.
We propose practical algorithms for measuring and countering sample bias.
arXiv Detail & Related papers (2024-06-13T05:21:10Z)
- Dissecting Causal Biases [0.0]
This paper focuses on a class of bias originating in the way training data is generated and/or collected.
Four sources of bias are considered, namely, confounding, selection, measurement, and interaction.
arXiv Detail & Related papers (2023-10-20T09:12:10Z)
- BLIND: Bias Removal With No Demographics [29.16221451643288]
We introduce BLIND, a method for bias removal with no prior knowledge of the demographics in the dataset.
While training a model on a downstream task, BLIND detects biased samples using an auxiliary model that predicts the main model's success, and down-weights those samples during the training process.
Experiments with racial and gender biases in sentiment classification and occupation classification tasks demonstrate that BLIND mitigates social biases without relying on a costly demographic annotation process.
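A rough, hypothetical sketch of the down-weighting idea summarized above; the exact BLIND weighting scheme and auxiliary-model interface are not given here, so the sigmoid-based weights and the names success_logits and per_example_losses below are assumptions.

```python
import numpy as np

def down_weight_by_success(success_logits):
    """
    Hypothetical per-example weights: examples the auxiliary model is most
    confident the main model will get right (likely shortcut/bias carriers)
    receive the lowest weight; hard examples keep weight close to 1.
    """
    p_success = 1.0 / (1.0 + np.exp(-np.asarray(success_logits, dtype=float)))
    return 1.0 - p_success

# Toy usage: three "easy" examples and one "hard" one.
weights = down_weight_by_success([4.0, 3.0, 2.5, -1.0])
print(weights)  # small weights for easy examples, ~0.73 for the hard one
# A weighted training loss would then be np.mean(weights * per_example_losses).
```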
arXiv Detail & Related papers (2022-12-20T18:59:42Z)
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and of identifying potential causes of social bias in downstream tasks.
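The summary does not give the SAME formula itself; the snippet below is only a generic cosine-based association score of the kind such embedding bias measures build on, with made-up attribute sets, and should not be read as the SAME score.

```python
import numpy as np

def cosine_association(word_vec, attr_set_a, attr_set_b):
    """
    Generic cosine-based association: positive values mean the word vector
    sits closer, on average, to attribute set A than to attribute set B.
    Illustrative baseline only, not the SAME score from the paper.
    """
    cos = lambda u, v: float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return (np.mean([cos(word_vec, a) for a in attr_set_a])
            - np.mean([cos(word_vec, b) for b in attr_set_b]))

# Toy usage with random 5-d vectors standing in for real embeddings.
rng = np.random.default_rng(0)
print(cosine_association(rng.normal(size=5),
                         [rng.normal(size=5)], [rng.normal(size=5)]))
```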
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- Gradient Based Activations for Accurate Bias-Free Learning [22.264226961225003]
We show that a biased discriminator can actually be used to improve the bias-accuracy tradeoff.
Specifically, this is achieved by masking features based on the discriminator's gradients.
We show that this simple approach both reduces bias and improves accuracy significantly.
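A hypothetical sketch of that gradient-based masking idea; the paper's actual procedure is not described in the summary, so the discriminator interface, keep_ratio, and the keep-the-least-sensitive-dimensions rule below are assumptions.

```python
import torch

def gradient_feature_mask(features, discriminator, keep_ratio=0.7):
    """
    Illustrative gradient-based masking: zero out the feature dimensions to
    which a (biased) protected-attribute discriminator is most sensitive,
    then train the main task on the masked features.
    features: float tensor of shape (batch, dim).
    """
    features = features.clone().requires_grad_(True)
    disc_out = discriminator(features).sum()
    grads = torch.autograd.grad(disc_out, features)[0].abs().mean(dim=0)
    k = int(keep_ratio * features.shape[1])
    keep = torch.zeros_like(grads)
    keep[grads.argsort()[:k]] = 1.0   # keep the least bias-sensitive dims
    return features.detach() * keep   # mask broadcast over the batch

# Hypothetical usage:
# masked = gradient_feature_mask(batch_features, protected_attr_discriminator)
# loss = task_model_loss(masked, labels)
```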
arXiv Detail & Related papers (2022-02-17T00:30:40Z)
- Fair Group-Shared Representations with Normalizing Flows [68.29997072804537]
We develop a fair representation learning algorithm which is able to map individuals belonging to different groups into a single group.
We show experimentally that our methodology is competitive with other fair representation learning algorithms.
arXiv Detail & Related papers (2022-01-17T10:49:49Z)
- Fairness-aware Class Imbalanced Learning [57.45784950421179]
We evaluate long-tail learning methods for tweet sentiment and occupation classification.
We extend a margin-loss based approach with methods to enforce fairness.
arXiv Detail & Related papers (2021-09-21T22:16:30Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
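A generic instance-reweighting sketch in the spirit of that description; the paper's actual weighting objective is not specified in the summary, so the inverse-frequency scheme and the names below are assumptions.

```python
from collections import Counter
import numpy as np

def balance_weights(labels, demographics):
    """
    Generic instance reweighting: give each (label, demographic) combination
    a weight inversely proportional to its frequency, so that demographic
    attributes are decorrelated from the target label during training.
    """
    combos = list(zip(labels, demographics))
    counts = Counter(combos)
    n, k = len(combos), len(counts)
    # Every combination then contributes the same total weight mass (n / k).
    return np.array([n / (k * counts[c]) for c in combos])

# Toy usage: positive labels dominated by one demographic group.
w = balance_weights([1, 1, 1, 0, 0, 1], ["m", "m", "m", "f", "f", "f"])
print(w)  # the rare (label, group) pair receives the largest weight
```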
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
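An illustrative sketch of the cluster-then-inspect idea; LOGAN's own bias-aware clustering objective is not described in the summary, so plain k-means and a per-cluster error-rate gap serve as stand-ins here.

```python
import numpy as np
from sklearn.cluster import KMeans

def local_bias_by_cluster(embeddings, errors, group, n_clusters=10, seed=0):
    """
    Cluster examples in embedding space, then compare error rates between
    demographic groups within each cluster rather than over the whole corpus.
    errors: 0/1 per-example mistakes; group: 0/1 demographic membership.
    """
    cluster_ids = KMeans(n_clusters=n_clusters, random_state=seed,
                         n_init=10).fit_predict(embeddings)
    errors, group = np.asarray(errors), np.asarray(group)
    gaps = {}
    for c in range(n_clusters):
        m = cluster_ids == c
        if m.sum() and (group[m] == 0).any() and (group[m] == 1).any():
            gaps[c] = (errors[m & (group == 0)].mean()
                       - errors[m & (group == 1)].mean())
    return gaps  # clusters with large |gap| are candidate local bias regions
```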
arXiv Detail & Related papers (2020-10-06T16:42:51Z)
- Mitigating Gender Bias Amplification in Distribution by Posterior Regularization [75.3529537096899]
We investigate the gender bias amplification issue from the distribution perspective.
We propose a bias mitigation approach based on posterior regularization.
Our study sheds light on understanding bias amplification.
arXiv Detail & Related papers (2020-05-13T11:07:10Z)
- A survey of bias in Machine Learning through the prism of Statistical Parity for the Adult Data Set [5.277804553312449]
We show the importance of understanding how bias can be introduced into automatic decisions.
We first present a mathematical framework for the fair learning problem, specifically in the binary classification setting.
We then propose to quantify the presence of bias by using the standard Disparate Impact index on the real and well-known Adult income data set.
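The Disparate Impact index mentioned above is a standard fairness metric: the ratio of positive-outcome rates between the unprivileged and the privileged group (values near 1 indicate parity; below 0.8 is often flagged). A minimal sketch, with toy data rather than the Adult dataset:

```python
import numpy as np

def disparate_impact(y_pred, group):
    """
    Disparate Impact index: ratio of positive-decision rates between the
    unprivileged and privileged groups.
    y_pred: binary decisions; group: 0 = unprivileged, 1 = privileged.
    """
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_unpriv = y_pred[group == 0].mean()
    rate_priv = y_pred[group == 1].mean()
    return rate_unpriv / rate_priv

# Toy usage (on Adult, y_pred would come from the trained classifier and
# group from the sex or race attribute):
print(disparate_impact([1, 0, 0, 1, 1, 0], [0, 0, 0, 1, 1, 1]))  # 0.5
```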
arXiv Detail & Related papers (2020-03-31T14:48:36Z)