Demographic Parity: Mitigating Biases in Real-World Data
- URL: http://arxiv.org/abs/2309.17347v1
- Date: Wed, 27 Sep 2023 11:47:05 GMT
- Title: Demographic Parity: Mitigating Biases in Real-World Data
- Authors: Orestis Loukas, Ho-Ryun Chung
- Abstract summary: We propose a robust methodology that guarantees the removal of unwanted biases while preserving classification utility.
Our approach achieves this in a model-independent way by deriving from real-world data the asymptotic dataset that uniquely encodes demographic parity and realism.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computer-based decision systems are widely used to automate decisions in many
aspects of everyday life, including sensitive areas like hiring, lending,
and even criminal sentencing. A decision pipeline heavily relies on large
volumes of historical real-world data for training its models. However,
historical training data often contains gender, racial or other biases which
are propagated to the trained models influencing computer-based decisions. In
this work, we propose a robust methodology that guarantees the removal of
unwanted biases while maximally preserving classification utility. Our approach
can always achieve this in a model-independent way by deriving from real-world
data the asymptotic dataset that uniquely encodes demographic parity and
realism. As a proof-of-principle, we deduce from public census records such an
asymptotic dataset from which synthetic samples can be generated to train
well-established classifiers. Benchmarking the generalization capability of
these classifiers trained on our synthetic data, we confirm the absence of any
explicit or implicit bias in the computer-aided decision.
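
The abstract leaves the asymptotic-dataset construction to the paper itself, but the target property, demographic parity, can be stated operationally: the label must be statistically independent of the protected attribute. The Python sketch below uses toy stand-in data and Kamiran-Calders-style reweighing to illustrate that property; it is a minimal illustration, not the authors' construction.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy stand-in for census-style records: protected attribute A, label Y.
# (Hypothetical data; the paper works from public census records.)
n = 10_000
A = rng.integers(0, 2, n)                                       # group 0 or 1
Y = (rng.random(n) < np.where(A == 1, 0.5, 0.3)).astype(int)    # biased labels
df = pd.DataFrame({"A": A, "Y": Y})

def parity_gap(df):
    """Demographic parity difference: |P(Y=1|A=1) - P(Y=1|A=0)|."""
    rates = df.groupby("A")["Y"].mean()
    return abs(rates[1] - rates[0])

def reweight_for_parity(df):
    """Kamiran-Calders-style reweighing: weight each (A, Y) cell by
    P(A)P(Y)/P(A,Y) so that A and Y become independent under the
    weighted law."""
    p_a = df["A"].value_counts(normalize=True)
    p_y = df["Y"].value_counts(normalize=True)
    p_ay = df.groupby(["A", "Y"]).size() / len(df)
    return df.apply(lambda r: p_a[r["A"]] * p_y[r["Y"]] / p_ay[(r["A"], r["Y"])],
                    axis=1)

print("gap before:", parity_gap(df))
w = reweight_for_parity(df)
# Weighted resampling yields a parity-respecting sample to train classifiers on.
debiased = df.sample(n=len(df), replace=True, weights=w, random_state=0)
print("gap after: ", parity_gap(debiased))
```

Reweighing only reshapes the joint (A, Y) frequencies and leaves the records themselves untouched, which is in the same spirit as the paper's goal of enforcing parity while keeping the data realistic.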
Related papers
- Online Performance Estimation with Unlabeled Data: A Bayesian Application of the Hui-Walter Paradigm [0.0]
We adapt the Hui-Walter paradigm, a method traditionally applied in epidemiology and medicine, to the field of machine learning.
We estimate key performance metrics such as false positive rate, false negative rate, and priors in scenarios where no ground truth is available.
We extend this paradigm for handling online data, opening up new possibilities for dynamic data environments.
arXiv Detail & Related papers (2024-01-17T17:46:10Z)
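
The Hui-Walter construction behind the entry above can be sketched offline: apply two conditionally independent classifiers to two populations with different prevalences, and the joint prediction tables alone identify sensitivities and specificities. The snippet below is a toy least-squares version of that identity, not the paper's Bayesian online estimator.

```python
import numpy as np
from scipy.optimize import minimize

# Toy Hui-Walter setup: two classifiers ("tests") applied to two populations
# with different unknown prevalences. Assuming the tests err independently
# given the true label, the observed joint prediction tables identify
# sensitivity and specificity with no ground truth at all.
rng = np.random.default_rng(1)
true = dict(se1=0.90, sp1=0.80, se2=0.75, sp2=0.85, p1=0.2, p2=0.6)

def simulate(n, p, se1, sp1, se2, sp2):
    y = rng.random(n) < p                                    # latent truth
    t1 = np.where(y, rng.random(n) < se1, rng.random(n) >= sp1)
    t2 = np.where(y, rng.random(n) < se2, rng.random(n) >= sp2)
    return np.array([[np.mean((t1 == i) & (t2 == j)) for j in (0, 1)]
                     for i in (0, 1)])                       # joint 2x2 table

obs = [simulate(50_000, true["p1"], true["se1"], true["sp1"], true["se2"], true["sp2"]),
       simulate(50_000, true["p2"], true["se1"], true["sp1"], true["se2"], true["sp2"])]

def model_table(p, se1, sp1, se2, sp2):
    pos1, pos2 = np.array([1 - se1, se1]), np.array([1 - se2, se2])
    neg1, neg2 = np.array([sp1, 1 - sp1]), np.array([sp2, 1 - sp2])
    return p * np.outer(pos1, pos2) + (1 - p) * np.outer(neg1, neg2)

def loss(theta):
    se1, sp1, se2, sp2, p1, p2 = theta
    return sum(np.sum((model_table(p, se1, sp1, se2, sp2) - o) ** 2)
               for p, o in zip((p1, p2), obs))

# Initialize with sensitivities/specificities above 0.5 to pick the intended
# mode of the well-known Hui-Walter label-switching symmetry.
res = minimize(loss, x0=[0.8, 0.7, 0.7, 0.8, 0.3, 0.5],
               bounds=[(0.01, 0.99)] * 6, method="L-BFGS-B")
se1, sp1, se2, sp2, p1, p2 = res.x
print(f"test 1: FNR~{1 - se1:.3f}, FPR~{1 - sp1:.3f} (true 0.100, 0.200)")
```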
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation [66.64736150040093]
Machine learning applications are becoming increasingly pervasive in our society.
The risk is that they will systematically spread the biases embedded in the data.
We propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations.
arXiv Detail & Related papers (2022-09-13T11:18:50Z)
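
The entry above does not enumerate the framework's bias types, but its core mechanism, injecting a controlled dependence between a protected attribute and the labels or features, is simple to sketch. The knobs below (label_bias, measurement_bias) are hypothetical illustrations, not the paper's taxonomy.

```python
import numpy as np

rng = np.random.default_rng(2)

def generate(n, label_bias=0.0, measurement_bias=0.0):
    """Toy generator with tunable bias knobs.

    label_bias:       shifts P(Y=1) upward for group A=1 (historical bias).
    measurement_bias: adds extra feature noise for group A=0 only.
    """
    A = rng.integers(0, 2, n)                     # protected attribute
    x = rng.normal(size=n)                        # informative feature
    p = 1 / (1 + np.exp(-x)) + label_bias * A     # biased label probability
    Y = (rng.random(n) < np.clip(p, 0, 1)).astype(int)
    x = x + measurement_bias * (A == 0) * rng.normal(size=n)
    return A, x, Y

# Models trained on the biased draw inherit the injected disparity,
# which such a framework then lets one study in isolation.
for lb in (0.0, 0.2):
    A, x, Y = generate(100_000, label_bias=lb)
    gap = Y[A == 1].mean() - Y[A == 0].mean()
    print(f"label_bias={lb}: P(Y=1|A=1) - P(Y=1|A=0) = {gap:.3f}")
```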
- Learning from Heterogeneous Data Based on Social Interactions over Graphs [58.34060409467834]
This work proposes a decentralized architecture, where individual agents aim at solving a classification problem while observing streaming features of different dimensions.
We show that the strategy enables the agents to learn consistently under this highly heterogeneous setting.
arXiv Detail & Related papers (2021-12-17T12:47:18Z)
- Evaluating Predictive Uncertainty and Robustness to Distributional Shift Using Real World Data [0.0]
We propose metrics for general regression tasks using the Shifts Weather Prediction dataset.
We also present an evaluation of the baseline methods using these metrics.
arXiv Detail & Related papers (2021-11-08T17:32:10Z)
- Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
- Removing biased data to improve fairness and accuracy [1.3535770763481905]
We propose a black-box approach to identify and remove biased training data.
Machine learning models trained on such debiased data have low individual discrimination, often 0%.
Our approach outperformed seven previous approaches in terms of individual discrimination and accuracy.
arXiv Detail & Related papers (2021-02-05T08:34:45Z)
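
The individual-discrimination figure quoted above ("often 0%") refers to counterfactual flips: how often a model's prediction changes when only the protected attribute changes. A minimal probe of that metric, on toy data and without the paper's black-box removal step, looks like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Toy data where the protected attribute A leaks into the label.
n = 20_000
A = rng.integers(0, 2, n)
x = rng.normal(size=n)
Y = ((x + 0.8 * A + rng.normal(scale=0.5, size=n)) > 0.5).astype(int)
X = np.column_stack([x, A])

clf = LogisticRegression().fit(X, Y)

def individual_discrimination(clf, X, a_col=1):
    """Fraction of individuals whose prediction flips when only the
    protected attribute is flipped (counterfactual probe)."""
    X_flip = X.copy()
    X_flip[:, a_col] = 1 - X_flip[:, a_col]
    return np.mean(clf.predict(X) != clf.predict(X_flip))

print(f"individual discrimination: {individual_discrimination(clf, X):.1%}")
```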
- DomainMix: Learning Generalizable Person Re-Identification Without Human Annotations [89.78473564527688]
This paper shows how to use a labeled synthetic dataset and an unlabeled real-world dataset to train a universal model.
In this way, human annotations are no longer required, and it is scalable to large and diverse real-world datasets.
Experimental results show that the proposed annotation-free method is roughly comparable to its counterpart trained with full human annotations.
arXiv Detail & Related papers (2020-11-24T08:15:53Z)
- Fairness-Aware Online Personalization [16.320648868892526]
We present a study of fairness in online personalization settings involving the ranking of individuals.
We first demonstrate that online personalization can cause the model to learn to act unfairly when the user's responses are biased.
We then formulate the problem of learning personalized models under fairness constraints and present a regularization based approach for mitigating biases in machine learning.
arXiv Detail & Related papers (2020-07-30T07:16:17Z)
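
A common form of the regularization approach described in the last entry adds a demographic-parity penalty to the prediction loss. The PyTorch sketch below is an illustrative stand-in with a hypothetical penalty weight lam, not the paper's exact formulation.

```python
import torch

torch.manual_seed(0)

# Toy personalization data: features x, protected attribute a, clicks y.
n = 4096
a = torch.randint(0, 2, (n,)).float()
x = torch.randn(n, 4)
y = ((x[:, 0] + 0.7 * a + 0.3 * torch.randn(n)) > 0).float()  # biased feedback

w = torch.zeros(5, requires_grad=True)          # weights for [x, a]
opt = torch.optim.Adam([w], lr=0.05)
X = torch.cat([x, a.unsqueeze(1)], dim=1)
lam = 2.0                                       # fairness strength (hypothetical)

for step in range(500):
    p = torch.sigmoid(X @ w)
    bce = torch.nn.functional.binary_cross_entropy(p, y)
    # Demographic-parity penalty: squared gap in mean score between groups.
    gap = p[a == 1].mean() - p[a == 0].mean()
    loss = bce + lam * gap ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    p = torch.sigmoid(X @ w)
    print(f"score gap after training: {(p[a == 1].mean() - p[a == 0].mean()):.3f}")
```

Increasing lam trades click-prediction accuracy for a smaller between-group score gap, which is the tension such fairness-constrained formulations make explicit.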
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.