Related papers: Removing biased data to improve fairness and accuracy

Removing biased data to improve fairness and accuracy

URL: http://arxiv.org/abs/2102.03054v1
Date: Fri, 5 Feb 2021 08:34:45 GMT
Title: Removing biased data to improve fairness and accuracy
Authors: Sahil Verma, Michael Ernst, Rene Just
Abstract summary: We propose a black-box approach to identify and remove biased training data. Machine learning models trained on such debiased data have low individual discrimination, often 0%. Our approach outperformed seven previous approaches in terms of individual discrimination and accuracy.
Score: 1.3535770763481905
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Machine learning systems are often trained using data collected from historical decisions. If past decisions were biased, then automated systems that learn from historical data will also be biased. We propose a black-box approach to identify and remove biased training data. Machine learning models trained on such debiased data (a subset of the original training data) have low individual discrimination, often 0%. These models also have greater accuracy and lower statistical disparity than models trained on the full historical data. We evaluated our methodology in experiments using 6 real-world datasets. Our approach outperformed seven previous approaches in terms of individual discrimination and accuracy.

Related papers

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs [51.00909549291524]
Large language models (LLMs) exhibit cognitive biases.<n>These biases vary across models and can be amplified by instruction tuning.<n>It remains unclear if these differences in biases stem from pretraining, finetuning, or even random noise.
arXiv Detail & Related papers (2025-07-09T18:01:14Z)
Model Debiasing by Learnable Data Augmentation [19.625915578646758]
This paper proposes a novel 2-stage learning pipeline featuring a data augmentation strategy able to regularize the training. Experiments on synthetic and realistic biased datasets show state-of-the-art classification accuracy, outperforming competing methods.
arXiv Detail & Related papers (2024-08-09T09:19:59Z)
Debiasing Machine Unlearning with Counterfactual Examples [31.931056076782202]
We analyze the causal factors behind the unlearning process and mitigate biases at both data and algorithmic levels. We introduce an intervention-based approach, where knowledge to forget is erased with a debiased dataset. Our method outperforms existing machine unlearning baselines on evaluation metrics.
arXiv Detail & Related papers (2024-04-24T09:33:10Z)
Addressing Bias Through Ensemble Learning and Regularized Fine-Tuning [0.2812395851874055]
This paper proposes a comprehensive approach using multiple methods to remove bias in AI models. We train multiple models with the counter-bias of the pre-trained model through data splitting, local training, and regularized fine-tuning. We conclude our solution with knowledge distillation that results in a single unbiased neural network.
arXiv Detail & Related papers (2024-02-01T09:24:36Z)
Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios. Existing debiasing methods suffer from high costs in bias labeling or model re-training. We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
Demographic Parity: Mitigating Biases in Real-World Data [0.0]
We propose a robust methodology that guarantees the removal of unwanted biases while preserving classification utility. Our approach can always achieve this in a model-independent way by deriving from real-world data.
arXiv Detail & Related papers (2023-09-27T11:47:05Z)
Pseudo Bias-Balanced Learning for Debiased Chest X-ray Classification [57.53567756716656]
We study the problem of developing debiased chest X-ray diagnosis models without knowing exactly the bias labels. We propose a novel algorithm, pseudo bias-balanced learning, which first captures and predicts per-sample bias labels. Our proposed method achieved consistent improvements over other state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-18T11:02:18Z)
Robust Fairness-aware Learning Under Sample Selection Bias [17.09665420515772]
We propose a framework for robust and fair learning under sample selection bias. We develop two algorithms to handle sample selection bias when test data is both available and unavailable.
arXiv Detail & Related papers (2021-05-24T23:23:36Z)
Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task. Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available. We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair. We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data. A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases. First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method. The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
Leveraging Semi-Supervised Learning for Fairness using Neural Networks [49.604038072384995]
There has been a growing concern about the fairness of decision-making systems based on machine learning. In this paper, we propose a semi-supervised algorithm using neural networks benefiting from unlabeled data. The proposed model, called SSFair, exploits the information in the unlabeled data to mitigate the bias in the training data.
arXiv Detail & Related papers (2019-12-31T09:11:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.