Does Data Repair Lead to Fair Models? Curating Contextually Fair Data To
Reduce Model Bias
- URL: http://arxiv.org/abs/2110.10389v1
- Date: Wed, 20 Oct 2021 06:00:03 GMT
- Title: Does Data Repair Lead to Fair Models? Curating Contextually Fair Data To
Reduce Model Bias
- Authors: Sharat Agarwal, Sumanyu Muku, Saket Anand, Chetan Arora
- Abstract summary: Contextual information is a valuable cue for Deep Neural Networks (DNNs) to learn better representations and improve accuracy.
In COCO, many object categories have a much higher co-occurrence with men compared to women, which can bias a DNN's prediction in favor of men.
We introduce a data repair algorithm using the coefficient of variation, which can curate fair and contextually balanced data for a protected class.
- Score: 10.639605996067534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual information is a valuable cue for Deep Neural Networks (DNNs) to
learn better representations and improve accuracy. However, co-occurrence bias
in the training dataset may hamper a DNN model's generalizability to unseen
scenarios in the real world. For example, in COCO, many object categories have
a much higher co-occurrence with men compared to women, which can bias a DNN's
prediction in favor of men. Recent works have focused on task-specific training
strategies to handle bias in such scenarios, but fixing the available data is
often ignored. In this paper, we propose a novel and more generic solution to
address the contextual bias in the datasets by selecting a subset of the
samples, which is fair in terms of the co-occurrence with various classes for a
protected attribute. We introduce a data repair algorithm using the coefficient
of variation, which can curate fair and contextually balanced data for a
protected class(es). This helps in training a fair model irrespective of the
task, architecture or training methodology. Our proposed solution is simple,
effective, and can even be used in an active learning setting where the data
labels are not present or being generated incrementally. We demonstrate the
effectiveness of our algorithm for the task of object detection and multi-label
image classification across different datasets. Through a series of
experiments, we validate that curating contextually fair data helps make model
predictions fair by balancing the true positive rate for the protected class
across groups without compromising on the model's overall performance.
Related papers
- Model Debiasing by Learnable Data Augmentation [19.625915578646758]
This paper proposes a novel 2-stage learning pipeline featuring a data augmentation strategy able to regularize the training.
Experiments on synthetic and realistic biased datasets show state-of-the-art classification accuracy, outperforming competing methods.
arXiv Detail & Related papers (2024-08-09T09:19:59Z) - Fairness Without Harm: An Influence-Guided Active Sampling Approach [32.173195437797766]
We aim to train models that mitigate group fairness disparity without causing harm to model accuracy.
The current data acquisition methods, such as fair active learning approaches, typically require annotating sensitive attributes.
We propose a tractable active data sampling algorithm that does not rely on training group annotations.
arXiv Detail & Related papers (2024-02-20T07:57:38Z) - Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and
Beyond [93.96982273042296]
Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions.
We have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly without proper understanding.
We propose Adversarial Data Synthesis (ADS) to generate synthetic training and debiased evaluation data.
We then introduce Intra-sample Counterfactual Training (ICT) to assist models in utilizing the synthesized training data, particularly the counterfactual data, via focusing on intra-sample differentiation.
arXiv Detail & Related papers (2023-10-23T08:09:42Z) - Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z) - Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z) - Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consecutively, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z) - Deep Learning on a Healthy Data Diet: Finding Important Examples for
Fairness [15.210232622716129]
Data-driven predictive solutions predominant in commercial applications tend to suffer from biases and stereotypes.
Data augmentation reduces gender bias by adding counterfactual examples to the training dataset.
We show that some of the examples in the augmented dataset can be not important or even harmful for fairness.
arXiv Detail & Related papers (2022-11-20T22:42:30Z) - Unbiased Supervised Contrastive Learning [10.728852691100338]
In this work, we tackle the problem of learning representations that are robust to biases.
We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses can fail when dealing with biased data.
We derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE), providing more accurate control of the minimal distance between positive and negative samples.
Thanks to our theoretical framework, we also propose FairKL, a new debiasing regularization loss, that works well even with extremely biased data.
arXiv Detail & Related papers (2022-11-10T13:44:57Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling
Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z) - Distraction is All You Need for Fairness [0.0]
We propose a strategy for training deep learning models called the Distraction module.
This method can be theoretically proven effective in controlling bias from affecting the classification results.
We demonstrate the potency of the proposed method by testing it on UCI Adult and Heritage Health datasets.
arXiv Detail & Related papers (2022-03-15T01:46:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.