Non-Invasive Fairness in Learning through the Lens of Data Drift
- URL: http://arxiv.org/abs/2303.17566v4
- Date: Wed, 9 Aug 2023 15:17:33 GMT
- Title: Non-Invasive Fairness in Learning through the Lens of Data Drift
- Authors: Ke Yang and Alexandra Meliou
- Abstract summary: We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
- Score: 88.37640805363317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning (ML) models are widely employed to drive many modern data
systems. While they are undeniably powerful tools, ML models often demonstrate
imbalanced performance and unfair behaviors. The root of this problem often
lies in the fact that different subpopulations commonly display divergent
trends: as a learning algorithm tries to identify trends in the data, it
naturally favors the trends of the majority groups, leading to a model that
performs poorly and unfairly for minority populations. Our goal is to improve
the fairness and trustworthiness of ML models by applying only non-invasive
interventions, i.e., without altering the data or the learning algorithm. We
use a simple but key insight: the divergence of trends between different
populations, and, consequently, between a learned model and minority
populations, is analogous to data drift, which indicates the poor conformance
between parts of the data and the trained model. We explore two strategies
(model-splitting and reweighing) to resolve this drift, aiming to improve the
overall conformance of models to the underlying data. Both our methods
introduce novel ways to employ the recently-proposed data profiling primitive
of Conformance Constraints. Our experimental evaluation over 7 real-world
datasets shows that both DifFair (the model-splitting strategy) and ConFair
(the reweighing strategy) improve the fairness of ML models. We demonstrate
scenarios where DifFair has an edge, though ConFair has the greatest practical
impact and outperforms the other baselines. Moreover, as a model-agnostic
technique, ConFair remains robust when applied to models other than the ones
on which its weights were learned, which is not the case for other
state-of-the-art methods.
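To make the reweighing strategy concrete, the sketch below shows a generic group-reweighing setup with scikit-learn. It is a minimal illustration only: the helper `make_group_weights` and the inverse-frequency weighting it uses are assumptions made for this example, not the ConFair algorithm, which derives tuple weights from Conformance Constraints. What it does illustrate is the non-invasive nature of the intervention: neither the data values nor the learning algorithm change, only the sample weights passed to a standard learner.

```python
# Minimal sketch, assuming a generic inverse-frequency reweighing scheme;
# NOT the ConFair algorithm, which computes weights from Conformance Constraints.
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_group_weights(sensitive: np.ndarray) -> np.ndarray:
    """Upweight under-represented groups so each group contributes
    roughly equally to the training loss (hypothetical helper)."""
    groups, counts = np.unique(sensitive, return_counts=True)
    freq = dict(zip(groups, counts / len(sensitive)))
    return np.array([1.0 / (len(groups) * freq[g]) for g in sensitive])

# Toy data: X features, y labels, `sensitive` marks group membership per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
sensitive = rng.integers(0, 2, size=200)
y = (X[:, 0] + 0.5 * sensitive + rng.normal(scale=0.5, size=200) > 0).astype(int)

weights = make_group_weights(sensitive)
# Non-invasive: the data and the learner are untouched; only sample weights change.
model = LogisticRegression().fit(X, y, sample_weight=weights)
```

Because the weights are computed over the data rather than inside a specific learner, a setup like this is model-agnostic: the same weights can be reused when training a different downstream model, which is the robustness property the abstract highlights for ConFair.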
Related papers
- Improving Fairness and Mitigating MADness in Generative Models [21.024727486615646]
We show that training generative models with intentionally designed hypernetworks leads to models that are more fair when generating datapoints belonging to minority classes.
We introduce a regularization term that penalizes discrepancies between a generative model's estimated weights when trained on real data versus its own synthetic data.
arXiv Detail & Related papers (2024-05-22T20:24:41Z) - Federated Skewed Label Learning with Logits Fusion [23.062650578266837]
Federated learning (FL) aims to collaboratively train a shared model across multiple clients without transmitting their local data.
We propose FedBalance, which corrects the optimization bias among local models by calibrating their logits.
Our method can achieve 13% higher average accuracy than state-of-the-art methods.
arXiv Detail & Related papers (2023-11-14T14:37:33Z) - Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
Because the training data behind each fine-tuned model are often unavailable, this creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space (a minimal parameter-averaging sketch appears after this list).
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - Fairness Reprogramming [42.65700878967251]
We propose a new generic fairness learning paradigm, called FairReprogram, which incorporates the model reprogramming technique.
Specifically, FairReprogram considers the case where models cannot be changed and appends to the input a set of perturbations, called the fairness trigger.
We show both theoretically and empirically that the fairness trigger can effectively obscure demographic biases in the output prediction of fixed ML models.
arXiv Detail & Related papers (2022-09-21T09:37:00Z) - Bias-inducing geometries: an exactly solvable data model with fairness
implications [13.690313475721094]
We introduce an exactly solvable high-dimensional model of data imbalance.
We analytically unpack the typical properties of learning models trained in this synthetic framework.
We obtain exact predictions for the observables that are commonly employed for fairness assessment.
arXiv Detail & Related papers (2022-05-31T16:27:57Z) - FairIF: Boosting Fairness in Deep Learning via Influence Functions with
Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over a reweighted data set, where the sample weights are computed via influence functions using sensitive attributes from a validation set.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z) - Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance the representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z) - Learning from others' mistakes: Avoiding dataset biases without modeling
them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)