Explainability for fair machine learning
- URL: http://arxiv.org/abs/2010.07389v1
- Date: Wed, 14 Oct 2020 20:21:01 GMT
- Title: Explainability for fair machine learning
- Authors: Tom Begley, Tobias Schwedes, Christopher Frye, Ilya Feige
- Abstract summary: We present a new approach to explaining fairness in machine learning, based on the Shapley value paradigm.
Our fairness explanations attribute a model's overall unfairness to individual input features, even in cases where the model does not operate on sensitive attributes directly.
We propose a meta algorithm for applying existing training-time fairness interventions, wherein one trains a perturbation to the original model, rather than a new model entirely.
- Score: 10.227479910430866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the decisions made or influenced by machine learning models increasingly
impact our lives, it is crucial to detect, understand, and mitigate unfairness.
But even simply determining what "unfairness" should mean in a given context is
non-trivial: there are many competing definitions, and choosing between them
often requires a deep understanding of the underlying task. It is thus tempting
to use model explainability to gain insights into model fairness; however,
existing explainability tools do not reliably indicate whether a model is
indeed fair. In this work we present a new approach to explaining fairness in
machine learning, based on the Shapley value paradigm. Our fairness
explanations attribute a model's overall unfairness to individual input
features, even in cases where the model does not operate on sensitive
attributes directly. Moreover, motivated by the linearity of Shapley
explainability, we propose a meta algorithm for applying existing training-time
fairness interventions, wherein one trains a perturbation to the original
model, rather than a new model entirely. By explaining the original model, the
perturbation, and the fair-corrected model, we gain insight into the
accuracy-fairness trade-off that is being made by the intervention. We further
show that this meta algorithm enjoys both flexibility and stability benefits
with no loss in performance.
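For reference, the Shapley attribution underlying this approach, and the linearity property that motivates the perturbation-based meta algorithm, take the standard form below; the specific fairness value functions the paper plugs into v are not reproduced here.

```latex
% Shapley attribution of feature i under a value function v on feature set N
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}}
    \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl( v(S \cup \{i\}) - v(S) \bigr)

% Linearity: attributions of a sum of value functions decompose additively.
% If the fair-corrected model is the original model plus a trained
% perturbation, its explanation therefore splits into the explanation of the
% original model plus the explanation of the perturbation:
\phi_i(v_{\mathrm{orig}} + v_{\mathrm{pert}}) = \phi_i(v_{\mathrm{orig}}) + \phi_i(v_{\mathrm{pert}})
```

Below is a minimal, self-contained sketch of what attributing an unfairness measure to input features can look like in practice: a demographic-parity gap serves as the value function and Shapley contributions are estimated by Monte Carlo permutation sampling. The synthetic data, the model, the choice of metric, and the sampling scheme are all assumptions of this sketch, not the authors' implementation.

```python
# Illustrative sketch only: Monte Carlo, permutation-based Shapley estimate
# that attributes a demographic-parity gap to individual input features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: feature x0 is a proxy for the sensitive attribute a,
# feature x1 is independent of it. The model never sees a directly.
n = 4000
a = rng.integers(0, 2, size=n)                     # sensitive attribute
x0 = a + 0.5 * rng.normal(size=n)                  # proxy feature
x1 = rng.normal(size=n)                            # unrelated feature
X = np.column_stack([x0, x1])
y = (x0 + x1 + 0.3 * rng.normal(size=n) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)

def dp_gap(X_eval):
    """Demographic-parity gap of the model's predictions on X_eval."""
    pred = model.predict(X_eval)
    return abs(pred[a == 1].mean() - pred[a == 0].mean())

def value(coalition, background):
    """v(S): DP gap when features outside the coalition are replaced by
    rows of a background sample (simple interventional marginalisation)."""
    X_masked = background.copy()
    X_masked[:, coalition] = X[:, coalition]
    return dp_gap(X_masked)

def shapley_fairness_attributions(n_perm=200):
    """Permutation-sampling estimate of per-feature contributions to the gap."""
    d = X.shape[1]
    phi = np.zeros(d)
    for _ in range(n_perm):
        background = X[rng.permutation(n)]          # shuffled rows as baseline
        order = rng.permutation(d)
        coalition = []
        v_prev = value(coalition, background)
        for j in order:
            coalition.append(j)
            v_new = value(coalition, background)
            phi[j] += v_new - v_prev
            v_prev = v_new
    return phi / n_perm

print("overall demographic-parity gap:", dp_gap(X))
print("per-feature fairness attributions:", shapley_fairness_attributions())
```

On this toy data, most of the gap should be attributed to the proxy feature x0 even though the sensitive attribute a is never passed to the model, which mirrors the behaviour described in the abstract.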
Related papers
- Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)
- Learning for Counterfactual Fairness from Observational Data [62.43249746968616]
Fairness-aware machine learning aims to eliminate biases of learning models against subgroups described by protected (sensitive) attributes such as race, gender, and age.
A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data.
In this work, we address the problem of counterfactually fair prediction from observational data without a given causal model by proposing a novel framework, CLAIRE.
arXiv Detail & Related papers (2023-07-17T04:08:29Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- Achieving Counterfactual Fairness with Imperfect Structural Causal Model [11.108866104714627]
We propose a novel minimax game-theoretic model for counterfactual fairness.
We also theoretically prove the error bound of the proposed minimax model.
Empirical experiments on multiple real-world datasets illustrate our superior performance in both accuracy and fairness.
arXiv Detail & Related papers (2023-03-26T09:37:29Z)
- DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision [73.80009454050858]
This work presents a self-supervised model, called DualFair, that can debias learned representations with respect to sensitive attributes like gender and race.
Our model jointly optimizes for two fairness criteria - group fairness and counterfactual fairness.
arXiv Detail & Related papers (2023-03-15T07:13:54Z)
- Revealing Unfair Models by Mining Interpretable Evidence [50.48264727620845]
The popularity of machine learning has increased the risk of unfair models getting deployed in high-stakes applications.
In this paper, we tackle the novel task of revealing unfair models by mining interpretable evidence.
Our method finds highly interpretable and solid evidence to effectively reveal the unfairness of trained models.
arXiv Detail & Related papers (2022-07-12T20:03:08Z)
- Leave-one-out Unfairness [17.221751674951562]
We introduce leave-one-out unfairness, which characterizes how likely it is that a model's prediction for an individual will change due to the inclusion or removal of a single other person in the model's training data.
We characterize the extent to which deep models exhibit leave-one-out unfairness on real data, including in cases where the generalization error is small.
We discuss salient practical applications that may be negatively affected by leave-one-out unfairness.
arXiv Detail & Related papers (2021-07-21T15:55:49Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Biased Models Have Biased Explanations [10.9397029555303]
We study fairness in Machine Learning (FairML) through the lens of attribute-based explanations generated for machine learning models.
We first translate existing statistical notions of group fairness and define these notions in terms of explanations given by the model.
Then, we propose a novel way of detecting (un)fairness for any black box model.
arXiv Detail & Related papers (2020-12-20T18:09:45Z)
- FairALM: Augmented Lagrangian Method for Training Fair Models with Little Regret [42.66567001275493]
It is now accepted that, because of biases in the datasets we present to models, fairness-oblivious training will lead to unfair models.
Here, we study mechanisms that impose fairness concurrently while training the model.
arXiv Detail & Related papers (2020-04-03T03:18:53Z)
- Fairness by Explicability and Adversarial SHAP Learning [0.0]
We propose a new definition of fairness that emphasises the role of an external auditor and model explicability.
We develop a framework for mitigating model bias using regularizations constructed from the SHAP values of an adversarial surrogate model.
We demonstrate our approaches using gradient and adaptive boosting on a synthetic dataset, the UCI Adult (Census) dataset, and a real-world credit scoring dataset.
arXiv Detail & Related papers (2020-03-11T14:36:34Z)
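As a rough illustration of the adversarial-surrogate idea in the entry above, and only of its auditing step, the sketch below fits a surrogate that tries to recover the sensitive attribute from the primary model's inputs and score, then inspects which channels the surrogate relies on. Permutation importance stands in for SHAP values to keep the example dependency-light; the data, the models, and that substitution are assumptions of this sketch, not the paper's method.

```python
# Generic audit sketch: how strongly do the primary model's score and inputs
# let an adversarial surrogate recover the sensitive attribute?
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)

# Synthetic task: x0 leaks the sensitive attribute a, x1 does not.
n = 3000
a = rng.integers(0, 2, size=n)
X = np.column_stack([a + 0.4 * rng.normal(size=n), rng.normal(size=n)])
y = ((X[:, 0] + X[:, 1]) > 0.5).astype(int)

primary = GradientBoostingClassifier().fit(X, y)
score = primary.predict_proba(X)[:, 1]

# Adversarial surrogate: predict the sensitive attribute from the primary
# model's inputs together with its score.
Z = np.column_stack([X, score])
surrogate = GradientBoostingClassifier().fit(Z, a)

# Attribution of the surrogate (permutation importance as a SHAP stand-in):
# large values flag channels through which the sensitive attribute leaks.
imp = permutation_importance(surrogate, Z, a, n_repeats=10, random_state=0)
for name, val in zip(["x0", "x1", "primary_score"], imp.importances_mean):
    print(f"{name:>14}: {val:.3f}")
```

In the paper's framing, attributions of this kind are turned into a regularisation term during training; here they are only printed as an audit signal.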
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.