Biased Models Have Biased Explanations
- URL: http://arxiv.org/abs/2012.10986v1
- Date: Sun, 20 Dec 2020 18:09:45 GMT
- Title: Biased Models Have Biased Explanations
- Authors: Aditya Jain, Manish Ravula, Joydeep Ghosh
- Abstract summary: We study fairness in Machine Learning (FairML) through the lens of attribute-based explanations generated for machine learning models.
We first translate existing statistical notions of group fairness and define these notions in terms of explanations given by the model.
Then, we propose a novel way of detecting (un)fairness for any black box model.
- Score: 10.9397029555303
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study fairness in Machine Learning (FairML) through the lens of
attribute-based explanations generated for machine learning models. Our
hypothesis is: Biased Models have Biased Explanations. To establish that, we
first translate existing statistical notions of group fairness and define these
notions in terms of explanations given by the model. Then, we propose a novel
way of detecting (un)fairness for any black box model. We further look at
post-processing techniques for fairness and reason about how explanations can be used
to make a bias mitigation technique more individually fair. We also introduce a
novel post-processing mitigation technique which increases individual fairness
in recourse while maintaining group level fairness.
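The abstract does not spell out the detection procedure, but the general idea of explanation-based (un)fairness checks can be illustrated with a short sketch: compute per-instance feature attributions for a black-box scorer and compare their group-wise averages. The sketch below is only an illustration, not the paper's actual method; it assumes the `shap` package, and the names `predict_fn`, `X`, `sensitive`, and `background` are placeholders.

```python
# Minimal sketch of explanation-based (un)fairness detection for a black-box
# scorer, assuming the `shap` package. All names here are placeholders, not
# the paper's API.
import numpy as np
import shap

def explanation_disparity(predict_fn, X, sensitive, background):
    """Compare mean feature attributions between two demographic groups.

    predict_fn : callable mapping an (n, d) array to an (n,) score array
    X          : (n, d) feature matrix to explain
    sensitive  : (n,) boolean array marking the protected group
    background : (m, d) reference sample for the explainer (keep m small)
    """
    explainer = shap.KernelExplainer(predict_fn, background)
    attributions = np.asarray(explainer.shap_values(X))  # shape (n, d)

    # Average attribution per feature within each group.
    mean_protected = attributions[sensitive].mean(axis=0)
    mean_rest = attributions[~sensitive].mean(axis=0)

    # Large per-feature gaps suggest the model "explains" its decisions
    # differently across groups, a proxy signal for group-level unfairness.
    return mean_protected - mean_rest
```

By the additivity of SHAP values, summing the returned per-feature gaps recovers the difference in mean model scores between the two groups, tying the explanation-level view back to a standard statistical-parity-style gap on the model output.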
Related papers
- Bias Begets Bias: The Impact of Biased Embeddings on Diffusion Models [0.0]
Text-to-Image (TTI) systems have come under increased scrutiny for social biases.
We investigate embedding spaces as a source of bias for TTI models.
We find that biased multimodal embeddings like CLIP can result in lower alignment scores for representationally balanced TTI models.
arXiv Detail & Related papers (2024-09-15T01:09:55Z) - "Patriarchy Hurts Men Too." Does Your Model Agree? A Discussion on Fairness Assumptions [3.706222947143855]
In the context of group fairness, common approaches often obscure implicit assumptions about how bias is introduced into the data.
A typical implicit assumption is that the biasing process is a monotonic function of the fair scores, dependent solely on the sensitive attribute.
If the biasing process behaves in a more complex way than mere monotonicity, these implicit assumptions need to be identified and reconsidered.
arXiv Detail & Related papers (2024-08-01T07:06:30Z) - Learning for Counterfactual Fairness from Observational Data [62.43249746968616]
Fairness-aware machine learning aims to eliminate biases of learning models against certain subgroups described by certain protected (sensitive) attributes such as race, gender, and age.
A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data.
In this work, we address the problem of counterfactually fair prediction from observational data, without a given causal model, by proposing a novel framework, CLAIRE.
arXiv Detail & Related papers (2023-07-17T04:08:29Z) - DualFair: Fair Representation Learning at Both Group and Individual
Levels via Contrastive Self-supervision [73.80009454050858]
This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations.
Our model jointly optimizes for two fairness criteria - group fairness and counterfactual fairness.
arXiv Detail & Related papers (2023-03-15T07:13:54Z) - Fair Enough: Standardizing Evaluation and Model Selection for Fairness
Research in NLP [64.45845091719002]
Modern NLP systems exhibit a range of biases, which a growing literature on model debiasing attempts to correct.
This paper seeks to clarify the current situation and plot a course for meaningful progress in fair learning.
arXiv Detail & Related papers (2023-02-11T14:54:00Z) - Revealing Unfair Models by Mining Interpretable Evidence [50.48264727620845]
The popularity of machine learning has increased the risk of unfair models being deployed in high-stakes applications.
In this paper, we tackle the novel task of revealing unfair models by mining interpretable evidence.
Our method finds highly interpretable and solid evidence to effectively reveal the unfairness of trained models.
arXiv Detail & Related papers (2022-07-12T20:03:08Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Learning from others' mistakes: Avoiding dataset biases without modeling
them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z) - Wasserstein-based fairness interpretability framework for machine
learning models [0.2519906683279153]
We introduce a fairness interpretability framework for measuring and explaining the bias in classification and regression models.
We measure the model bias across sub-population distributions in the model output using the Wasserstein metric (see the illustrative sketch after this list).
We take into account the favorability of both the model and predictors with respect to the non-protected class.
arXiv Detail & Related papers (2020-11-06T02:01:29Z) - Explainability for fair machine learning [10.227479910430866]
We present a new approach to explaining fairness in machine learning, based on the Shapley value paradigm.
Our fairness explanations attribute a model's overall unfairness to individual input features, even in cases where the model does not operate on sensitive attributes directly.
We propose a meta algorithm for applying existing training-time fairness interventions, wherein one trains a perturbation to the original model, rather than a new model entirely.
arXiv Detail & Related papers (2020-10-14T20:21:01Z) - FairALM: Augmented Lagrangian Method for Training Fair Models with
Little Regret [42.66567001275493]
It is now accepted that, because of biases in the datasets presented to models, fairness-oblivious training will lead to unfair models.
Here, we study mechanisms that impose fairness concurrently while training the model.
arXiv Detail & Related papers (2020-04-03T03:18:53Z)
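For the Wasserstein-based interpretability framework listed above, the core bias quantity (a transport distance between sub-population output distributions) can be sketched as follows. This is only an illustration under simplifying assumptions (1-D scores, binary protected attribute) using `scipy.stats.wasserstein_distance`; the cited framework additionally accounts for the favorability of the model and of individual predictors.

```python
# Illustrative sketch of the core quantity in a Wasserstein-based bias
# measure: the transport distance between the model-output distributions of
# two sub-populations. Assumes 1-D scores and a binary protected attribute;
# this is not the cited paper's full framework.
import numpy as np
from scipy.stats import wasserstein_distance

def output_bias(scores, protected):
    """W1 distance between score distributions of the two groups.

    scores    : (n,) array of model outputs (e.g. predicted probabilities)
    protected : (n,) boolean array marking the protected sub-population
    """
    scores = np.asarray(scores, dtype=float)
    protected = np.asarray(protected, dtype=bool)
    return wasserstein_distance(scores[protected], scores[~protected])
```

A value of zero means the two groups receive identically distributed scores; larger values quantify, in the units of the model output, how far apart the two distributions are.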
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.