Quantifying Feature Contributions to Overall Disparity Using Information Theory
- URL: http://arxiv.org/abs/2206.08454v1
- Date: Thu, 16 Jun 2022 21:27:22 GMT
- Title: Quantifying Feature Contributions to Overall Disparity Using Information Theory
- Authors: Sanghamitra Dutta, Praveen Venkatesh, Pulkit Grover
- Abstract summary: When a machine-learning algorithm makes biased decisions, it can be helpful to understand the sources of disparity to explain why the bias exists.
We ask the question: what is the "potential" contribution of each individual feature to the observed disparity in the decisions when the exact decision-making mechanism is not accessible?
When unable to intervene on the inputs, we quantify the "redundant" statistical dependency about the protected attribute that is present in both the final decision and an individual feature.
- Score: 24.61791450920249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When a machine-learning algorithm makes biased decisions, it can be helpful
to understand the sources of disparity to explain why the bias exists. Towards
this, we examine the problem of quantifying the contribution of each individual
feature to the observed disparity. If we have access to the decision-making
model, one potential approach (inspired by intervention-based approaches in the
explainability literature) is to vary each individual feature (while keeping
the others fixed) and use the resulting change in disparity to quantify its
contribution. However, we may not have access to the model or be able to
test/audit its outputs for individually varying features. Furthermore, the
decision may not always be a deterministic function of the input features
(e.g., with human-in-the-loop). For these situations, we might need to explain
contributions using purely distributional (i.e., observational) techniques,
rather than interventional. We ask the question: what is the "potential"
contribution of each individual feature to the observed disparity in the
decisions when the exact decision-making mechanism is not accessible? We first
provide canonical examples (thought experiments) that help illustrate the
difference between distributional and interventional approaches to explaining
contributions, and when either is better suited. When unable to intervene on
the inputs, we quantify the "redundant" statistical dependency about the
protected attribute that is present in both the final decision and an
individual feature, by leveraging a body of work in information theory called
Partial Information Decomposition. We also perform a simple case study to show
how this technique could be applied to quantify contributions.
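To make the "redundant" dependency idea concrete, below is a minimal sketch (in Python) of how one could estimate, from observed samples alone, how much statistical information about a protected attribute Z is redundantly captured by both the decision Yhat and a single feature Xi. It uses the Williams-Beer I_min redundancy measure as a stand-in; the paper leverages Partial Information Decomposition more generally, and its exact redundancy measure and estimators may differ. All variable names and the synthetic data here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def specific_information(z_val, Z, A):
    """Specific information I(Z = z_val; A) in bits:
    sum_a p(a | z) * log2( p(z | a) / p(z) )  (equivalently KL(p(A | Z=z) || p(A)))."""
    Z, A = np.asarray(Z), np.asarray(A)
    p_z = np.mean(Z == z_val)
    mask = (Z == z_val)
    info = 0.0
    for a_val in np.unique(A):
        p_a_given_z = np.mean(A[mask] == a_val)
        if p_a_given_z == 0.0:
            continue
        p_z_given_a = np.mean(Z[A == a_val] == z_val)
        info += p_a_given_z * np.log2(p_z_given_a / p_z)
    return info

def redundant_information(Z, Yhat, Xi):
    """Williams-Beer I_min redundancy of (Yhat, Xi) about Z:
    sum_z p(z) * min( I(Z = z; Yhat), I(Z = z; Xi) )."""
    Z = np.asarray(Z)
    red = 0.0
    for z_val in np.unique(Z):
        p_z = np.mean(Z == z_val)
        red += p_z * min(specific_information(z_val, Z, Yhat),
                         specific_information(z_val, Z, Xi))
    return red

# Toy audit data (purely illustrative):
rng = np.random.default_rng(0)
n = 5000
Z = rng.integers(0, 2, n)                      # protected attribute
X1 = (Z ^ (rng.random(n) < 0.1)).astype(int)   # proxy feature, strongly tied to Z
X2 = rng.integers(0, 2, n)                     # feature independent of Z
Yhat = X1                                      # observed decisions driven by the proxy

print(redundant_information(Z, Yhat, X1))  # roughly I(Z; Yhat): X1 can "explain" the disparity
print(redundant_information(Z, Yhat, X2))  # near 0 bits: X2 contributes no redundant dependency
```

In this sketch, a feature that merely proxies Z and drives the decision receives a large redundancy score, while a feature independent of Z receives essentially none, illustrating how a potential contribution can be quantified without ever intervening on the model's inputs.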
Related papers
- Detection and Evaluation of bias-inducing Features in Machine learning [14.045499740240823]
In the context of machine learning (ML), one can use cause-to-effect analysis to understand the reason for the biased behavior of the system.
We propose an approach for systematically identifying all bias-inducing features of a model to help support the decision-making of domain experts.
arXiv Detail & Related papers (2023-10-19T15:01:16Z)
- The Impact of Explanations on Fairness in Human-AI Decision-Making: Protected vs Proxy Features [25.752072910748716]
Explanations may help human-AI teams address biases for fairer decision-making.
We study the effect of the presence of protected and proxy features on participants' perception of model fairness.
We find that explanations help people detect direct but not indirect biases.
arXiv Detail & Related papers (2023-10-12T16:00:16Z)
- Causal Entropy and Information Gain for Measuring Causal Control [0.22252684361733285]
We introduce causal versions of entropy and mutual information, termed causal entropy and causal information gain.
These quantities capture changes in the entropy of a variable resulting from interventions on other variables.
Fundamental results connecting these quantities to the existence of causal effects are derived.
arXiv Detail & Related papers (2023-09-14T13:25:42Z)
- Nonparametric Identifiability of Causal Representations from Unknown Interventions [63.1354734978244]
We study causal representation learning, the task of inferring latent causal variables and their causal relations from mixtures of the variables.
Our goal is to identify both the ground truth latents and their causal graph up to a set of ambiguities which we show to be irresolvable from interventional data.
arXiv Detail & Related papers (2023-06-01T10:51:58Z)
- Bounding Counterfactuals under Selection Bias [60.55840896782637]
We propose a first algorithm to address both identifiable and unidentifiable queries.
We prove that, in spite of the missingness induced by the selection bias, the likelihood of the available data is unimodal.
arXiv Detail & Related papers (2022-07-26T10:33:10Z)
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- BayesIMP: Uncertainty Quantification for Causal Data Fusion [52.184885680729224]
We study the causal data fusion problem, where datasets pertaining to multiple causal graphs are combined to estimate the average treatment effect of a target variable.
We introduce a framework which combines ideas from probabilistic integration and kernel mean embeddings to represent interventional distributions in the reproducing kernel Hilbert space.
arXiv Detail & Related papers (2021-06-07T10:14:18Z)
- Order in the Court: Explainable AI Methods Prone to Disagreement [0.0]
In Natural Language Processing, feature-additive explanation methods quantify the independent contribution of each input token towards a model's decision.
Previous analyses have sought to either invalidate or support the role of attention-based explanations as a faithful and plausible measure of salience.
We show that rank correlation is largely uninformative and does not measure the quality of feature-additive methods.
arXiv Detail & Related papers (2021-05-07T14:27:37Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- Fundamental Limits and Tradeoffs in Invariant Representation Learning [99.2368462915979]
Many machine learning applications involve learning representations that achieve two competing goals.
A minimax game-theoretic formulation captures a fundamental tradeoff between accuracy and invariance.
We provide an information-theoretic analysis of this general and important problem under both classification and regression settings.
arXiv Detail & Related papers (2020-12-19T15:24:04Z)
- Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End [17.226134854746267]
We present a method to generate feature attribution explanations from a set of counterfactual examples.
We show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency.
arXiv Detail & Related papers (2020-11-10T05:41:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.