What will it take to generate fairness-preserving explanations?
- URL: http://arxiv.org/abs/2106.13346v1
- Date: Thu, 24 Jun 2021 23:03:25 GMT
- Title: What will it take to generate fairness-preserving explanations?
- Authors: Jessica Dai, Sohini Upadhyay, Stephen H. Bach, Himabindu Lakkaraju
- Abstract summary: We focus on explanations applied to tabular datasets, suggesting that explanations do not necessarily preserve the fairness properties of the black-box algorithm.
We propose future research directions for evaluating and generating explanations such that they are informative and relevant from a fairness perspective.
- Score: 15.801388187383973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In situations where explanations of black-box models may be useful, the
fairness of the black-box is also often a relevant concern. However, the link
between the fairness of the black-box model and the behavior of explanations
for the black-box is unclear. We focus on explanations applied to tabular
datasets, suggesting that explanations do not necessarily preserve the fairness
properties of the black-box algorithm. In other words, explanation algorithms
can ignore or obscure critical relevant properties, creating incorrect or
misleading explanations. More broadly, we propose future research directions
for evaluating and generating explanations such that they are informative and
relevant from a fairness perspective.
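The abstract's central claim, that a faithful explanation can still obscure a model's fairness properties, can be illustrated with a minimal, self-contained sketch. Everything here is a hypothetical construction for illustration, not the authors' method: a synthetic "black box" that secretly uses a binary sensitive attribute, and a single-threshold surrogate explanation fitted only on the legitimate feature.

```python
import random

random.seed(0)

# Synthetic data: x is a legitimate score in [0, 1]; a is a binary
# sensitive attribute, independent of x.
data = [(random.random(), random.randint(0, 1)) for _ in range(1000)]

# Hypothetical "black box": its decision secretly depends on a.
def black_box(x, a):
    return 1 if 0.8 * x + 0.3 * a > 0.6 else 0

# Interpretable surrogate fitted only on x: pick the single threshold
# that best agrees with the black box (maximum fidelity).
best_t, best_fid = 0.0, -1.0
for t in (i / 100 for i in range(101)):
    fid = sum(black_box(x, a) == (1 if x > t else 0)
              for x, a in data) / len(data)
    if fid > best_fid:
        best_t, best_fid = t, fid

def surrogate(x, a):
    # The explanation never looks at the sensitive attribute.
    return 1 if x > best_t else 0

def demographic_parity_gap(model):
    # Absolute difference in positive-prediction rates across groups.
    rates = []
    for g in (0, 1):
        group = [(x, a) for x, a in data if a == g]
        rates.append(sum(model(x, a) for x, a in group) / len(group))
    return abs(rates[0] - rates[1])

print(f"surrogate fidelity:   {best_fid:.2f}")
print(f"black-box parity gap: {demographic_parity_gap(black_box):.2f}")
print(f"surrogate parity gap: {demographic_parity_gap(surrogate):.2f}")
```

On this synthetic data the surrogate reaches roughly 80% fidelity yet shows a near-zero demographic parity gap, while the black box's gap is large: a high-fidelity explanation that hides the model's use of the sensitive attribute, the "fairwashing" risk studied in one of the related papers below.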
Related papers
- Does It Make Sense to Explain a Black Box With Another Black Box? [5.377278489623063]
Examines two main families of counterfactual explanation methods in the literature: (a) transparent methods that perturb the target by adding, removing, or replacing words, and (b) opaque approaches that first project the target document into a latent, non-interpretable space where the perturbation is then carried out.
Our empirical evidence shows that opaque approaches can be overkill for downstream applications such as fake news detection or sentiment analysis, since they add a level of complexity with no significant performance gain.
arXiv Detail & Related papers (2024-04-23T11:40:30Z) - DiConStruct: Causal Concept-based Explanations through Black-Box Distillation [9.735426765564474]
We present DiConStruct, an explanation method that is both concept-based and causal.
Our explainer works as a distillation model to any black-box machine learning model by approximating its predictions while producing the respective explanations.
arXiv Detail & Related papers (2024-01-16T17:54:02Z) - Learning with Explanation Constraints [91.23736536228485]
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z) - Eliminating The Impossible, Whatever Remains Must Be True [46.39428193548396]
We show how one can apply background knowledge to give more succinct "why" formal explanations.
We also show how to use existing rule induction techniques to efficiently extract background information from a dataset.
arXiv Detail & Related papers (2022-06-20T03:18:14Z) - Characterizing the risk of fairwashing [8.545202841051582]
We show that it is possible to build high-fidelity explanation models with low unfairness.
We show that fairwashed explanation models can generalize beyond the suing group.
We conclude that fairwashing attacks can transfer across black-box models.
arXiv Detail & Related papers (2021-06-14T15:33:17Z) - Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z) - Benchmarking and Survey of Explanation Methods for Black Box Models [9.747543620322956]
We provide a categorization of explanation methods based on the type of explanation returned.
We present the most recent and widely used explainers, and we show a visual comparison among explanations and a quantitative benchmarking.
arXiv Detail & Related papers (2021-02-25T18:50:29Z) - Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the ImageNet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
arXiv Detail & Related papers (2021-02-22T12:38:53Z) - The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z) - Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
arXiv Detail & Related papers (2020-09-03T19:02:55Z) - Considerations When Learning Additive Explanations for Black-Box Models [16.732047048577638]
We show that different explanation methods characterize non-additive components in a black-box model's prediction function in different ways.
Even though distilled explanations are generally the most accurate additive explanations, non-additive explanations such as tree explanations tend to be even more accurate.
arXiv Detail & Related papers (2018-01-26T00:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.