Fairness and Robustness of Contrasting Explanations
- URL: http://arxiv.org/abs/2103.02354v1
- Date: Wed, 3 Mar 2021 12:16:06 GMT
- Title: Fairness and Robustness of Contrasting Explanations
- Authors: André Artelt and Barbara Hammer
- Abstract summary: We study individual fairness and robustness of contrasting explanations.
We propose to use plausible counterfactuals instead of closest counterfactuals for improving the individual fairness of counterfactual explanations.
- Score: 9.104557591459283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fairness and explainability are two important and closely related requirements of decision making systems. While ensuring and evaluating the fairness as well as the explainability of decision making systems have each been studied extensively, little effort has been devoted to the fairness of the explanations themselves - i.e. the explanations should be fair in their own right. In this work we formally and empirically study the individual fairness and robustness of contrasting explanations - in particular, we consider counterfactual explanations as a prominent instance of contrasting explanations. Furthermore, we propose to use plausible counterfactuals instead of closest counterfactuals to improve the individual fairness of counterfactual explanations.
Related papers
- Does Faithfulness Conflict with Plausibility? An Empirical Study in Explainable AI across NLP Tasks [9.979726030996051]
We show that Shapley values and LIME could attain greater faithfulness and plausibility.
Our findings suggest that rather than optimizing for one dimension at the expense of the other, we could seek to optimize explainability algorithms with dual objectives.
arXiv Detail & Related papers (2024-03-29T20:28:42Z) - On Explaining Unfairness: An Overview [2.0277446818411]
Algorithmic fairness and explainability are foundational elements for achieving responsible AI.
We categorize fairness into three types: (a) Explanations to enhance fairness metrics, (b) Explanations to help us understand the causes of (un)fairness, and (c) Explanations to assist us in designing methods for mitigating unfairness.
arXiv Detail & Related papers (2024-02-16T15:38:00Z) - On the Relationship Between Interpretability and Explainability in Machine Learning [2.828173677501078]
Interpretability and explainability have gained more and more attention in the field of machine learning.
Since both provide information about predictors and their decision process, they are often seen as two independent means for one single end.
This view has led to a dichotomous literature: explainability techniques designed for complex black-box models, or interpretable approaches ignoring the many explainability tools.
arXiv Detail & Related papers (2023-11-20T02:31:08Z) - Dissenting Explanations: Leveraging Disagreement to Reduce Model Overreliance [4.962171160815189]
We introduce the notion of dissenting explanations: conflicting predictions with accompanying explanations.
We first explore the advantage of dissenting explanations in the setting of model multiplicity.
We demonstrate that dissenting explanations reduce overreliance on model predictions, without reducing overall accuracy.
arXiv Detail & Related papers (2023-07-14T21:27:00Z) - On the Connection between Game-Theoretic Feature Attributions and Counterfactual Explanations [14.552505966070358]
Two of the most popular types of explanations are feature attributions and counterfactual explanations.
This work establishes a clear theoretical connection between game-theoretic feature attributions and counterfactual explanations.
We shed light on the limitations of naively using counterfactual explanations to provide feature importances.
arXiv Detail & Related papers (2023-07-13T17:57:21Z) - Complementary Explanations for Effective In-Context Learning [77.83124315634386]
Large language models (LLMs) have exhibited remarkable capabilities in learning from explanations in prompts.
This work aims to better understand the mechanisms by which explanations are used for in-context learning.
arXiv Detail & Related papers (2022-11-25T04:40:47Z) - Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often mis-interpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z) - Prompting Contrastive Explanations for Commonsense Reasoning Tasks [74.7346558082693]
Large pretrained language models (PLMs) can achieve near-human performance on commonsense reasoning tasks.
We show how to use these same models to generate human-interpretable evidence.
arXiv Detail & Related papers (2021-06-12T17:06:13Z) - Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representations to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z) - The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z) - Aligning Faithful Interpretations with their Social Attribution [58.13152510843004]
We find that the requirement of model interpretations to be faithful is vague and incomplete.
We identify that the problem is a misalignment between the causal chain of decisions (causal attribution) and the attribution of human behavior to the interpretation (social attribution).
arXiv Detail & Related papers (2020-06-01T16:45:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.