Fairness and Robustness of Contrasting Explanations
- URL: http://arxiv.org/abs/2103.02354v1
- Date: Wed, 3 Mar 2021 12:16:06 GMT
- Title: Fairness and Robustness of Contrasting Explanations
- Authors: André Artelt and Barbara Hammer
- Abstract summary: We study individual fairness and robustness of contrasting explanations.
We propose to use plausible counterfactuals instead of closest counterfactuals for improving the individual fairness of counterfactual explanations.
- Score: 9.104557591459283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fairness and explainability are two important and closely related requirements of decision making systems. While ensuring and evaluating the fairness as well as the explainability of decision making systems have each been studied extensively, little effort has been devoted to the fairness of the explanations themselves - i.e. the explanations should be fair in their own right. In this work we formally and empirically study the individual fairness and robustness of contrasting explanations - in particular, we consider counterfactual explanations as a prominent instance of contrasting explanations. Furthermore, we propose to use plausible counterfactuals instead of closest counterfactuals to improve the individual fairness of counterfactual explanations.
Related papers
- Does Faithfulness Conflict with Plausibility? An Empirical Study in Explainable AI across NLP Tasks [9.979726030996051]
We show that Shapley values and LIME could attain greater faithfulness and plausibility.
Our findings suggest that rather than optimizing for one dimension at the expense of the other, we could seek to optimize explainability algorithms with dual objectives.
arXiv Detail & Related papers (2024-03-29T20:28:42Z) - On Explaining Unfairness: An Overview [2.0277446818411]
Algorithmic fairness and explainability are foundational elements for achieving responsible AI.
We categorize fairness into three types: (a) Explanations to enhance fairness metrics, (b) Explanations to help us understand the causes of (un)fairness, and (c) Explanations to assist us in designing methods for mitigating unfairness.
arXiv Detail & Related papers (2024-02-16T15:38:00Z) - On the Relationship Between Interpretability and Explainability in Machine Learning [2.828173677501078]
Interpretability and explainability have gained more and more attention in the field of machine learning.
Since both provide information about predictors and their decision process, they are often seen as two independent means for one single end.
This view has led to a dichotomous literature: explainability techniques designed for complex black-box models, or interpretable approaches ignoring the many explainability tools.
arXiv Detail & Related papers (2023-11-20T02:31:08Z) - Dissenting Explanations: Leveraging Disagreement to Reduce Model Overreliance [4.962171160815189]
We introduce the notion of dissenting explanations: conflicting predictions with accompanying explanations.
We first explore the advantage of dissenting explanations in the setting of model multiplicity.
We demonstrate that dissenting explanations reduce overreliance on model predictions, without reducing overall accuracy.
arXiv Detail & Related papers (2023-07-14T21:27:00Z) - On the Connection between Game-Theoretic Feature Attributions and Counterfactual Explanations [14.552505966070358]
Two of the most popular types of explanations are feature attributions and counterfactual explanations.
This work establishes a clear theoretical connection between game-theoretic feature attributions and counterfactual explanations.
We shed light on the limitations of naively using counterfactual explanations to provide feature importances.
arXiv Detail & Related papers (2023-07-13T17:57:21Z) - Complementary Explanations for Effective In-Context Learning [77.83124315634386]
Large language models (LLMs) have exhibited remarkable capabilities in learning from explanations in prompts.
This work aims to better understand the mechanisms by which explanations are used for in-context learning.
arXiv Detail & Related papers (2022-11-25T04:40:47Z) - Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often mis-interpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z) - Prompting Contrastive Explanations for Commonsense Reasoning Tasks [74.7346558082693]
Large pretrained language models (PLMs) can achieve near-human performance on commonsense reasoning tasks.
We show how to use these same models to generate human-interpretable evidence.
arXiv Detail & Related papers (2021-06-12T17:06:13Z) - Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representations to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z) - The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z) - Aligning Faithful Interpretations with their Social Attribution [58.13152510843004]
We find that the requirement of model interpretations to be faithful is vague and incomplete.
We identify that the problem is a misalignment between the causal chain of decisions (causal attribution) and the attribution of human behavior to the interpretation (social attribution).
arXiv Detail & Related papers (2020-06-01T16:45:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.