Counterfactual Evaluation for Explainable AI
- URL: http://arxiv.org/abs/2109.01962v1
- Date: Sun, 5 Sep 2021 01:38:49 GMT
- Title: Counterfactual Evaluation for Explainable AI
- Authors: Yingqiang Ge, Shuchang Liu, Zelong Li, Shuyuan Xu, Shijie Geng, Yunqi Li, Juntao Tan, Fei Sun, Yongfeng Zhang
- Abstract summary: We propose a new methodology to evaluate the faithfulness of explanations from the counterfactual reasoning perspective.
We introduce two algorithms to find the proper counterfactuals in both discrete and continuous scenarios and then use the acquired counterfactuals to measure faithfulness.
- Score: 21.055319253405603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While recent years have witnessed the emergence of various explainable
methods in machine learning, to what degree the explanations really represent
the reasoning process behind the model prediction -- namely, the faithfulness
of explanation -- is still an open problem. One commonly used way to measure
faithfulness is the \textit{erasure-based} criterion. Though conceptually simple, the
erasure-based criterion can inevitably introduce biases and artifacts. We
propose a new methodology to evaluate the faithfulness of explanations from the
\textit{counterfactual reasoning} perspective: the model should produce
substantially different outputs for the original input and its corresponding
counterfactual edited on a faithful feature. Specifically, we introduce two
algorithms to find the proper counterfactuals in both discrete and continuous
scenarios and then use the acquired counterfactuals to measure faithfulness.
Empirical results on several datasets show that, compared with existing metrics,
our proposed counterfactual evaluation method can achieve top correlation with
the ground truth under different ...
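To make the criterion concrete, here is a minimal Python sketch of the idea (hypothetical `model`, inputs, and search loop; this is not the paper's two counterfactual-search algorithms): edit the feature an explanation ranks as most important, and treat a large change in the model's output as evidence that the explanation is faithful.

```python
import numpy as np

def counterfactual_faithfulness(model, x, attributions, candidate_values):
    """Illustrative counterfactual faithfulness score (discrete case).

    model            : callable mapping a 1-D feature vector to class probabilities
    x                : original input (np.ndarray)
    attributions     : importance score per feature, as given by the explanation
    candidate_values : candidate_values[i] lists alternative values for feature i
    All names are hypothetical; this is a sketch, not the paper's algorithms.
    """
    p_orig = model(x)
    top = int(np.argmax(np.abs(attributions)))  # feature the explanation deems most important

    # Search the discrete edits of that feature for the counterfactual
    # that moves the prediction the most.
    best_shift = 0.0
    for value in candidate_values[top]:
        x_cf = x.copy()
        x_cf[top] = value
        shift = float(np.max(np.abs(model(x_cf) - p_orig)))
        best_shift = max(best_shift, shift)

    # Under the counterfactual criterion, a faithful explanation should yield a
    # large shift: editing the highlighted feature substantially changes the output.
    return best_shift
```

In a continuous setting, the inner loop would instead be a bounded optimization over edits of the chosen feature, which is the scenario the paper's second algorithm addresses.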
Related papers
- Rethinking Distance Metrics for Counterfactual Explainability [53.436414009687]
We investigate a framing for counterfactual generation methods that considers counterfactuals not as independent draws from a region around the reference, but as jointly sampled with the reference from the underlying data distribution.
We derive a distance metric tailored for counterfactual similarity that can be applied to a broad range of settings.
arXiv Detail & Related papers (2024-10-18T15:06:50Z)
- Longitudinal Counterfactuals: Constraints and Opportunities [59.11233767208572]
We propose using longitudinal data to assess and improve plausibility in counterfactuals.
We develop a metric that compares longitudinal differences to counterfactual differences, allowing us to evaluate how similar a counterfactual is to prior observed changes.
arXiv Detail & Related papers (2024-02-29T20:17:08Z)
- Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
arXiv Detail & Related papers (2023-08-28T03:03:03Z)
- Fixing confirmation bias in feature attribution methods via semantic match [4.733072355085082]
We argue that a structured approach is required to test whether our hypotheses on the model are confirmed by the feature attributions.
This is what we call the "semantic match" between human concepts and (sub-symbolic) explanations.
arXiv Detail & Related papers (2023-07-03T09:50:08Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End [17.226134854746267]
We present a method to generate feature attribution explanations from a set of counterfactual examples.
We show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency.
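(A minimal illustrative sketch of this necessity/sufficiency idea appears after the related-papers list.)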
arXiv Detail & Related papers (2020-11-10T05:41:43Z)
- Getting a CLUE: A Method for Explaining Uncertainty Estimates [30.367995696223726]
We propose a novel method for interpreting uncertainty estimates from differentiable probabilistic models.
Our method, Counterfactual Latent Uncertainty Explanations (CLUE), indicates how to change an input, while keeping it on the data manifold.
arXiv Detail & Related papers (2020-06-11T21:53:15Z)
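As referenced in the "Towards Unifying Feature Attribution and Counterfactual Explanations" entry above, counterfactual examples can be used to score an attribution-based explanation for necessity and sufficiency. The sketch below is one plausible reading of that idea with hypothetical inputs (a set of minimally edited counterfactuals that flip the model's prediction); it is not the cited paper's procedure.

```python
import numpy as np

def necessity_sufficiency(x, counterfactuals, attributions, k):
    """Score an attribution against counterfactual examples (illustrative only).

    x               : original input (np.ndarray)
    counterfactuals : inputs minimally edited from x that flip the model's prediction
    attributions    : importance score per feature
    k               : number of top-ranked features treated as "the explanation"
    """
    top_k = set(np.argsort(-np.abs(attributions))[:k])

    necessity_hits = 0  # flips that touched at least one top-k feature
    outside_flips = 0   # flips achieved without touching any top-k feature
    for x_cf in counterfactuals:
        changed = set(np.flatnonzero(x_cf != x))
        if changed & top_k:
            necessity_hits += 1
        elif changed:
            outside_flips += 1

    n = max(len(counterfactuals), 1)
    necessity = necessity_hits / n          # high if flips require the explained features
    sufficiency = 1.0 - outside_flips / n   # high if edits elsewhere rarely flip the prediction
    return necessity, sufficiency
```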
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.