Counterfactual explainability of black-box prediction models
- URL: http://arxiv.org/abs/2411.01625v1
- Date: Sun, 03 Nov 2024 16:29:09 GMT
- Title: Counterfactual explainability of black-box prediction models
- Authors: Zijun Gao, Qingyuan Zhao
- Abstract summary: We propose a new notion called counterfactual explainability for black-box prediction models.
Counterfactual explainability has three key advantages.
- Score: 4.14360329494344
- License:
- Abstract: It is crucial to be able to explain black-box prediction models to use them effectively and safely in practice. Most existing tools for model explanations are associational rather than causal, and we use two paradoxical examples to show that such explanations are generally inadequate. Motivated by the concept of genetic heritability in twin studies, we propose a new notion called counterfactual explainability for black-box prediction models. Counterfactual explainability has three key advantages: (1) it leverages counterfactual outcomes and extends methods for global sensitivity analysis (such as functional analysis of variance and Sobol's indices) to a causal setting; (2) it is defined not only for the totality of a set of input factors but also for their interactions (indeed, it is a probability measure on a whole "explanation algebra"); (3) it also applies to dependent input factors whose causal relationship can be modeled by a directed acyclic graph, thus incorporating causal mechanisms into the explanation.
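The global sensitivity machinery the abstract refers to can be illustrated with the classical (associational) first-order Sobol index, which the paper generalizes to counterfactual outcomes. Below is a minimal sketch using the standard pick-freeze estimator, assuming independent uniform inputs and a toy prediction model; all names and the model itself are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # toy "black-box" prediction model: additive main effects plus an interaction
    return x[:, 0] + 2.0 * x[:, 1] + x[:, 0] * x[:, 1]

n = 100_000
A = rng.uniform(-1, 1, size=(n, 2))  # two independent Uniform(-1, 1) inputs
B = rng.uniform(-1, 1, size=(n, 2))  # independent copy for pick-freeze

def first_order_sobol(i):
    # Pick-freeze estimator: S_i = Cov(f(A), f(A_B^i)) / Var(f(A)),
    # where A_B^i keeps column i from A and takes the rest from B.
    AB = B.copy()
    AB[:, i] = A[:, i]
    fA, fAB = f(A), f(AB)
    return np.cov(fA, fAB)[0, 1] / np.var(fA)

# For this model, analytically S_0 = 3/16 and S_1 = 3/4;
# the remaining 1/16 of the variance is the interaction.
for i in range(2):
    print(f"S_{i} ≈ {first_order_sobol(i):.3f}")
```

The paper's counterfactual version replaces the observed inputs with interventions on causes, so the shares decompose causal rather than merely associational variance.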
Related papers
- Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation [0.9558392439655016]
The ability to interpret Machine Learning (ML) models is becoming increasingly essential.
Recent work has demonstrated that it is possible to formally assess interpretability by studying the computational complexity of explaining the decisions of various models.
arXiv Detail & Related papers (2024-08-07T17:20:52Z)
- Identifiable Latent Neural Causal Models [82.14087963690561]
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that do contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z)
- Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance [72.50214227616728]
Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group.
arXiv Detail & Related papers (2023-04-13T17:59:03Z)
- Explanatory causal effects for model agnostic explanations [27.129579550130423]
We study the problem of estimating the contributions of features to the prediction of a specific instance by a machine learning model.
A challenge is that most existing causal effects cannot be estimated from data without a known causal graph.
We define an explanatory causal effect based on a hypothetical ideal experiment.
arXiv Detail & Related papers (2022-06-23T08:25:31Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Explaining Causal Models with Argumentation: the Case of Bi-variate Reinforcement [15.947501347927687]
We introduce a conceptualisation for generating argumentation frameworks (AFs) from causal models.
The conceptualisation is based on reinterpreting desirable properties of semantics of AFs as explanation moulds.
We perform a theoretical evaluation of these argumentative explanations, examining whether they satisfy a range of desirable explanatory and argumentative properties.
arXiv Detail & Related papers (2022-05-23T19:39:51Z)
- Model Explanations via the Axiomatic Causal Lens [9.915489218010952]
We propose three explanation measures which aggregate the set of all but-for causes into feature importance weights.
Our first measure is a natural adaptation of Chockler and Halpern's notion of causal responsibility.
We extend our approach to derive a new method to compute the Shapley-Shubik and Banzhaf indices for black-box model explanations.
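For context, the raw Banzhaf index mentioned in this summary averages a feature's marginal contribution over all subsets of the remaining features. A brute-force sketch on a toy value function follows; the baseline-substitution value function and the model are illustrative assumptions, not the paper's actual construction from but-for causes:

```python
import itertools
import numpy as np

def banzhaf(value, n):
    """Raw Banzhaf index for each of n players, given a coalition value function."""
    idx = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in itertools.combinations(others, r):
                # marginal contribution of player i to coalition S
                idx[i] += value(set(S) | {i}) - value(set(S))
        idx[i] /= 2 ** (n - 1)  # average over all 2^(n-1) coalitions
    return idx

# Toy setup: features in S take their instance values, others a baseline of 0.
x = np.array([1.0, 2.0, 3.0])

def model(z):
    return z[0] + z[1] * z[2]  # simple model with an interaction

def value(S):
    z = np.where([j in S for j in range(3)], x, 0.0)
    return model(z)

print(banzhaf(value, 3))  # → [1. 3. 3.]
```

Feature 0 always contributes exactly 1, while features 1 and 2 only matter jointly, so each receives half of the interaction's total effect of 6.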
arXiv Detail & Related papers (2021-09-08T19:33:52Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
- Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning [9.279259759707996]
Causal approaches to post-hoc explainability for black-box prediction models have become increasingly popular.
We learn causal graphical representations that allow for arbitrary unmeasured confounding among features.
Our approach is motivated by a counterfactual theory of causal explanation wherein good explanations point to factors that are "difference-makers" in an interventionist sense.
arXiv Detail & Related papers (2020-06-03T19:02:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.