Explaining the Behavior of Black-Box Prediction Algorithms with Causal
Learning
- URL: http://arxiv.org/abs/2006.02482v4
- Date: Wed, 6 Sep 2023 01:04:40 GMT
- Title: Explaining the Behavior of Black-Box Prediction Algorithms with Causal
Learning
- Authors: Numair Sani, Daniel Malinsky, Ilya Shpitser
- Abstract summary: Causal approaches to post-hoc explainability for black-box prediction models have become increasingly popular.
We learn causal graphical representations that allow for arbitrary unmeasured confounding among features.
Our approach is motivated by a counterfactual theory of causal explanation wherein good explanations point to factors that are "difference-makers" in an interventionist sense.
- Score: 9.279259759707996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Causal approaches to post-hoc explainability for black-box prediction models
(e.g., deep neural networks trained on image pixel data) have become
increasingly popular. However, existing approaches have two important
shortcomings: (i) the "explanatory units" are micro-level inputs into the
relevant prediction model, e.g., image pixels, rather than interpretable
macro-level features that are more useful for understanding how to possibly
change the algorithm's behavior, and (ii) existing approaches assume there
exists no unmeasured confounding between features and target model predictions,
which fails to hold when the explanatory units are macro-level variables. Our
focus is on the important setting where the analyst has no access to the inner
workings of the target prediction algorithm, rather only the ability to query
the output of the model in response to a particular input. To provide causal
explanations in such a setting, we propose to learn causal graphical
representations that allow for arbitrary unmeasured confounding among features.
We demonstrate the resulting graph can differentiate between interpretable
features that causally influence model predictions versus those that are merely
associated with model predictions due to confounding. Our approach is motivated
by a counterfactual theory of causal explanation wherein good explanations
point to factors that are "difference-makers" in an interventionist sense.
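The pipeline suggested by the abstract — query the black box on interpretable macro-level features, then run constraint-based causal discovery that tolerates latent confounding — can be illustrated with a minimal sketch. This assumes the open-source causal-learn package for an FCI-style search; the synthetic features, the stand-in black-box model, and the exact fci call are illustrative assumptions, not the authors' released code.
```python
# Hedged sketch: learn a causal graph over interpretable macro-features and the
# black-box model's prediction, allowing unmeasured confounding (FCI -> PAG).
# Assumes the `causal-learn` package; the feature names and the toy black box
# are hypothetical stand-ins, not the authors' implementation.
import numpy as np
from causallearn.search.ConstraintBased.FCI import fci  # assumed import path

rng = np.random.default_rng(0)
n = 5000

# Hypothetical macro-level features with an unmeasured common cause.
latent = rng.normal(size=n)                  # unmeasured confounder
causal_feat = latent + rng.normal(size=n)    # truly influences the model output
spurious_feat = latent + rng.normal(size=n)  # only associated via the confounder
other_feat = rng.normal(size=n)              # independent causal feature

def black_box_predict(a, b):
    # Stand-in for the target model: we only observe its outputs (query access).
    return (0.9 * a - 0.6 * b + 0.1 * rng.normal(size=len(a)) > 0).astype(float)

y_hat = black_box_predict(causal_feat, other_feat)
data = np.column_stack([causal_feat, spurious_feat, other_feat, y_hat])

# FCI tolerates arbitrary latent confounding among the features; in the learned
# PAG, edges oriented into the prediction column suggest "difference-makers",
# while circle/bidirected marks flag association explainable by confounding.
g, edges = fci(data, independence_test_method="fisherz", alpha=0.05)
print(g.graph)  # FCI edge-mark matrix; the last row/column corresponds to y_hat
```
In this toy setup, only the first and third features should be identified as causes of the prediction, while the second remains connected only through marks compatible with confounding.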
Related papers
- Counterfactual explainability of black-box prediction models [4.14360329494344]
We propose a new notion called counterfactual explainability for black-box prediction models.
Counterfactual explainability has three key advantages.
arXiv Detail & Related papers (2024-11-03T16:29:09Z)
- Explaining Hate Speech Classification with Model Agnostic Methods [0.9990687944474738]
The research goal of this paper is to bridge the gap between hate speech prediction and the explanations generated by the system to support its decision.
This is achieved by first predicting the classification of a text and then providing a post-hoc, model-agnostic, surrogate interpretability approach.
arXiv Detail & Related papers (2023-05-30T19:52:56Z)
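As a rough illustration of the post-hoc, model-agnostic surrogate idea mentioned in the preceding entry (not that paper's specific system), a local linear surrogate can be fit to black-box scores on word-dropout perturbations of a text. The toy scorer, keyword list, and sampling scheme below are assumptions.
```python
# Hedged sketch of a LIME-style local surrogate: perturb a text by masking
# words, query the black-box scorer, and fit a linear model whose coefficients
# serve as per-word explanations. The scorer is a toy stand-in, not the
# paper's hate-speech classifier.
import numpy as np
from sklearn.linear_model import Ridge

def black_box_score(text: str) -> float:
    # Hypothetical classifier: probability-like score from keyword counts.
    flagged = {"hate", "stupid"}
    toks = text.lower().split()
    return min(1.0, sum(t in flagged for t in toks) / 2.0)

def explain_locally(text: str, n_samples: int = 500, seed: int = 0):
    rng = np.random.default_rng(seed)
    tokens = text.split()
    # Binary mask per token: 1 = keep the word, 0 = drop it.
    masks = rng.integers(0, 2, size=(n_samples, len(tokens)))
    masks[0] = 1  # include the unperturbed text
    scores = np.array([
        black_box_score(" ".join(t for t, m in zip(tokens, row) if m))
        for row in masks
    ])
    surrogate = Ridge(alpha=1.0).fit(masks, scores)
    # Largest-magnitude coefficients point to the words driving the score.
    return sorted(zip(tokens, surrogate.coef_), key=lambda p: -abs(p[1]))

print(explain_locally("I hate this stupid example sentence"))
```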
- Causal Analysis for Robust Interpretability of Neural Networks [0.2519906683279152]
We develop a robust intervention-based method to capture cause-effect mechanisms in pre-trained neural networks.
We apply our method to vision models trained on classification tasks.
arXiv Detail & Related papers (2023-05-15T18:37:24Z)
- Rationalizing Predictions by Adversarial Information Calibration [65.19407304154177]
We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features.
arXiv Detail & Related papers (2023-01-15T03:13:09Z)
- Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces [14.70409833767752]
Explainable AI aims to overcome the black-box nature of complex ML models like neural networks by generating explanations for their predictions.
We propose two new analyses, extending principles found in PCA or ICA to explanations.
These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), maximize relevance instead of e.g. variance or kurtosis.
arXiv Detail & Related papers (2022-12-30T18:04:25Z)
- CLEAR: Generative Counterfactual Explanations on Graphs [60.30009215290265]
We study the problem of counterfactual explanation generation on graphs.
A few studies have explored counterfactual explanations on graphs, but many challenges of this problem are still not well-addressed.
We propose a novel framework CLEAR which aims to generate counterfactual explanations on graphs for graph-level prediction models.
arXiv Detail & Related papers (2022-10-16T04:35:32Z)
- Counterfactual Explanations for Predictive Business Process Monitoring [0.90238471756546]
We propose LORELEY, a counterfactual explanation technique for predictive process monitoring.
LORELEY can approximate prediction models with an average fidelity of 97.69% and generate realistic counterfactual explanations.
arXiv Detail & Related papers (2022-02-24T11:01:20Z)
- Deconfounding to Explanation Evaluation in Graph Neural Networks [136.73451468551656]
We argue that a distribution shift exists between the full graph and the subgraph, causing the out-of-distribution problem.
We propose Deconfounded Subgraph Evaluation (DSE) which assesses the causal effect of an explanatory subgraph on the model prediction.
arXiv Detail & Related papers (2022-01-21T18:05:00Z)
- Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
Seq2seq abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions.
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
arXiv Detail & Related papers (2020-10-15T16:57:27Z)
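The token-level uncertainty analysis described in the preceding entry can be sketched with the Hugging Face transformers API: generate a summary while keeping the per-step logits, then compute the entropy of each softmax distribution. The checkpoint choice and input text below are illustrative assumptions, not necessarily the models studied in that paper.
```python
# Hedged sketch: per-token predictive entropy of a seq2seq summarizer.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "facebook/bart-large-cnn"  # example abstractive checkpoint (assumption)
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

article = "The quick brown fox jumped over the lazy dog near the river bank."
inputs = tok(article, return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=30,
                     output_scores=True, return_dict_in_generate=True)

# out.scores holds one logits tensor per generated token; the entropy of each
# softmax distribution quantifies how uncertain the model was at that step.
for step, logits in enumerate(out.scores):
    probs = torch.softmax(logits[0], dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum().item()
    token = tok.decode([int(out.sequences[0, step + 1])])
    print(f"{step:2d} {token!r:15s} entropy={entropy:.3f}")
```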
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
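To make the contrast drawn in the preceding entry concrete, exact Shapley values can be computed for a trivial model and compared against a minimal sufficient subset. The toy model, instance, and baseline convention below are assumptions chosen for illustration.
```python
# Hedged sketch: exact Shapley values vs. a minimal sufficient subset for a
# trivial model, showing how the two explanation targets can differ.
from itertools import combinations
from math import factorial

FEATURES = ["a", "b", "c"]

def model(x):
    # Trivial model: fires if "a" is on, or if both "b" and "c" are on.
    return 1.0 if x["a"] == 1 or (x["b"] == 1 and x["c"] == 1) else 0.0

INSTANCE = {"a": 1, "b": 1, "c": 1}
BASELINE = {"a": 0, "b": 0, "c": 0}

def value(subset):
    # Coalition value: keep features in `subset` at their instance values,
    # set the rest to the baseline (a common, but not unique, convention).
    x = {f: (INSTANCE[f] if f in subset else BASELINE[f]) for f in FEATURES}
    return model(x)

def shapley(feature):
    others = [f for f in FEATURES if f != feature]
    n, total = len(FEATURES), 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (value(set(subset) | {feature}) - value(set(subset)))
    return total

print({f: round(shapley(f), 3) for f in FEATURES})
# A minimal sufficient subset here is just {"a"}: fixing a=1 already forces the
# prediction to 1, whereas Shapley still spreads some credit to "b" and "c".
```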
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article, a new kind of interpretable machine learning method is presented.
It helps to understand how a classification model partitions the feature space into predicted classes, using quantile shifts.
Real data points (or specific points of interest) are used, and the change in the prediction after slightly raising or decreasing specific features is observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
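A minimal sketch of the quantile-shift idea from the preceding entry: nudge one feature of a real data point up or down by a small quantile step and check whether the predicted class changes. The scikit-learn classifier, the Iris data, and the step size are illustrative assumptions, not the article's exact procedure.
```python
# Hedged sketch of quantile-shift probing around a real data point.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(random_state=0).fit(X, y)

def quantile_shift(x_row, feature_idx, data, step=0.05, direction=+1):
    # Move the feature to the empirical quantile `step` above/below its
    # current quantile rank in the training data.
    col = np.sort(data[:, feature_idx])
    rank = np.searchsorted(col, x_row[feature_idx]) / len(col)
    new_q = np.clip(rank + direction * step, 0.0, 1.0)
    shifted = x_row.copy()
    shifted[feature_idx] = np.quantile(col, new_q)
    return shifted

point = X[60]  # a real data point of interest
base_pred = clf.predict(point.reshape(1, -1))[0]
for j in range(X.shape[1]):
    for direction in (+1, -1):
        new_point = quantile_shift(point, j, X, direction=direction)
        new_pred = clf.predict(new_point.reshape(1, -1))[0]
        changed = "changed" if new_pred != base_pred else "unchanged"
        print(f"feature {j}, shift {direction:+d}: prediction {changed} "
              f"({base_pred} -> {new_pred})")
```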