A Causal Lens for Peeking into Black Box Predictive Models: Predictive
Model Interpretation via Causal Attribution
- URL: http://arxiv.org/abs/2008.00357v1
- Date: Sat, 1 Aug 2020 23:20:57 GMT
- Title: A Causal Lens for Peeking into Black Box Predictive Models: Predictive
Model Interpretation via Causal Attribution
- Authors: Aria Khademi, Vasant Honavar
- Abstract summary: We aim to address the problem of explaining predictive models and their predictions in settings where the predictive model is a black box.
We reduce the problem of interpreting a black box predictive model to that of estimating the causal effects of each of the model inputs on the model output.
We show how the resulting causal attribution of responsibility for model output to the different model inputs can be used to interpret the predictive model and to explain its predictions.
- Score: 3.3758186776249928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing adoption of predictive models trained using machine
learning across a wide range of high-stakes applications, e.g., health care,
security, criminal justice, finance, and education, there is a growing need for
effective techniques for explaining such models and their predictions. We aim
to address this problem in settings where the predictive model is a black box;
that is, we can only observe the response of the model to various inputs, but
have no knowledge of the internal structure of the predictive model, its
parameters, the objective function, or the algorithm used to optimize the
model. We reduce the problem of interpreting a black box predictive model to
that of estimating the causal effects of each of the model inputs on the model
output, from observations of the model inputs and the corresponding outputs. We
estimate the causal effects of model inputs on model output using variants of
the Rubin Neyman potential outcomes framework for estimating causal effects
from observational data. We show how the resulting causal attribution of
responsibility for model output to the different model inputs can be used to
interpret the predictive model and to explain its predictions. We present
results of experiments that demonstrate the effectiveness of our approach to
the interpretation of black box predictive models via causal attribution in the
case of deep neural network models trained on one synthetic data set (where the
input variables that impact the output variable are known by design) and two
real-world data sets: handwritten digit classification and Parkinson's disease
severity prediction. Because our approach does not require knowledge about the
predictive model algorithm and is free of assumptions regarding the black box
predictive model except that its input-output responses be observable, it can
be applied, in principle, to any black box predictive model.
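To make the recipe concrete, the following is a minimal sketch (not the authors' implementation) of causal attribution for a black box model: a chosen input is binarized into a "treatment", and its average causal effect on the model's output is estimated with a simple outcome-regression adjustment over the remaining inputs, one common way to operationalize the Rubin-Neyman potential outcomes framework from observed input-output pairs alone. The function name, the median-based treatment definition, the linear adjustment models, and the synthetic black box below are all illustrative assumptions, not details taken from the paper.
```python
import numpy as np
from sklearn.linear_model import LinearRegression

def causal_attribution(black_box, X, feature_idx, threshold=None):
    """Estimate the average causal effect of one input on the black box output.

    The "treatment" is defined, for illustration, as the feature exceeding its
    median; the remaining inputs serve as covariates in a regression adjustment.
    """
    y = np.asarray(black_box(X))                 # observed model outputs (the oracle)
    x_f = X[:, feature_idx]
    threshold = np.median(x_f) if threshold is None else threshold
    treated = x_f > threshold                    # binarized treatment assignment
    covariates = np.delete(X, feature_idx, axis=1)

    # Fit separate outcome models for treated and control units, then predict
    # both potential outcomes for every unit (a simple "T-learner" adjustment).
    model_t = LinearRegression().fit(covariates[treated], y[treated])
    model_c = LinearRegression().fit(covariates[~treated], y[~treated])
    y1_hat = model_t.predict(covariates)         # estimated outcome under treatment
    y0_hat = model_c.predict(covariates)         # estimated outcome under control
    return float(np.mean(y1_hat - y0_hat))       # average causal effect estimate

# Usage: rank the inputs of an arbitrary black-box predictor by estimated effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
black_box = lambda X: 3.0 * X[:, 0] + 0.5 * X[:, 2]   # stands in for any opaque model
effects = {j: causal_attribution(black_box, X, j) for j in range(X.shape[1])}
print(effects)  # feature 0 should receive the largest attribution
```
Because the model is queried only through its input-output behavior, the same sketch applies unchanged to any predictor that exposes a predict-style interface.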
Related papers
- Influence Functions for Scalable Data Attribution in Diffusion Models [52.92223039302037]
Diffusion models have led to significant advancements in generative modelling.
Yet their widespread adoption poses challenges regarding data attribution and interpretability.
In this paper, we aim to help address such challenges by developing an influence functions framework.
arXiv Detail & Related papers (2024-10-17T17:59:02Z) - Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method, which combines concepts from Optimal Transport and Shapley Values, as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z) - Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - A performance characteristic curve for model evaluation: the application
in information diffusion prediction [3.8711489380602804]
We propose a metric based on information entropy to quantify the randomness in diffusion data, then identify a scaling pattern between the randomness and the prediction accuracy of the model.
Data points obtained for different sequence lengths, system sizes, and levels of randomness all collapse onto a single curve, capturing a model's inherent capability of making correct predictions.
The validity of the curve is tested by three prediction models in the same family, reaching conclusions in line with existing studies.
arXiv Detail & Related papers (2023-09-18T07:32:57Z) - A prediction and behavioural analysis of machine learning methods for
modelling travel mode choice [0.26249027950824505]
We conduct a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice.
Results indicate that the models with the highest disaggregate predictive performance provide poorer estimates of behavioural indicators and aggregate mode shares.
It is also observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.
arXiv Detail & Related papers (2023-01-11T11:10:32Z) - Stability of clinical prediction models developed using statistical or
machine learning methods [0.5482532589225552]
Clinical prediction models estimate an individual's risk of a particular health outcome, conditional on their values of multiple predictors.
Many models are developed using small datasets, which leads to instability in the model and its predictions (estimated risks).
We show that instability in a model's estimated risks is often considerable and manifests itself as miscalibration of predictions in new data.
arXiv Detail & Related papers (2022-11-02T11:55:28Z) - Measuring Causal Effects of Data Statistics on Language Model's
`Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models.
We provide a language for describing how training data influences predictions, through a causal framework.
Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z) - Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models given only a few examples exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate this prediction bias, our analysis shows that models gain performance improvements by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z) - Black-box Adversarial Attacks on Network-wide Multi-step Traffic State
Prediction Models [4.353029347463806]
We propose an adversarial attack framework by treating the prediction model as a black-box.
The adversary can query the prediction model as an oracle with any input and obtain the corresponding output.
To test the attack's effectiveness, two state-of-the-art graph neural network-based models (GCGRNN and DCRNN) are examined.
arXiv Detail & Related papers (2021-10-17T03:45:35Z) - Hessian-based toolbox for reliable and interpretable machine learning in
physics [58.720142291102135]
We present a toolbox for interpretability and reliability, agnostic of the model architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an agnostic score for the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
arXiv Detail & Related papers (2021-08-04T16:32:59Z) - A comprehensive study on the prediction reliability of graph neural
networks for virtual screening [0.0]
We investigate the effects of model architectures, regularization methods, and loss functions on the prediction performance and reliability of classification results.
Our results highlight that the correct choice of regularization and inference methods is clearly important for achieving a high success rate.
arXiv Detail & Related papers (2020-03-17T10:13:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.