Causal Analysis for Robust Interpretability of Neural Networks
- URL: http://arxiv.org/abs/2305.08950v2
- Date: Tue, 20 Jun 2023 15:43:32 GMT
- Title: Causal Analysis for Robust Interpretability of Neural Networks
- Authors: Ola Ahmad, Nicolas Bereux, Loïc Baret, Vahid Hashemi, Freddy Lecue
- Abstract summary: We develop a robust intervention-based method to capture cause-effect mechanisms in pre-trained neural networks.
We apply our method to vision models trained on classification tasks.
- Score: 0.2519906683279152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpreting the inner function of neural networks is crucial for the
trustworthy development and deployment of these black-box models. Prior
interpretability methods focus on correlation-based measures to attribute model
decisions to individual examples. However, these measures are susceptible to
noise and spurious correlations encoded in the model during the training phase
(e.g., biased inputs, model overfitting, or misspecification). Moreover, this
process has proven to result in noisy and unstable attributions that prevent
any transparent understanding of the model's behavior. In this paper, we
develop a robust intervention-based method grounded in causal analysis to
capture cause-effect mechanisms in pre-trained neural networks and their
relation to the prediction. Our novel approach relies on path interventions to
infer the causal mechanisms within hidden layers and isolate the information
that is relevant and necessary to the model's prediction, discarding noise. The result is
task-specific causal explanatory graphs that can audit model behavior and
express the actual causes underlying its performance. We apply our method to
vision models trained on image classification tasks and provide extensive
quantitative experiments to show that our approach can
capture more stable and faithful explanations than standard attribution-based
methods. Furthermore, the underlying causal graphs reveal the neural
interactions in the model, making it a valuable tool in other applications
(e.g., model repair).
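The abstract does not spell out the intervention procedure, so the following is only a minimal sketch of the general idea of intervening on hidden units, not the authors' method. It assumes a hypothetical two-layer ReLU network with random weights and measures how much a target logit changes when each hidden unit is forced to zero, one at a time. The names `forward`, `unit_effects`, `W1`, and `W2` are all illustrative.

```python
import random

def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def matvec(x, W):
    """Row vector x (length n) times matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def forward(x, W1, W2, ablate=None):
    """Two-layer network; `ablate` is the index of a hidden unit to zero out."""
    h = relu(matvec(x, W1))
    if ablate is not None:
        h[ablate] = 0.0  # the intervention: force hidden unit `ablate` to zero
    return matvec(h, W2)

def unit_effects(x, W1, W2, target):
    """Drop in the target logit caused by ablating each hidden unit in turn."""
    base = forward(x, W1, W2)[target]
    n_hidden = len(W1[0])
    return [base - forward(x, W1, W2, ablate=j)[target] for j in range(n_hidden)]

# Hypothetical toy setup: 4 inputs, 5 hidden units, 3 output classes.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(4)]
W1 = [[random.gauss(0, 1) for _ in range(5)] for _ in range(4)]
W2 = [[random.gauss(0, 1) for _ in range(3)] for _ in range(5)]

effects = unit_effects(x, W1, W2, target=0)
print(effects)
```

Because the output layer in this toy network is linear, the effect of ablating unit j equals its activation times its outgoing weight to the target class. Units with zero activation have zero effect, which is one simple way an intervention can separate units that matter for a given prediction from those that do not.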
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Axiomatic Causal Interventions for Reverse Engineering Relevance Computation in Neural Retrieval Models [20.29451537633895]
We propose the use of causal interventions to reverse engineer neural rankers.
We demonstrate how mechanistic interpretability methods can be used to isolate components satisfying term-frequency axioms.
arXiv Detail & Related papers (2024-05-03T22:30:15Z)
- Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z)
- Study of Distractors in Neural Models of Code [4.043200001974071]
Finding important features that contribute to the prediction of neural models is an active area of research in explainable AI.
In this work, we present an inverse perspective of distractor features: features that cast doubt about the prediction by affecting the model's confidence in its prediction.
Our experiments across various tasks, models, and datasets of code reveal that the removal of tokens can have a significant impact on the confidence of models in their predictions.
arXiv Detail & Related papers (2023-03-03T06:54:01Z)
- Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates [26.527311287924995]
We show that, in a controlled setup, influence tuning can help deconfound the model from spurious patterns in the data.
arXiv Detail & Related papers (2021-10-07T06:59:46Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- Building Reliable Explanations of Unreliable Neural Networks: Locally Smoothing Perspective of Model Interpretation [0.0]
We present a novel method for reliably explaining the predictions of neural networks.
Our method builds on the assumption of a smooth landscape in the loss function of the model prediction.
arXiv Detail & Related papers (2021-03-26T08:52:11Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way to achieve this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
- Structural Causal Models Are (Solvable by) Credal Networks [70.45873402967297]
Causal inferences can be obtained by standard algorithms for the updating of credal nets.
This contribution should be regarded as a systematic approach to represent structural causal models by credal networks.
Experiments show that approximate algorithms for credal networks can immediately be used to do causal inference in real-size problems.
arXiv Detail & Related papers (2020-08-02T11:19:36Z)
- A comprehensive study on the prediction reliability of graph neural networks for virtual screening [0.0]
We investigate the effects of model architectures, regularization methods, and loss functions on the prediction performance and reliability of classification results.
Our results highlight that the correct choice of regularization and inference methods is important for achieving a high success rate.
arXiv Detail & Related papers (2020-03-17T10:13:31Z)