Causal Analysis for Robust Interpretability of Neural Networks
- URL: http://arxiv.org/abs/2305.08950v2
- Date: Tue, 20 Jun 2023 15:43:32 GMT
- Title: Causal Analysis for Robust Interpretability of Neural Networks
- Authors: Ola Ahmad, Nicolas Bereux, Loïc Baret, Vahid Hashemi, Freddy Lecue
- Abstract summary: We develop a robust intervention-based method to capture cause-effect mechanisms in pre-trained neural networks.
We apply our method to vision models trained on classification tasks.
- Score: 0.2519906683279152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpreting the inner workings of neural networks is crucial for the
trustworthy development and deployment of these black-box models. Prior
interpretability methods focus on correlation-based measures to attribute model
decisions to individual examples. However, these measures are susceptible to
noise and spurious correlations encoded in the model during training (e.g.,
biased inputs, model overfitting, or misspecification). Moreover, this process
has proven to yield noisy and unstable attributions that prevent any transparent
understanding of the model's behavior. In this paper, we develop a robust
intervention-based method, grounded in causal analysis, to capture cause-effect
mechanisms in pre-trained neural networks and their relation to the prediction.
Our novel approach relies on path interventions to infer the causal mechanisms
within hidden layers and to isolate the information that is relevant and
necessary to the model's prediction while discarding noisy signals. The result
is a set of task-specific causal explanatory graphs that can audit model
behavior and express the actual causes underlying its performance. We apply our
method to vision models trained on image classification tasks and provide
extensive quantitative experiments showing that our approach captures more
stable and faithful explanations than standard attribution-based methods.
Furthermore, the underlying causal graphs reveal the neural interactions in the
model, making it a valuable tool for other applications (e.g., model repair).
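As a concrete illustration of intervening on a network's internals, the sketch below applies a do()-style intervention to a single feature-map channel of a pre-trained classifier and measures how the target-class probability shifts. It is a minimal, hypothetical example of the general idea rather than the authors' path-intervention algorithm; the model (torchvision ResNet-18), the chosen layer, channel index, and class index are all illustrative assumptions.

```python
# Minimal sketch: intervene on one hidden channel of a pre-trained classifier
# and measure the change in the target-class probability. Illustrative only;
# not the paper's path-intervention algorithm.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def intervene_on_channel(layer, channel, value=0.0):
    """Forward hook that clamps one feature-map channel to a constant,
    i.e. do(activation_channel := value)."""
    def hook(module, inputs, output):
        output = output.clone()
        output[:, channel] = value
        return output
    return layer.register_forward_hook(hook)

@torch.no_grad()
def channel_effect(x, target_class, layer, channel):
    """Difference in target-class probability with vs. without the intervention."""
    p_obs = torch.softmax(model(x), dim=1)[:, target_class]
    handle = intervene_on_channel(layer, channel)
    p_do = torch.softmax(model(x), dim=1)[:, target_class]
    handle.remove()
    return (p_obs - p_do).mean().item()  # larger gap -> stronger causal influence

x = torch.randn(8, 3, 224, 224)  # stand-in for a batch of images
print(channel_effect(x, target_class=207, layer=model.layer4, channel=5))
```

Sweeping such interventions over layers and units gives a rough map of which internal components causally affect a given prediction, which is the intuition behind assembling causal explanatory graphs from interventions rather than from correlations.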
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z)
- Study of Distractors in Neural Models of Code [4.043200001974071]
Finding important features that contribute to the prediction of neural models is an active area of research in explainable AI.
In this work, we present an inverse perspective of distractor features: features that cast doubt on the prediction by affecting the model's confidence in it.
Our experiments across various tasks, models, and datasets of code reveal that the removal of tokens can have a significant impact on the confidence of models in their predictions.
arXiv Detail & Related papers (2023-03-03T06:54:01Z)
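The distractor study above probes how removing individual tokens shifts a model's confidence. Below is a minimal, hypothetical sketch of that probing loop; an off-the-shelf text classifier (the model name is an illustrative assumption) stands in for a neural model of code, and the snippet is a toy input rather than data from the paper.

```python
# Hypothetical sketch: drop one token at a time and watch how the classifier's
# confidence moves. A generic sentiment classifier is used purely as a stand-in
# for a neural model of code.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "distilbert-base-uncased-finetuned-sst-2-english"  # stand-in model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

@torch.no_grad()
def confidence(text: str) -> float:
    """Probability assigned to the predicted class."""
    logits = model(**tok(text, return_tensors="pt")).logits
    return torch.softmax(logits, dim=-1).max().item()

snippet = "def add ( a , b ) : return a + b"  # toy 'code' input
base = confidence(snippet)
words = snippet.split()
for i, w in enumerate(words):
    ablated = " ".join(words[:i] + words[i + 1:])
    print(f"{w:>8}  confidence change {confidence(ablated) - base:+.4f}")
```

Tokens whose removal raises the confidence behave like distractors; tokens whose removal lowers it behave like important features.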
- Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates [26.527311287924995]
We show that, in a controlled setup, influence tuning can help deconfound the model from spurious patterns in the data.
arXiv Detail & Related papers (2021-10-07T06:59:46Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- Building Reliable Explanations of Unreliable Neural Networks: Locally Smoothing Perspective of Model Interpretation [0.0]
We present a novel method for reliably explaining the predictions of neural networks.
Our method is built on the assumption of a smooth loss landscape around the model's prediction.
arXiv Detail & Related papers (2021-03-26T08:52:11Z)
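The locally-smoothing entry above rests on the assumption of a smooth loss landscape around the prediction. A generic way to illustrate that idea is to average input gradients over a small Gaussian neighbourhood of the input (a SmoothGrad-style estimate); the sketch below is an illustrative stand-in, not that paper's specific algorithm, and the model, noise level, and class index are assumptions.

```python
# Illustrative only: a locally smoothed gradient attribution obtained by
# averaging input gradients over Gaussian-perturbed copies of the input.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def smoothed_saliency(x, target_class, n_samples=25, sigma=0.1):
    """Average input gradients over a Gaussian neighbourhood of x."""
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        score = model(noisy)[:, target_class].sum()
        score.backward()
        grads += noisy.grad
    return (grads / n_samples).abs()  # smoothed saliency map

x = torch.randn(1, 3, 224, 224)  # stand-in image
saliency = smoothed_saliency(x, target_class=207)
print(saliency.shape)  # torch.Size([1, 3, 224, 224])
```

Averaging over the neighbourhood suppresses the noisy, high-frequency components of raw gradients, which is the practical payoff of assuming local smoothness.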
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting firearms via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
- Structural Causal Models Are (Solvable by) Credal Networks [70.45873402967297]
Causal inferences can be obtained by standard algorithms for the updating of credal nets.
This contribution should be regarded as a systematic approach to represent structural causal models by credal networks.
Experiments show that approximate algorithms for credal networks can immediately be used to do causal inference in real-size problems.
arXiv Detail & Related papers (2020-08-02T11:19:36Z)
- A comprehensive study on the prediction reliability of graph neural networks for virtual screening [0.0]
We investigate the effects of model architectures, regularization methods, and loss functions on the prediction performance and reliability of classification results.
Our results highlight that the correct choice of regularization and inference methods is important for achieving a high success rate.
arXiv Detail & Related papers (2020-03-17T10:13:31Z)
- Resolving Spurious Correlations in Causal Models of Environments via Interventions [2.836066255205732]
We consider the problem of inferring a causal model of a reinforcement learning environment.
Our method designs a reward function that incentivizes an agent to do an intervention to find errors in the causal model.
The experimental results in a grid-world environment show that our approach leads to better causal models compared to baselines.
arXiv Detail & Related papers (2020-02-12T20:20:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.