Identifying the Source of Vulnerability in Explanation Discrepancy: A
Case Study in Neural Text Classification
- URL: http://arxiv.org/abs/2212.05327v1
- Date: Sat, 10 Dec 2022 16:04:34 GMT
- Title: Identifying the Source of Vulnerability in Explanation Discrepancy: A
Case Study in Neural Text Classification
- Authors: Ruixuan Tang, Hanjie Chen, Yangfeng Ji
- Abstract summary: Some recent works observed the instability of post-hoc explanations when input-side perturbations are applied to the model.
This raises interest in, and concern about, the stability of post-hoc explanations.
This work explores the potential source that leads to unstable post-hoc explanations.
- Score: 18.27912226867123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Some recent works observed the instability of post-hoc explanations when
input-side perturbations are applied to the model. This raises interest in, and
concern about, the stability of post-hoc explanations. However, the remaining
question is: is the instability caused by the neural network model or the
post-hoc explanation method? This work explores the potential source that leads
to unstable post-hoc explanations. To separate the influence from the model, we
propose a simple output probability perturbation method. Compared to prior
input-side perturbation methods, the output probability perturbation method can
circumvent the neural model's potential effect on the explanations and allow
analysis of the explanation method itself. We evaluate the proposed method with
three widely-used post-hoc explanation methods (LIME (Ribeiro et al., 2016),
Kernel Shapley (Lundberg and Lee, 2017a), and Sample Shapley (Strumbelj and
Kononenko, 2010)). The results demonstrate that the post-hoc methods are
stable, barely producing discrepant explanations under output probability
perturbations. The observation suggests that neural network models may be the
primary source of fragile explanations.
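To make the setup concrete, the sketch below illustrates the output probability perturbation idea under stated assumptions: it uses LIME's LimeTextExplainer, a toy keyword classifier as a stand-in for a trained neural text classifier, and simple Gaussian noise added to the output probabilities as the perturbation, which may differ from the paper's exact scheme. If the explanation method is stable, the attributions computed from the clean and perturbed prediction functions should barely differ, which is the behavior the abstract reports for LIME, Kernel Shapley, and Sample Shapley.
```python
# Minimal sketch of an output-probability perturbation check.
# Assumptions: LIME's LimeTextExplainer API; a toy keyword classifier stands in for a
# trained neural text classifier; Gaussian noise on the output probabilities is an
# illustrative perturbation and may differ from the paper's exact scheme.
import numpy as np
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    """Toy stand-in for a trained classifier: returns [p(negative), p(positive)] per text."""
    pos = np.array([sum(w in t.lower() for w in ("good", "great", "love")) for t in texts], float)
    neg = np.array([sum(w in t.lower() for w in ("bad", "awful", "hate")) for t in texts], float)
    logits = np.stack([neg, pos], axis=1)
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def perturbed_predict_proba(texts, sigma=0.05, seed=0):
    """Perturb the model's output probabilities directly, then renormalize."""
    rng = np.random.default_rng(seed)
    probs = predict_proba(texts)
    noisy = np.clip(probs + rng.normal(0.0, sigma, probs.shape), 1e-6, None)
    return noisy / noisy.sum(axis=1, keepdims=True)

explainer = LimeTextExplainer(class_names=["negative", "positive"], random_state=0)
text = "I love this movie, the acting was great"

clean = dict(explainer.explain_instance(text, predict_proba, num_features=5).as_list())
noisy = dict(explainer.explain_instance(text, perturbed_predict_proba, num_features=5).as_list())

# If the explanation method is stable, the two attribution maps should barely differ.
for word, weight in clean.items():
    print(f"{word:>10s}  clean={weight:+.4f}  perturbed={noisy.get(word, 0.0):+.4f}")
```
Because the noise is injected after the model's forward pass, any discrepancy between the two attribution maps can be attributed to the explanation method rather than to the neural model, which is the separation the paper is after.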
Related papers
- Selective Explanations [14.312717332216073]
Amortized explainers train a machine learning model to predict feature attribution scores with only one inference.
Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations.
We propose selective explanations, a novel feature attribution method that detects when amortized explainers generate low-quality explanations.
arXiv Detail & Related papers (2024-05-29T23:08:31Z)
- Uncertainty Quantification for Gradient-based Explanations in Neural Networks [6.9060054915724]
We propose a pipeline to ascertain the explanation uncertainty of neural networks.
We use this pipeline to produce explanation distributions for the CIFAR-10, FER+, and California Housing datasets.
We compute modified pixel insertion/deletion metrics to evaluate the quality of the generated explanations.
arXiv Detail & Related papers (2024-03-25T21:56:02Z)
- Data-centric Prediction Explanation via Kernelized Stein Discrepancy [14.177012256360635]
This paper presents a Highly-precise and Data-centric Explanation (HD-Explain) prediction explanation method that exploits properties of the Kernelized Stein Discrepancy (KSD).
Specifically, the KSD uniquely defines a parameterized kernel function for a trained model that encodes model-dependent data correlation.
We show that HD-Explain outperforms existing methods in several aspects, including preciseness (fine-grained explanations), consistency, and computational efficiency.
arXiv Detail & Related papers (2024-03-22T19:04:02Z)
- Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations [67.40641255908443]
We identify limitations of model-randomization-based sanity checks for the purpose of evaluating explanations.
Top-down model randomization preserves scales of forward pass activations with high probability.
arXiv Detail & Related papers (2022-11-22T18:52:38Z)
- Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
In the field of process outcome prediction, we define explainability through the interpretability of the explanations and the faithfulness of the explainability model.
This paper contributes a set of guidelines named X-MOP that help select the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
- Rethinking Stability for Attribution-based Explanations [20.215505482157255]
We introduce metrics to quantify the stability of an explanation and show that several popular explanation methods are unstable.
In particular, we propose new Relative Stability metrics that measure the change in the output explanation with respect to changes in the input, model representation, or output of the underlying predictor (a rough sketch of this idea follows the list of related papers below).
arXiv Detail & Related papers (2022-03-14T06:19:27Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Dependency Decomposition and a Reject Option for Explainable Models [4.94950858749529]
Recent deep learning models perform extremely well in various inference tasks.
Recent advances offer methods to visualize features and to describe the attribution of the input.
We present the first analysis of dependencies regarding the probability distribution over the desired image classification outputs.
arXiv Detail & Related papers (2020-12-11T17:39:33Z)
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [97.77183117452235]
We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
arXiv Detail & Related papers (2020-05-04T20:35:17Z)
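As referenced in the "Rethinking Stability for Attribution-based Explanations" entry above, the following sketch illustrates a relative-stability-style measurement: the relative change in an explanation divided by the relative change in the input that caused it. The function name, normalization, and toy numbers are illustrative assumptions rather than the cited paper's exact definitions.
```python
# Rough sketch of a relative input stability (RIS)-style metric: how much the explanation
# changes, relative to how much the input changed. The function name and the exact
# normalization are illustrative assumptions, not the cited paper's precise definition.
import numpy as np

def relative_instability(x, x_perturbed, e, e_perturbed, eps=1e-8):
    """Ratio of relative explanation change to relative input change (larger = less stable)."""
    expl_change = np.linalg.norm((e_perturbed - e) / (np.abs(e) + eps))
    input_change = np.linalg.norm((x_perturbed - x) / (np.abs(x) + eps))
    return expl_change / max(input_change, eps)

# Toy usage: a tiny input perturbation paired with a large explanation change
x = np.array([1.0, 2.0, 3.0])
x_p = x + np.array([0.01, -0.01, 0.0])       # small input change
e = np.array([0.5, 0.3, 0.2])                # attribution for x
e_p = np.array([0.1, 0.7, 0.2])              # attribution for the perturbed input
print(relative_instability(x, x_p, e, e_p))  # large value -> unstable explanation
```
A metric of this kind complements the output probability perturbation sketch above: the perturbation isolates where the discrepancy originates, while the stability ratio quantifies how large that discrepancy is.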
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.