Identifying the Source of Vulnerability in Explanation Discrepancy: A
Case Study in Neural Text Classification
- URL: http://arxiv.org/abs/2212.05327v1
- Date: Sat, 10 Dec 2022 16:04:34 GMT
- Title: Identifying the Source of Vulnerability in Explanation Discrepancy: A
Case Study in Neural Text Classification
- Authors: Ruixuan Tang, Hanjie Chen, Yangfeng Ji
- Abstract summary: Some recent works observed the instability of post-hoc explanations when input-side perturbations are applied to the model.
This raises interest in, and concern about, the stability of post-hoc explanations.
This work explores the potential source that leads to unstable post-hoc explanations.
- Score: 18.27912226867123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Some recent works observed the instability of post-hoc explanations when
input-side perturbations are applied to the model. This raises interest in, and
concern about, the stability of post-hoc explanations. However, the remaining
question is: is the instability caused by the neural network model or the
post-hoc explanation method? This work explores the potential source that leads
to unstable post-hoc explanations. To separate the influence from the model, we
propose a simple output probability perturbation method. Compared to prior
input-side perturbation methods, the output probability perturbation method can
circumvent the neural model's potential effect on the explanations and allow
analysis of the explanation method itself. We evaluate the proposed method with
three widely-used post-hoc explanation methods (LIME (Ribeiro et al., 2016),
Kernel Shapley (Lundberg and Lee, 2017a), and Sample Shapley (Strumbelj and
Kononenko, 2010)). The results demonstrate that the post-hoc methods are
stable, barely producing discrepant explanations under output probability
perturbations. The observation suggests that neural network models may be the
primary source of fragile explanations.
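To make the setup concrete, the sketch below illustrates the output probability perturbation idea under stated assumptions: it uses LIME's LimeTextExplainer, a toy keyword classifier as a stand-in for a trained neural text classifier, and simple Gaussian noise added to the output probabilities as the perturbation, which may differ from the paper's exact scheme. If the explanation method is stable, the attributions computed from the clean and perturbed prediction functions should barely differ, which is the behavior the abstract reports for LIME, Kernel Shapley, and Sample Shapley.
```python
# Minimal sketch of an output-probability perturbation check.
# Assumptions: LIME's LimeTextExplainer API; a toy keyword classifier stands in for a
# trained neural text classifier; Gaussian noise on the output probabilities is an
# illustrative perturbation and may differ from the paper's exact scheme.
import numpy as np
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    """Toy stand-in for a trained classifier: returns [p(negative), p(positive)] per text."""
    pos = np.array([sum(w in t.lower() for w in ("good", "great", "love")) for t in texts], float)
    neg = np.array([sum(w in t.lower() for w in ("bad", "awful", "hate")) for t in texts], float)
    logits = np.stack([neg, pos], axis=1)
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def perturbed_predict_proba(texts, sigma=0.05, seed=0):
    """Perturb the model's output probabilities directly, then renormalize."""
    rng = np.random.default_rng(seed)
    probs = predict_proba(texts)
    noisy = np.clip(probs + rng.normal(0.0, sigma, probs.shape), 1e-6, None)
    return noisy / noisy.sum(axis=1, keepdims=True)

explainer = LimeTextExplainer(class_names=["negative", "positive"], random_state=0)
text = "I love this movie, the acting was great"

clean = dict(explainer.explain_instance(text, predict_proba, num_features=5).as_list())
noisy = dict(explainer.explain_instance(text, perturbed_predict_proba, num_features=5).as_list())

# If the explanation method is stable, the two attribution maps should barely differ.
for word, weight in clean.items():
    print(f"{word:>10s}  clean={weight:+.4f}  perturbed={noisy.get(word, 0.0):+.4f}")
```
Because the noise is injected after the model's forward pass, any discrepancy between the two attribution maps can be attributed to the explanation method rather than to the neural model, which is the separation the paper is after.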
Related papers
- Selective Explanations [14.312717332216073]
Amortized explainers train a machine learning model to predict feature attribution scores with only one inference.
Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations.
We propose selective explanations, a novel feature attribution method that detects when amortized explainers generate low-quality explanations.
arXiv Detail & Related papers (2024-05-29T23:08:31Z)
- Uncertainty Quantification for Gradient-based Explanations in Neural Networks [6.9060054915724]
We propose a pipeline to ascertain the explanation uncertainty of neural networks.
We use this pipeline to produce explanation distributions for the CIFAR-10, FER+, and California Housing datasets.
We compute modified pixel insertion/deletion metrics to evaluate the quality of the generated explanations.
arXiv Detail & Related papers (2024-03-25T21:56:02Z)
- Data-centric Prediction Explanation via Kernelized Stein Discrepancy [14.177012256360635]
This paper presents a Highly-precise and Data-centric Explanation (HD-Explain) prediction explanation method that exploits properties of the Kernelized Stein Discrepancy (KSD).
Specifically, the KSD uniquely defines a parameterized kernel function for a trained model that encodes model-dependent data correlation.
We show that HD-Explain outperforms existing methods in several aspects, including preciseness (fine-grained explanations), consistency, and computational efficiency.
arXiv Detail & Related papers (2024-03-22T19:04:02Z)
- Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations [67.40641255908443]
We identify limitations of model-randomization-based sanity checks for the purpose of evaluating explanations.
Top-down model randomization preserves scales of forward pass activations with high probability.
arXiv Detail & Related papers (2022-11-22T18:52:38Z)
- Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
In the field of process outcome prediction, we define explainability through the interpretability of the explanations and the faithfulness of the explainability model.
This paper contributes a set of guidelines named X-MOP that help select the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
- Rethinking Stability for Attribution-based Explanations [20.215505482157255]
We introduce metrics to quantify the stability of an explanation and show that several popular explanation methods are unstable.
In particular, we propose new Relative Stability metrics that measure the change in the output explanation with respect to changes in the input, model representation, or output of the underlying predictor (a rough sketch of this idea follows the list of related papers below).
arXiv Detail & Related papers (2022-03-14T06:19:27Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Dependency Decomposition and a Reject Option for Explainable Models [4.94950858749529]
Recent deep learning models perform extremely well in various inference tasks.
Recent advances offer methods to visualize features and to describe the attribution of the input.
We present the first analysis of dependencies regarding the probability distribution over the desired image classification outputs.
arXiv Detail & Related papers (2020-12-11T17:39:33Z)
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [97.77183117452235]
We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
arXiv Detail & Related papers (2020-05-04T20:35:17Z)
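As referenced in the "Rethinking Stability for Attribution-based Explanations" entry above, the following sketch illustrates a relative-stability-style measurement: the relative change in an explanation divided by the relative change in the input that caused it. The function name, normalization, and toy numbers are illustrative assumptions rather than the cited paper's exact definitions.
```python
# Rough sketch of a relative input stability (RIS)-style metric: how much the explanation
# changes, relative to how much the input changed. The function name and the exact
# normalization are illustrative assumptions, not the cited paper's precise definition.
import numpy as np

def relative_instability(x, x_perturbed, e, e_perturbed, eps=1e-8):
    """Ratio of relative explanation change to relative input change (larger = less stable)."""
    expl_change = np.linalg.norm((e_perturbed - e) / (np.abs(e) + eps))
    input_change = np.linalg.norm((x_perturbed - x) / (np.abs(x) + eps))
    return expl_change / max(input_change, eps)

# Toy usage: a tiny input perturbation paired with a large explanation change
x = np.array([1.0, 2.0, 3.0])
x_p = x + np.array([0.01, -0.01, 0.0])       # small input change
e = np.array([0.5, 0.3, 0.2])                # attribution for x
e_p = np.array([0.1, 0.7, 0.2])              # attribution for the perturbed input
print(relative_instability(x, x_p, e, e_p))  # large value -> unstable explanation
```
A metric of this kind complements the output probability perturbation sketch above: the perturbation isolates where the discrepancy originates, while the stability ratio quantifies how large that discrepancy is.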
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.