Building Reliable Explanations of Unreliable Neural Networks: Locally
Smoothing Perspective of Model Interpretation
- URL: http://arxiv.org/abs/2103.14332v2
- Date: Mon, 29 Mar 2021 01:59:56 GMT
- Title: Building Reliable Explanations of Unreliable Neural Networks: Locally
Smoothing Perspective of Model Interpretation
- Authors: Dohun Lim, Hyeonseok Lee and Sungchan Kim
- Abstract summary: We present a novel method for reliably explaining the predictions of neural networks.
Our method is built on the assumption of a smooth landscape in the loss function of the model prediction.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel method for reliably explaining the predictions of neural
networks. We consider an explanation reliable if it identifies input features
relevant to the model output by considering the input and the neighboring data
points. Our method is built on the assumption of a smooth landscape in the
loss function of the model prediction: a locally consistent loss and gradient
profile. A theoretical analysis established in this study suggests that those
locally smooth model explanations are learned using a batch of noisy copies of
the input with the L1 regularization for a saliency map. Extensive experiments
support the analysis results, revealing that the proposed saliency maps
retrieve the original classes of adversarial examples crafted against both
naturally and adversarially trained models, significantly outperforming
previous methods. We further demonstrate that this strong performance stems
from the method's ability to identify input features that are truly relevant
to the model output for both the input and its neighboring data points,
fulfilling the requirements of a reliable explanation.
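As a rough illustration of the idea in the abstract (learning a saliency map from a batch of noisy copies of the input under an L1 penalty), the Python/PyTorch sketch below shows one possible formulation; the sigmoid gating, the objective, and all hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def learn_saliency(model, x, target, n_copies=32, noise_std=0.1,
                   l1_weight=1e-3, lr=0.05, steps=100):
    """x: (1, C, H, W) input; target: class index to explain."""
    model.eval()
    saliency = torch.zeros_like(x, requires_grad=True)       # map to be learned
    opt = torch.optim.Adam([saliency], lr=lr)
    labels = torch.full((n_copies,), target, dtype=torch.long)
    for _ in range(steps):
        noise = noise_std * torch.randn(n_copies, *x.shape[1:])  # noisy copies
        noisy_batch = x + noise                               # broadcast over batch
        logits = model(noisy_batch * torch.sigmoid(saliency))  # gate input by saliency
        cls_loss = F.cross_entropy(logits, labels)            # keep the predicted class
        l1_loss = l1_weight * saliency.abs().sum()            # sparse explanation
        loss = cls_loss + l1_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(saliency).detach()
```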
Related papers
- Learning Latent Graph Structures and their Uncertainty [63.95971478893842]
Graph Neural Networks (GNNs) use relational information as an inductive bias to enhance the model's accuracy.
As task-relevant relations might be unknown, graph structure learning approaches have been proposed to learn them while solving the downstream prediction task.
arXiv Detail & Related papers (2024-05-30T10:49:22Z) - Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
- Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
arXiv Detail & Related papers (2023-07-06T15:19:53Z) - Causal Analysis for Robust Interpretability of Neural Networks [0.2519906683279152]
- Causal Analysis for Robust Interpretability of Neural Networks [0.2519906683279152]
We develop a robust interventional-based method to capture cause-effect mechanisms in pre-trained neural networks.
We apply our method to vision models trained on classification tasks.
arXiv Detail & Related papers (2023-05-15T18:37:24Z) - Rethinking interpretation: Input-agnostic saliency mapping of deep
visual classifiers [28.28834523468462]
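A minimal sketch, under assumed interfaces, of one common interventional recipe (not the cited paper's exact method): zero out a hidden channel via a PyTorch forward hook and record the resulting drop in the class score as a rough causal-effect estimate.

```python
import torch

def channel_effect(model, layer, x, target, channel):
    """Assumes model(x) returns (1, num_classes) logits for input x."""
    model.eval()
    with torch.no_grad():
        baseline = model(x)[0, target].item()

    def ablate(module, inputs, output):
        output = output.clone()
        output[:, channel] = 0.0          # intervention: do(activation = 0)
        return output

    handle = layer.register_forward_hook(ablate)
    try:
        with torch.no_grad():
            intervened = model(x)[0, target].item()
    finally:
        handle.remove()
    return baseline - intervened          # score drop attributable to the channel
```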
- Rethinking interpretation: Input-agnostic saliency mapping of deep visual classifiers [28.28834523468462]
Saliency methods provide post-hoc model interpretation by attributing input features to the model outputs.
We show that input-specific saliency mapping is intrinsically susceptible to misleading feature attribution.
We introduce a new perspective of input-agnostic saliency mapping that computationally estimates the high-level features attributed by the model to its outputs.
arXiv Detail & Related papers (2023-03-31T06:58:45Z) - Deconfounding to Explanation Evaluation in Graph Neural Networks [136.73451468551656]
We argue that a distribution shift exists between the full graph and the subgraph, causing the out-of-distribution problem.
We propose Deconfounded Subgraph Evaluation (DSE) which assesses the causal effect of an explanatory subgraph on the model prediction.
arXiv Detail & Related papers (2022-01-21T18:05:00Z) - Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Bayesian Imaging With Data-Driven Priors Encoded by Neural Networks:
Theory, Methods, and Algorithms [2.266704469122763]
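A hedged sketch of the general recipe behind the counterfactual entry above: optimize several latent perturbations so the decoded images flip the classifier to a target class, while a diversity term keeps the perturbations apart. The decoder/classifier interfaces, losses, and weights are assumptions, not the cited paper's.

```python
import torch
import torch.nn.functional as F

def diverse_counterfactuals(decoder, classifier, z0, target, n=4,
                            div_weight=0.1, lr=0.05, steps=200):
    """z0: (d,) latent code of the input; target: class the edits should reach."""
    deltas = 0.01 * torch.randn(n, z0.numel())       # n small random perturbations
    deltas.requires_grad_()
    opt = torch.optim.Adam([deltas], lr=lr)
    labels = torch.full((n,), target, dtype=torch.long)
    for _ in range(steps):
        z = z0.unsqueeze(0) + deltas                 # n perturbed latent codes
        logits = classifier(decoder(z))
        cf_loss = F.cross_entropy(logits, labels)    # push toward the target class
        # Diversity term: keep the perturbations away from one another.
        pair_dists = torch.cdist(deltas, deltas) + 1e6 * torch.eye(n)
        div_loss = -div_weight * pair_dists.min(dim=1).values.mean()
        sparsity = 0.01 * deltas.norm(dim=1).mean()  # prefer small edits
        loss = cf_loss + div_loss + sparsity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return deltas.detach()
```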
- Bayesian Imaging With Data-Driven Priors Encoded by Neural Networks: Theory, Methods, and Algorithms [2.266704469122763]
This paper proposes a new methodology for performing Bayesian inference in imaging inverse problems where the prior knowledge is available in the form of training data.
We establish the existence and well-posedness of the associated posterior moments under easily verifiable conditions.
A model accuracy analysis suggests that the Bayesian probabilities reported by the data-driven models are also remarkably accurate under a frequentist definition.
arXiv Detail & Related papers (2021-03-18T11:34:08Z) - Explaining and Improving Model Behavior with k Nearest Neighbor
Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
arXiv Detail & Related papers (2020-10-18T16:55:25Z) - A generalizable saliency map-based interpretation of model outcome [1.14219428942199]
- A generalizable saliency map-based interpretation of model outcome [1.14219428942199]
We propose a non-intrusive interpretability technique that uses the input and output of the model to generate a saliency map.
Experiments show that our interpretability method can reconstruct the salient part of the input with a classification accuracy of 89%.
arXiv Detail & Related papers (2020-06-16T20:34:42Z) - A comprehensive study on the prediction reliability of graph neural
networks for virtual screening [0.0]
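As a stand-in for the cited input/output-only technique (which is not reproduced here), the sketch below shows a generic model-agnostic saliency map obtained by occluding image patches and measuring the drop in the target class score; the patch size and zero-filling are arbitrary choices.

```python
import torch

def occlusion_saliency(model, x, target, patch=16):
    """x: (1, C, H, W) image; returns an (H // patch, W // patch) saliency grid."""
    model.eval()
    _, _, H, W = x.shape
    with torch.no_grad():
        base = model(x)[0, target].item()
        grid = torch.zeros(H // patch, W // patch)
        for i in range(0, H - patch + 1, patch):
            for j in range(0, W - patch + 1, patch):
                occluded = x.clone()
                occluded[:, :, i:i + patch, j:j + patch] = 0.0  # mask one patch
                score = model(occluded)[0, target].item()
                grid[i // patch, j // patch] = base - score     # patch importance
    return grid
```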
- A comprehensive study on the prediction reliability of graph neural networks for virtual screening [0.0]
We investigate the effects of model architectures, regularization methods, and loss functions on the prediction performance and reliability of classification results.
Our results highlight that the correct choice of regularization and inference methods is critical for achieving a high success rate.
arXiv Detail & Related papers (2020-03-17T10:13:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.