On the Robustness of Explanations of Deep Neural Network Models: A
Survey
- URL: http://arxiv.org/abs/2211.04780v1
- Date: Wed, 9 Nov 2022 10:14:21 GMT
- Title: On the Robustness of Explanations of Deep Neural Network Models: A
Survey
- Authors: Amlan Jyoti, Karthik Balaji Ganesh, Manoj Gayala, Nandita Lakshmi
Tunuguntla, Sandesh Kamath, Vineeth N Balasubramanian
- Abstract summary: We present a comprehensive survey of methods that study, understand, attack, and defend explanations of Deep Neural Network (DNN) models.
We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods.
- Score: 14.940679892694089
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Explainability has been widely stated as a cornerstone of the responsible and
trustworthy use of machine learning models. With the ubiquitous use of Deep
Neural Network (DNN) models expanding to risk-sensitive and safety-critical
domains, many methods have been proposed to explain the decisions of these
models. Recent years have also seen concerted efforts that have shown how such
explanations can be distorted (attacked) by minor input perturbations. While
there have been many surveys that review explainability methods themselves,
there has been no effort hitherto to assimilate the different methods and
metrics proposed to study the robustness of explanations of DNN models. In this
work, we present a comprehensive survey of methods that study, understand,
attack, and defend explanations of DNN models. We also present a detailed
review of different metrics used to evaluate explanation methods, as well as
describe attributional attack and defense methods. We conclude with lessons and
take-aways for the community towards ensuring robust explanations of DNN model
predictions.
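The abstract above refers to attacks that distort explanations through minor input perturbations, and to metrics for judging how robust an explanation is. The sketch below is a generic illustration of that idea rather than the method of any particular paper listed here: it computes a plain gradient saliency map for a toy PyTorch classifier, runs a small iterative perturbation that pushes the saliency map away from its clean version while staying inside an L_inf ball, and reports a top-k intersection score between the clean and attacked maps. The tiny model, the softplus activations (used because second derivatives of ReLU networks vanish almost everywhere), epsilon, step size, and the choice of top-k intersection are all illustrative assumptions.

```python
# Illustrative sketch only: a gradient-saliency "attributional attack" and a
# top-k intersection robustness score. Model, epsilon, and steps are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in classifier. Softplus replaces ReLU because the attack needs second
# derivatives, which are zero almost everywhere for piecewise-linear ReLU nets.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.Softplus(beta=10),
    nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(8 * 16, 10),
)
model.eval()

def saliency(x, target, create_graph=False):
    """Plain gradient saliency: |d logit_target / d x|, summed over channels."""
    logit = model(x)[0, target]
    grad, = torch.autograd.grad(logit, x, create_graph=create_graph)
    return grad.abs().sum(dim=1).flatten()          # one importance value per pixel

def topk_intersection(s1, s2, k=50):
    """Fraction of the k most important pixels shared by two saliency maps."""
    a = set(s1.topk(k).indices.tolist())
    b = set(s2.topk(k).indices.tolist())
    return len(a & b) / k

x = torch.rand(1, 3, 32, 32)                        # toy input in [0, 1]
target = model(x).argmax(dim=1).item()
s_clean = saliency(x.clone().requires_grad_(True), target).detach()

# Attributional attack: push the saliency map away from the clean one while
# keeping the perturbation inside a small L_inf ball, so the input (and,
# usually, the prediction) barely changes.
eps, step, x_adv = 8 / 255, 1 / 255, x.clone()
for _ in range(20):
    x_adv = x_adv.detach().requires_grad_(True)
    s_adv = saliency(x_adv, target, create_graph=True)
    dist = (s_adv - s_clean).pow(2).sum()           # how far the map has moved
    g, = torch.autograd.grad(dist, x_adv)
    x_adv = x_adv + step * g.sign()                 # ascend the saliency distance
    x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)

x_adv = x_adv.detach()
s_attacked = saliency(x_adv.clone().requires_grad_(True), target).detach()
print("prediction unchanged:", model(x_adv).argmax(dim=1).item() == target)
print("max pixel change    :", (x_adv - x).abs().max().item())
print("top-50 intersection :", topk_intersection(s_clean, s_attacked))
```

In the attack literature surveyed by the paper, an extra term is typically added to keep the model's prediction (or its logits) close to the original so that only the explanation changes; this sketch relies on the smallness of epsilon instead.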
Related papers
- Explainable Graph Neural Networks Under Fire [69.15708723429307]
Graph neural networks (GNNs) usually lack interpretability due to their complex computational behavior and the abstract nature of graphs.
Most GNN explanation methods work in a post-hoc manner and provide explanations in the form of a small subset of important edges and/or nodes.
In this paper, we demonstrate that these explanations unfortunately cannot be trusted, as common GNN explanation methods turn out to be highly susceptible to adversarial perturbations.
arXiv Detail & Related papers (2024-06-10T16:09:16Z)
- Uncertainty Quantification for Gradient-based Explanations in Neural Networks [6.9060054915724]
We propose a pipeline to ascertain the explanation uncertainty of neural networks.
We use this pipeline to produce explanation distributions for the CIFAR-10, FER+, and California Housing datasets.
We compute modified pixel insertion/deletion metrics to evaluate the quality of the generated explanations.
arXiv Detail & Related papers (2024-03-25T21:56:02Z)
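The entry above evaluates explanation quality with modified pixel insertion/deletion metrics. As a point of reference, below is a minimal sketch of a standard deletion-style metric: pixels are masked in decreasing order of attributed importance, and the target-class probability curve is summarized by its (approximate) area, where a faster drop indicates a more faithful explanation. The toy classifier, the zero masking baseline, and the plain gradient saliency map are illustrative assumptions; the "modified" variants in the paper will differ in their details.

```python
# Illustrative sketch of a standard deletion metric; not the paper's exact
# "modified" variant. Model, baseline value, and step count are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(                              # stand-in classifier
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(8 * 16, 10),
)
model.eval()

def deletion_score(model, x, sal, target, steps=50, baseline=0.0):
    """Mask pixels from most to least attributed and track the softmax
    probability of the target class; return the mean probability over the
    curve (a simple approximation of the area under it). Lower is better."""
    h, w = x.shape[-2:]
    order = sal.flatten().argsort(descending=True)  # most important pixels first
    probs = []
    with torch.no_grad():
        for frac in torch.linspace(0, 1, steps + 1):
            k = int(frac * h * w)
            mask = torch.zeros(h * w)
            mask[order[:k]] = 1.0                   # 1 marks a deleted pixel
            mask = mask.view(1, 1, h, w)
            x_masked = x * (1 - mask) + baseline * mask
            probs.append(model(x_masked).softmax(dim=1)[0, target].item())
    return sum(probs) / len(probs)

x = torch.rand(1, 3, 32, 32)                        # toy input in [0, 1]
target = model(x).argmax(dim=1).item()

# Toy attribution: plain input gradient; any explanation method can be plugged in.
x_req = x.clone().requires_grad_(True)
model(x_req)[0, target].backward()
sal = x_req.grad.abs().sum(dim=1)[0]                # (H, W) importance map

print("deletion score:", deletion_score(model, x, sal, target))
```

The corresponding insertion metric reverses the procedure, starting from the baseline and revealing the most important pixels first, so that a faster rise (a higher area) is better.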
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution regarding the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Learning with Explanation Constraints [91.23736536228485]
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z)
- Towards Faithful Model Explanation in NLP: A Survey [48.690624266879155]
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand.
One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model's prediction.
We review over 110 model explanation methods in NLP through the lens of faithfulness.
arXiv Detail & Related papers (2022-09-22T21:40:51Z)
- Unsupervised Detection of Adversarial Examples with Model Explanations [0.6091702876917279]
We propose a simple yet effective method to detect adversarial examples using methods developed to explain the model's behavior.
Our evaluations on the MNIST handwritten digit dataset show that our method is capable of detecting adversarial examples with high confidence.
arXiv Detail & Related papers (2021-07-22T06:54:18Z)
- Feature Attributions and Counterfactual Explanations Can Be Manipulated [32.579094387004346]
We show how adversaries can design biased models that manipulate model agnostic feature attribution methods.
These vulnerabilities allow an adversary to deploy a biased model, yet explanations will not reveal this bias, thereby deceiving stakeholders into trusting the model.
We evaluate the manipulations on real world data sets, including COMPAS and Communities & Crime, and find explanations can be manipulated in practice.
arXiv Detail & Related papers (2021-06-23T17:43:31Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Explainability in Graph Neural Networks: A Taxonomic Survey [42.95574260417341]
Graph neural networks (GNNs) and their explainability are experiencing rapid developments.
There is neither a unified treatment of GNN explainability methods, nor a standard benchmark and testbed for evaluations.
This work provides a unified methodological treatment of GNN explainability and a standardized testbed for evaluations.
arXiv Detail & Related papers (2020-12-31T04:34:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.