Robust Explainability: A Tutorial on Gradient-Based Attribution Methods for Deep Neural Networks
- URL: http://arxiv.org/abs/2107.11400v1
- Date: Fri, 23 Jul 2021 18:06:29 GMT
- Title: Robust Explainability: A Tutorial on Gradient-Based Attribution Methods for Deep Neural Networks
- Authors: Ian E. Nielsen, Ghulam Rasool, Dimah Dera, Nidhal Bouaynaya, Ravi P. Ramachandran
- Abstract summary: We present gradient-based interpretability methods for explaining decisions of deep neural networks.
We discuss the role that adversarial robustness plays in having meaningful explanations.
We conclude with the future directions for research in the area at the convergence of robustness and explainability.
- Score: 1.5854438418597576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rise of deep neural networks, the challenge of explaining the
predictions of these networks has become increasingly recognized. While many
methods for explaining the decisions of deep neural networks exist, there is
currently no consensus on how to evaluate them. On the other hand, robustness
is a popular topic in deep learning research; however, until very recently it
was hardly discussed in the context of explainability. In this tutorial paper,
we start by presenting gradient-based interpretability methods. These
techniques use gradient signals to assign the burden of the decision to the
input features.
Later, we discuss how gradient-based methods can be evaluated for their
robustness and the role that adversarial robustness plays in having meaningful
explanations. We also discuss the limitations of gradient-based methods.
Finally, we present the best practices and attributes that should be examined
before choosing an explainability method. We conclude with the future
directions for research in the area at the convergence of robustness and
explainability.
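To make the core idea concrete, the following minimal sketch (not taken from the paper) shows how a vanilla gradient saliency map and an Input x Gradient attribution can be computed for a PyTorch classifier, together with a simple stability check that compares saliency maps before and after small random input perturbations. The model, input shapes, and function names are illustrative assumptions, not the tutorial's reference implementation.

```python
# Minimal sketch (not from the paper): vanilla gradient and Input x Gradient
# attributions for a PyTorch classifier, plus a simple stability check under
# small input perturbations. Model, shapes, and function names are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gradient_attributions(model: nn.Module, x: torch.Tensor, target: int):
    """Return (saliency, input_x_gradient) for a single input x.

    saliency_i         = |d f_target(x) / d x_i|
    input_x_gradient_i = x_i * d f_target(x) / d x_i
    """
    model.eval()
    x = x.clone().detach().requires_grad_(True)  # track gradients w.r.t. the input
    score = model(x.unsqueeze(0))[0, target]     # scalar logit of the target class
    score.backward()                             # gradient signal flows back to x
    grad = x.grad.detach()
    return grad.abs(), x.detach() * grad


def attribution_stability(model, x, target, eps=0.01, trials=5):
    """Average cosine similarity between the clean saliency map and saliency
    maps of slightly perturbed inputs; values near 1 suggest a more stable
    (robust) explanation for this input."""
    base, _ = gradient_attributions(model, x, target)
    sims = []
    for _ in range(trials):
        pert, _ = gradient_attributions(model, x + eps * torch.randn_like(x), target)
        sims.append(F.cosine_similarity(base.flatten(), pert.flatten(), dim=0))
    return torch.stack(sims).mean().item()


if __name__ == "__main__":
    # Illustrative stand-in for a trained network and a single input "image".
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    x = torch.rand(3, 32, 32)
    saliency, ixg = gradient_attributions(model, x, target=3)
    print(saliency.shape, ixg.shape)                  # torch.Size([3, 32, 32]) each
    print(attribution_stability(model, x, target=3))  # closer to 1.0 = more stable
```

A stability score like the one above is only a rough proxy for the robustness criteria the tutorial discusses; more thorough evaluations perturb inputs adversarially rather than randomly.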
Related papers
- On the Value of Labeled Data and Symbolic Methods for Hidden Neuron Activation Analysis [1.55858752644861]
The state of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans.
We introduce a novel model-agnostic post-hoc Explainable AI method and demonstrate that it provides meaningful interpretations.
arXiv Detail & Related papers (2024-04-21T07:57:45Z)
- Toward Understanding the Disagreement Problem in Neural Network Feature Attribution [0.8057006406834466]
Neural networks have demonstrated a remarkable ability to discern intricate patterns and relationships in raw data.
Understanding the inner workings of these black-box models remains challenging, yet crucial for high-stakes decisions.
Our work addresses this confusion by investigating the explanations' fundamental and distributional behavior.
arXiv Detail & Related papers (2024-04-17T12:45:59Z)
- DARE: Towards Robust Text Explanations in Biomedical and Healthcare Applications [54.93807822347193]
We show how to adapt attribution robustness estimation methods to a given domain, so as to take into account domain-specific plausibility.
Next, we provide two methods, adversarial training and FAR training, to mitigate the brittleness characterized by DARE.
Finally, we empirically validate our methods with extensive experiments on three established biomedical benchmarks.
arXiv Detail & Related papers (2023-07-05T08:11:40Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
- NeuroExplainer: Fine-Grained Attention Decoding to Uncover Cortical Development Patterns of Preterm Infants [73.85768093666582]
We propose an explainable geometric deep network dubbed NeuroExplainer.
NeuroExplainer is used to uncover altered infant cortical development patterns associated with preterm birth.
arXiv Detail & Related papers (2023-01-01T12:48:12Z)
- Robust Explanation Constraints for Neural Networks [33.14373978947437]
Post-hoc explanation methods are used with the intent of providing insight into neural networks and are sometimes said to help engender trust in their outputs.
Our training method is the only one able to learn neural networks with insights about robustness across all six networks tested.
arXiv Detail & Related papers (2022-12-16T14:40:25Z)
- Transparency of Deep Neural Networks for Medical Image Analysis: A Review of Interpretability Methods [3.3918638314432936]
Deep neural networks have shown the same or better performance than clinicians in many tasks.
Current deep neural solutions are referred to as black boxes due to a lack of understanding of the specifics of their decision-making process.
There is a need to ensure the interpretability of deep neural networks before they can be incorporated into routine clinical workflows.
arXiv Detail & Related papers (2021-11-01T01:42:26Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- Distilling neural networks into skipgram-level decision lists [4.109840601429086]
We propose a pipeline to explain RNNs by means of decision lists (also called rules) over skipgrams.
We find that our technique consistently achieves high explanation fidelity and yields qualitatively interpretable rules.
arXiv Detail & Related papers (2020-05-14T16:25:42Z)
- Binary Neural Networks: A Survey [126.67799882857656]
The binary neural network serves as a promising technique for deploying deep models on resource-limited devices.
Binarization inevitably causes severe information loss and, even worse, its discontinuity makes the optimization of the deep network difficult.
We present a survey of these algorithms, categorized mainly into native solutions that directly perform binarization and optimized ones that use techniques such as minimizing the quantization error, improving the network loss function, and reducing the gradient error.
arXiv Detail & Related papers (2020-03-31T16:47:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.