Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces
- URL: http://arxiv.org/abs/2212.14855v3
- Date: Mon, 15 Apr 2024 08:24:42 GMT
- Title: Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces
- Authors: Pattarawat Chormai, Jan Herrmann, Klaus-Robert Müller, Grégoire Montavon,
- Abstract summary: Explainable AI aims to overcome the black-box nature of complex ML models like neural networks by generating explanations for their predictions.
We propose two new analyses, extending principles found in PCA or ICA to explanations.
These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), maximize relevance instead of e.g. variance or kurtosis.
- Score: 14.70409833767752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explainable AI aims to overcome the black-box nature of complex ML models like neural networks by generating explanations for their predictions. Explanations often take the form of a heatmap identifying input features (e.g. pixels) that are relevant to the model's decision. These explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by extracting at some intermediate layer of a neural network, subspaces that capture the multiple and distinct activation patterns (e.g. visual concepts) that are relevant to the prediction. To automatically extract these subspaces, we propose two new analyses, extending principles found in PCA or ICA to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), maximize relevance instead of e.g. variance or kurtosis. This allows for a much stronger focus of the analysis on what the ML model actually uses for predicting, ignoring activations or concepts to which the model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP. Our proposed methods show to be practically useful and compare favorably to the state of the art as demonstrated on benchmarks and three use cases.
Related papers
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z) - How Well Do Feature-Additive Explainers Explain Feature-Additive
Predictors? [12.993027779814478]
We ask the question: can popular feature-additive explainers (e.g., LIME, SHAP, SHAPR, MAPLE, and PDP) explain feature-additive predictors?
Herein, we evaluate such explainers on ground truth that is analytically derived from the additive structure of a model.
Our results suggest that all explainers eventually fail to correctly attribute the importance of features, especially when a decision-making process involves feature interactions.
arXiv Detail & Related papers (2023-10-27T21:16:28Z) - Explaining Explainability: Towards Deeper Actionable Insights into Deep
Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z) - Local Interpretable Model Agnostic Shap Explanations for machine
learning models [0.0]
We propose a methodology that we define as Local Interpretable Model Agnostic Shap Explanations (LIMASE)
This proposed technique uses Shapley values under the LIME paradigm to achieve the following (a) explain prediction of any model by using a locally faithful and interpretable decision tree model on which the Tree Explainer is used to calculate the shapley values and give visually interpretable explanations.
arXiv Detail & Related papers (2022-10-10T10:07:27Z) - Explanation Method for Anomaly Detection on Mixed Numerical and
Categorical Spaces [0.9543943371833464]
We present EADMNC (Explainable Anomaly Detection on Mixed Numerical and Categorical spaces)
It adds explainability to the predictions obtained with the original model.
We report experimental results on extensive real-world data, particularly in the domain of network intrusion detection.
arXiv Detail & Related papers (2022-09-09T08:20:13Z) - This looks more like that: Enhancing Self-Explaining Models by
Prototypical Relevance Propagation [17.485732906337507]
We present a case study of the self-explaining network, ProtoPNet, in the presence of a spectrum of artifacts.
We introduce a novel method for generating more precise model-aware explanations.
In order to obtain a clean dataset, we propose to use multi-view clustering strategies for segregating the artifact images.
arXiv Detail & Related papers (2021-08-27T09:55:53Z) - Boundary Attributions Provide Normal (Vector) Explanations [27.20904776964045]
Boundary Attribution (BA) is a new explanation method to address this question.
BA involves computing normal vectors of the local decision boundaries for the target input.
We prove two theorems for ReLU networks: BA of randomized smoothed networks or robustly trained networks is much closer to non-boundary attribution methods than that in standard networks.
arXiv Detail & Related papers (2021-03-20T22:36:39Z) - Toward Scalable and Unified Example-based Explanation and Outlier
Detection [128.23117182137418]
We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction.
We show that our prototype-based networks beyond similarity kernels deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
arXiv Detail & Related papers (2020-11-11T05:58:17Z) - Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
seq2seq abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions.
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
arXiv Detail & Related papers (2020-10-15T16:57:27Z) - The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal
Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z) - Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature based explanations by analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.