Related papers: Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces

URL: http://arxiv.org/abs/2212.14855v3
Date: Mon, 15 Apr 2024 08:24:42 GMT
Title: Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces
Authors: Pattarawat Chormai, Jan Herrmann, Klaus-Robert Müller, Grégoire Montavon
Abstract summary: Explainable AI aims to overcome the black-box nature of complex ML models like neural networks by generating explanations for their predictions. We propose two new analyses, extending principles found in PCA or ICA to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), maximize relevance instead of e.g. variance or kurtosis.
Score: 14.70409833767752
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Explainable AI aims to overcome the black-box nature of complex ML models like neural networks by generating explanations for their predictions. Explanations often take the form of a heatmap identifying input features (e.g. pixels) that are relevant to the model's decision. These explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by extracting, at some intermediate layer of a neural network, subspaces that capture the multiple and distinct activation patterns (e.g. visual concepts) that are relevant to the prediction. To automatically extract these subspaces, we propose two new analyses, extending principles found in PCA or ICA to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), maximize relevance instead of e.g. variance or kurtosis. This allows for a much stronger focus of the analysis on what the ML model actually uses for predicting, ignoring activations or concepts to which the model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP. Our proposed methods prove practically useful and compare favorably to the state of the art, as demonstrated on benchmarks and three use cases.
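
Illustration (not from the paper's code): the abstract's idea of maximizing relevance instead of variance can be sketched as an eigenvalue problem. The sketch below assumes that the relevance of a layer is the dot product between its activations and a "context" vector (for example the gradient of the output with respect to those activations), and that a relevant subspace is spanned by the top eigenvectors of the symmetrized cross-moment of the two. The function name `relevance_subspace`, the toy data, and this exact objective are assumptions made for illustration; the authors' PRCA and DRSA formulations may differ in detail.

```python
import numpy as np

def relevance_subspace(acts, ctx, d):
    """Orthonormal basis U (features x d) maximizing mean((U^T a) . (U^T c)).

    acts: (n, features) activations; ctx: (n, features) context vectors,
    assumed here to be gradients of the output w.r.t. the activations.
    """
    n = acts.shape[0]
    cross = acts.T @ ctx / n                 # uncentered cross-moment E[a c^T]
    sym = 0.5 * (cross + cross.T)            # symmetrize so that eigh applies
    eigvals, eigvecs = np.linalg.eigh(sym)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]    # indices of the d largest
    return eigvecs[:, order], eigvals[order]

# Toy data (hypothetical): the "model" only uses the first two dimensions.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 8))
ctx = np.zeros_like(acts)
ctx[:, :2] = acts[:, :2]

U, vals = relevance_subspace(acts, ctx, d=2)

# Mean relevance retained in the subspace vs. mean relevance of the full layer.
captured = np.mean(np.sum((acts @ U) * (ctx @ U), axis=1))
total = np.mean(np.sum(acts * ctx, axis=1))
```

Swapping `sym` for the covariance of `acts` alone would give ordinary PCA, which ranks directions by variance whether or not the model responds to them; here the context vectors decide the ranking.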

Related papers

Formal Abductive Latent Explanations for Prototype-Based Networks [7.001970497421476]
Case-based reasoning networks make predictions based on similarity between the input and prototypical parts of training samples, called prototypes. We show that such explanations are sometimes misleading, which hampers their usefulness in safety-critical contexts. We propose Abductive Latent Explanations (ALEs), a formalism to express sufficient conditions on the intermediate representation of the instance that imply the prediction.
arXiv Detail & Related papers (2025-11-20T17:42:41Z)
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors [61.92704516732144]
We show that the most robust features for correctness prediction are those that play a distinctive causal role in the model's behavior. We propose two methods that leverage causal mechanisms to predict the correctness of model outputs.
arXiv Detail & Related papers (2025-05-17T00:31:39Z)
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z)
Efficient Contrastive Explanations on Demand [7.109897082275965]
This paper proposes novel algorithms to compute the so-called contrastive explanations for machine learning models. The paper also proposes novel algorithms for listing explanations and finding smallest contrastive explanations.
arXiv Detail & Related papers (2024-12-24T08:24:10Z)
Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers. We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models. Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors? [12.993027779814478]
We ask the question: can popular feature-additive explainers (e.g., LIME, SHAP, SHAPR, MAPLE, and PDP) explain feature-additive predictors? Herein, we evaluate such explainers on ground truth that is analytically derived from the additive structure of a model. Our results suggest that all explainers eventually fail to correctly attribute the importance of features, especially when a decision-making process involves feature interactions.
arXiv Detail & Related papers (2023-10-27T21:16:28Z)
Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level. We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
Local Interpretable Model Agnostic Shap Explanations for machine learning models [0.0]
We propose a methodology that we define as Local Interpretable Model Agnostic Shap Explanations (LIMASE). This proposed technique uses Shapley values under the LIME paradigm to achieve the following: (a) explain the prediction of any model by using a locally faithful and interpretable decision tree model, on which the Tree Explainer is used to calculate the Shapley values and give visually interpretable explanations.
arXiv Detail & Related papers (2022-10-10T10:07:27Z)
Explanation Method for Anomaly Detection on Mixed Numerical and Categorical Spaces [0.9543943371833464]
We present EADMNC (Explainable Anomaly Detection on Mixed Numerical and Categorical spaces). It adds explainability to the predictions obtained with the original model. We report experimental results on extensive real-world data, particularly in the domain of network intrusion detection.
arXiv Detail & Related papers (2022-09-09T08:20:13Z)
This looks more like that: Enhancing Self-Explaining Models by Prototypical Relevance Propagation [17.485732906337507]
We present a case study of the self-explaining network, ProtoPNet, in the presence of a spectrum of artifacts. We introduce a novel method for generating more precise model-aware explanations. In order to obtain a clean dataset, we propose to use multi-view clustering strategies for segregating the artifact images.
arXiv Detail & Related papers (2021-08-27T09:55:53Z)
Boundary Attributions Provide Normal (Vector) Explanations [27.20904776964045]
Boundary Attribution (BA) is a new explanation method to address this question. BA involves computing normal vectors of the local decision boundaries for the target input. We prove two theorems for ReLU networks: BA of randomized smoothed networks or robustly trained networks is much closer to non-boundary attribution methods than that in standard networks.
arXiv Detail & Related papers (2021-03-20T22:36:39Z)
Toward Scalable and Unified Example-based Explanation and Outlier Detection [128.23117182137418]
We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction. We show that our prototype-based networks beyond similarity kernels deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
arXiv Detail & Related papers (2020-11-11T05:58:17Z)
Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
Seq2seq abstractive summarization models generate text in a free-form manner. We study the entropy, or uncertainty, of the model's token-level predictions. We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
arXiv Detail & Related papers (2020-10-15T16:57:27Z)
The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models. We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature-based explanations by robustness analysis. We obtain new explanations that are loosely necessary and sufficient for a prediction. We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.