Succinct Interaction-Aware Explanations
- URL: http://arxiv.org/abs/2402.05566v2
- Date: Fri, 19 Apr 2024 13:47:20 GMT
- Title: Succinct Interaction-Aware Explanations
- Authors: Sascha Xu, Joscha Cüppers, Jilles Vreeken
- Abstract summary: SHAP is a popular approach to explain black-box models by revealing the importance of individual features.
NSHAP, on the other hand, reports the additive importance for all subsets of features.
We propose to combine the best of these two worlds, by partitioning the features into parts that significantly interact.
- Score: 33.25637826682827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: SHAP is a popular approach to explaining black-box models by revealing the importance of individual features. Because it ignores feature interactions, SHAP explanations can be confusing, or even misleading. NSHAP, on the other hand, reports the additive importance of all subsets of features. While this does include all interacting sets of features, it also leads to an exponentially sized, difficult-to-interpret explanation. In this paper, we propose to combine the best of these two worlds by partitioning the features into parts that significantly interact, and using these parts to compose a succinct, interpretable, additive explanation. We derive a criterion that measures how representative such a partition is of the model's behavior, traded off against the complexity of the resulting explanation. To efficiently find the best partition out of super-exponentially many, we show how to prune sub-optimal solutions using a statistical test, which not only improves runtime but also helps to detect spurious interactions. Experiments on synthetic and real-world data show that our explanations are both more accurate than those of SHAP and more easily interpretable than those of NSHAP.
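To make the idea concrete, below is a minimal sketch of a partition-based additive explanation: each part of a fixed feature partition is treated as a single player, and exact Shapley values are computed over the parts, with the value function estimated by marginal imputation from background data. The partition, model, and imputation scheme are assumptions for illustration; the paper's criterion for choosing the partition and its statistical pruning test are not reproduced here.

```python
# Minimal sketch: exact Shapley values over the parts of a fixed feature
# partition. Marginal imputation is an assumed value function, not the
# paper's exact algorithm.
from itertools import combinations
from math import factorial

import numpy as np

def value(model, x, background, features):
    """v(T): mean prediction with features in T fixed to x, rest imputed."""
    samples = background.copy()
    idx = list(features)
    samples[:, idx] = x[idx]
    return model(samples).mean()

def partition_shapley(model, x, background, parts):
    """Exact Shapley values over k parts; they sum to f(x) - E[f(X)]."""
    k = len(parts)
    phi = np.zeros(k)
    for i in range(k):
        rest = [j for j in range(k) if j != i]
        for size in range(k):
            w = factorial(size) * factorial(k - size - 1) / factorial(k)
            for coalition in combinations(rest, size):
                base = set().union(*(parts[j] for j in coalition))
                phi[i] += w * (value(model, x, background, base | set(parts[i]))
                               - value(model, x, background, base))
    return phi

# Toy usage: x0 and x1 interact, so they are grouped into one part.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
f = lambda Z: Z[:, 0] * Z[:, 1] + Z[:, 2] + 0.5 * Z[:, 3]
print(partition_shapley(f, X[0], X, parts=[{0, 1}, {2}, {3}]))
```

Because the explanation has only as many terms as parts, exact Shapley computation over parts stays cheap even when exact SHAP over individual features would not.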
Related papers
- Disentangling Interactions and Dependencies in Feature Attribution [9.442326245744916]
In machine learning, global feature importance methods try to determine how much each individual feature contributes to predicting a target variable.
In commonly used feature importance scores, such cooperative effects between features are conflated with the features' individual contributions.
We derive DIP, a new mathematical decomposition of individual feature importance scores that disentangles three components.
arXiv Detail & Related papers (2024-10-31T09:41:10Z)
- Unifying local and global model explanations by functional decomposition of low dimensional structures [0.0]
We consider a global explanation of a regression or classification function by decomposing it into the sum of main components and interaction components, where q denotes the highest order of interaction present in the decomposition; a schematic form is sketched below.
arXiv Detail & Related papers (2022-08-12T07:38:53Z)
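The decomposition this summary refers to takes the standard functional form below (a sketch; all notation other than q is assumed, since the summary does not spell it out):

```latex
% Decomposition of f into main components and interaction components,
% truncated at the highest interaction order q (notation assumed).
f(x) = f_0 + \sum_{j} f_j(x_j) + \sum_{j < k} f_{jk}(x_j, x_k)
       + \cdots + \sum_{|S| = q} f_S(x_S)
```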
- Partial Order in Chaos: Consensus on Feature Attributions in the Rashomon Set [50.67431815647126]
Post-hoc global/local feature attribution methods are increasingly employed to understand machine learning models.
We show that partial orders of local/global feature importance arise from this methodology.
We show that every relation among features present in these partial orders also holds in the rankings provided by existing approaches.
arXiv Detail & Related papers (2021-10-26T02:53:14Z)
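As an illustration of how such a partial order can arise, the sketch below ranks feature a above feature b only when every model in a hypothetical Rashomon set agrees; disputed pairs stay incomparable. The importance matrix and its layout are assumptions, not the paper's interface.

```python
# Minimal sketch: a consensus partial order over features, derived from the
# importance scores of several near-optimal models (an assumed input format).
import numpy as np

def consensus_partial_order(importances: np.ndarray) -> set[tuple[int, int]]:
    """importances[m, j] = importance of feature j under model m.
    Returns pairs (a, b) with a strictly more important than b in ALL models."""
    n_features = importances.shape[1]
    order = set()
    for a in range(n_features):
        for b in range(n_features):
            if a != b and np.all(importances[:, a] > importances[:, b]):
                order.add((a, b))
    return order

# Three equally good models, four features: only uncontested relations
# survive, so features 2 and 3 remain incomparable.
imp = np.array([[0.9, 0.5, 0.1, 0.3],
                [0.8, 0.6, 0.2, 0.1],
                [0.7, 0.4, 0.3, 0.2]])
print(consensus_partial_order(imp))  # {(0,1), (0,2), (0,3), (1,2), (1,3)}
```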
- Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals [72.00815192668193]
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time.
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
arXiv Detail & Related papers (2021-06-01T20:36:48Z)
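The removal-based FI estimate described in this summary can be sketched as follows, using replacement by a fixed baseline as one common but assumed removal scheme (the paper itself argues for in-distribution counterfactual removals instead):

```python
# Minimal sketch: feature importance as the drop in model confidence when a
# feature is "removed" by baseline replacement (an assumed removal scheme).
import numpy as np

def occlusion_importance(predict_proba, x, baseline, label):
    """predict_proba maps a batch (n, d) to class probabilities (n, c)."""
    base_conf = predict_proba(x[None, :])[0, label]
    scores = np.zeros(x.shape[0])
    for j in range(x.shape[0]):
        x_removed = x.copy()
        x_removed[j] = baseline[j]                 # remove feature j
        scores[j] = base_conf - predict_proba(x_removed[None, :])[0, label]
    return scores                                  # larger drop => more important
```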
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
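One simple way to realize a label-contrastive projection is sketched below, under the assumption of a linear read-out layer W; the paper's actual projection to a latent space is learned, so treat this only as an analogy.

```python
# Minimal sketch: project a representation h onto the direction that a linear
# read-out W uses to separate label y from a foil label y_foil. The linear
# read-out is an assumption; the paper learns its projection instead.
import numpy as np

def contrastive_projection(h, W, y, y_foil):
    """Component of h that pushes the decision toward y rather than y_foil."""
    u = W[y] - W[y_foil]
    u = u / np.linalg.norm(u)
    return (h @ u) * u
```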
- $PredDiff$: Explanations and Interactions from Conditional Expectations [0.3655021726150368]
$PredDiff$ is a model-agnostic, local attribution method rooted in probability theory.
In this work, we clarify properties of $PredDiff$ and put forward several extensions of the original formalism.
arXiv Detail & Related papers (2021-02-26T14:46:47Z)
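A $PredDiff$-style attribution can be sketched as the difference between the prediction and an expectation with the features of interest imputed; the marginal imputation below is an assumed stand-in for the conditional expectation named in the summary.

```python
# Minimal sketch: relevance of a feature set S as f(x) minus the expected
# prediction when x_S is imputed from background data (a marginal, and thus
# assumed, approximation of the conditional expectation).
import numpy as np

def preddiff_relevance(model, x, background, S):
    idx = list(S)
    imputed = np.tile(x, (background.shape[0], 1))
    imputed[:, idx] = background[:, idx]
    return model(x[None, :])[0] - model(imputed).mean()
```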
- Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies [88.0813215220342]
Some modalities can more easily contribute to the classification results than others.
We develop a method based on the log-Sobolev inequality, which bounds the functional entropy by the functional Fisher information.
On the two challenging multi-modal datasets VQA-CPv2 and SocialIQ, we obtain state-of-the-art results while more uniformly exploiting the modalities.
arXiv Detail & Related papers (2020-10-21T07:40:33Z)
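For reference, the functional entropy and the Gaussian log-Sobolev bound mentioned in this summary take the following standard textbook forms; the paper's multi-modal regularizer, which builds on them, is not reproduced here.

```latex
% Functional entropy of f >= 0 under a measure \mu, and the log-Sobolev
% inequality for the standard Gaussian measure, which bounds the entropy
% of f^2 by the functional Fisher information.
\mathrm{Ent}_{\mu}(f) = \mathbb{E}_{\mu}[f \log f]
                      - \mathbb{E}_{\mu}[f] \log \mathbb{E}_{\mu}[f],
\qquad
\mathrm{Ent}_{\mu}(f^2) \le 2\,\mathbb{E}_{\mu}\!\left[\lVert \nabla f \rVert^2\right].
```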
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
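To contrast the two explanation targets in this summary: a Shapley explainer spreads credit additively over all features, while a sufficient-subset explainer looks for a small feature set that alone preserves the prediction. The greedy search below is an assumed heuristic for the latter, not the paper's procedure.

```python
# Minimal sketch: greedily grow a feature set until keeping ONLY those
# features (others set to a baseline) preserves the predicted label with
# high confidence. Greedy search and threshold are assumed design choices.
import numpy as np

def greedy_sufficient_subset(predict_proba, x, baseline, label, thresh=0.9):
    d = x.shape[0]
    kept = []

    def conf(S):
        masked = baseline.copy()
        masked[S] = x[S]
        return predict_proba(masked[None, :])[0, label]

    while conf(kept) < thresh and len(kept) < d:
        best = max((j for j in range(d) if j not in kept),
                   key=lambda j: conf(kept + [j]))
        kept.append(best)
    return kept
```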
- Evaluating and Aggregating Feature-based Model Explanations [27.677158604772238]
A feature-based model explanation denotes how much each input feature contributes to a model's output for a given data point.
This paper proposes quantitative evaluation criteria for feature-based explanations: low sensitivity, high faithfulness, and low complexity.
arXiv Detail & Related papers (2020-05-01T21:56:36Z)
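Two of the three criteria admit simple concrete forms, sketched below under assumed definitions (faithfulness as attribution-vs-ablation correlation, complexity as the entropy of normalized attribution magnitudes); the paper's exact formulations may differ.

```python
# Minimal sketches of assumed concrete forms of two criteria: faithfulness as
# the correlation between attributions and single-feature ablation effects,
# and complexity as the entropy of the normalized attribution magnitudes.
import numpy as np

def faithfulness(model, x, baseline, attributions):
    drops = []
    for j in range(x.shape[0]):
        x_abl = x.copy()
        x_abl[j] = baseline[j]
        drops.append(model(x[None, :])[0] - model(x_abl[None, :])[0])
    return np.corrcoef(attributions, drops)[0, 1]   # higher => more faithful

def complexity(attributions):
    p = np.abs(attributions) / np.abs(attributions).sum()
    return -(p * np.log(p + 1e-12)).sum()           # lower entropy => simpler
```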