WeightedSHAP: analyzing and improving Shapley based feature attributions
- URL: http://arxiv.org/abs/2209.13429v1
- Date: Tue, 27 Sep 2022 14:34:07 GMT
- Title: WeightedSHAP: analyzing and improving Shapley based feature attributions
- Authors: Yongchan Kwon, James Zou
- Abstract summary: Shapley value is a popular approach for measuring the influence of individual features.
We propose WeightedSHAP, which generalizes the Shapley value and learns directly from data which marginal contributions to focus on.
On several real-world datasets, we demonstrate that the influential features identified by WeightedSHAP are better able to recapitulate the model's predictions.
- Score: 17.340091573913316
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Shapley value is a popular approach for measuring the influence of individual
features. While Shapley feature attribution is built upon desiderata from game
theory, some of its constraints may be less natural in certain machine learning
settings, leading to unintuitive model interpretation. In particular, the
Shapley value uses the same weight for all marginal contributions -- i.e. it
gives the same importance when a large number of other features are given
versus when a small number of other features are given. This property can be
problematic if larger feature sets are more or less informative than smaller
feature sets. Our work performs a rigorous analysis of the potential
limitations of Shapley feature attribution. We identify simple settings where
the Shapley value is mathematically suboptimal by assigning larger attributions
for less influential features. Motivated by this observation, we propose
WeightedSHAP, which generalizes the Shapley value and learns directly from data
which marginal contributions to focus on. On several real-world datasets, we
demonstrate that the influential features identified by WeightedSHAP are better
able to recapitulate the model's predictions compared to the features
identified by the Shapley value.
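The core idea, replacing the Shapley value's uniform average over coalition sizes with a learnable weight vector, can be sketched in a few lines. The toy value function and weight choices below are illustrative assumptions, not the paper's estimator.

```python
from itertools import combinations

def weighted_attribution(value_fn, n_features, i, weights):
    """Attribution of feature i as a weighted average, over coalition sizes k,
    of the mean marginal contribution v(S + {i}) - v(S) across subsets S of
    size k that exclude i. Uniform weights (1/n per size) recover the exact
    Shapley value; other weight vectors give a WeightedSHAP-style variant."""
    others = [j for j in range(n_features) if j != i]
    phi = 0.0
    for k, w_k in enumerate(weights):  # k = coalition size |S|
        marginals = [value_fn(frozenset(S) | {i}) - value_fn(frozenset(S))
                     for S in combinations(others, k)]
        phi += w_k * sum(marginals) / len(marginals)
    return phi

# Toy non-additive game (an assumption for illustration): a coalition of
# size s is worth s**2, so marginal contributions grow with coalition size.
v = lambda S: float(len(S)) ** 2

shap = weighted_attribution(v, 3, 0, [1/3, 1/3, 1/3])         # uniform -> Shapley
small_first = weighted_attribution(v, 3, 0, [1.0, 0.0, 0.0])  # all weight on |S| = 0
```

In this game the uniform (Shapley) weighting attributes 3.0 to feature 0, while concentrating all weight on the empty coalition yields 1.0, which is how the choice of weights can reorder attributions whenever the game is not additive.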
Related papers
- Improving the Sampling Strategy in KernelSHAP [0.8057006406834466]
KernelSHAP framework enables us to approximate the Shapley values using a sampled subset of weighted conditional expectations.
We propose three main novel contributions: a stabilizing technique that reduces the variance of the weights in the current state-of-the-art strategy; a novel weighting scheme that corrects the Shapley kernel weights based on sampled subsets; and a straightforward strategy that includes the important subsets and integrates them with the corrected Shapley kernel weights.
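For context, the Shapley kernel weights referenced above have a closed form. Below is a minimal sketch of sampling coalition sizes in proportion to the total kernel mass at each size; the stabilizing and correction schemes proposed in the paper are not reproduced here.

```python
import math
import random

def shapley_kernel_weight(n, s):
    """Shapley kernel weight for a single coalition of size s out of n
    features. KernelSHAP treats s = 0 and s = n as separate, infinitely
    weighted constraints, so only 0 < s < n is handled here."""
    return (n - 1) / (math.comb(n, s) * s * (n - s))

def sample_coalition_sizes(n, m, rng=None):
    """Draw m coalition sizes with probability proportional to the total
    kernel weight carried by each size (a generic sampling sketch, not
    the corrected strategy proposed in the paper)."""
    rng = rng or random.Random(0)
    sizes = list(range(1, n))
    mass = [math.comb(n, s) * shapley_kernel_weight(n, s) for s in sizes]
    return rng.choices(sizes, weights=mass, k=m)
```

Because the per-size mass simplifies to (n - 1) / (s * (n - s)), the sampler favors very small and very large coalitions, which is the behavior the weighting corrections above are designed to work with.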
arXiv Detail & Related papers (2024-10-07T10:02:31Z)
- Scaling Laws for the Value of Individual Data Points in Machine Learning [55.596413470429475]
We introduce a new perspective by investigating scaling behavior for the value of individual data points.
We provide learning theory to support our scaling law, and we observe empirically that it holds across diverse model classes.
Our work represents a first step towards understanding and utilizing scaling properties for the value of individual data points.
arXiv Detail & Related papers (2024-05-30T20:10:24Z)
- Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
- Efficient Shapley Values Estimation by Amortization for Text Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately, with up to a 60-fold speedup.
arXiv Detail & Related papers (2023-05-31T16:19:13Z)
- On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition [65.67315418971688]
We show that truncating small eigenvalues of the Global Covariance Pooling (GCP) representation can yield smoother gradients.
However, on fine-grained datasets, truncating the small eigenvalues makes the model fail to converge.
Inspired by this observation, we propose a network branch dedicated to magnifying the importance of small eigenvalues.
arXiv Detail & Related papers (2022-05-26T11:41:36Z)
- Joint Shapley values: a measure of joint feature importance [6.169364905804678]
We introduce joint Shapley values, which directly extend the Shapley axioms.
Joint Shapley values measure a set of features' average effect on a model's prediction.
Results for games show that joint Shapley values present different insights from existing interaction indices.
arXiv Detail & Related papers (2021-07-23T17:22:37Z)
- groupShapley: Efficient prediction explanation with Shapley values for feature groups [2.320417845168326]
The Shapley value has established itself as one of the most appropriate and theoretically sound frameworks for explaining predictions from machine learning models.
Its main drawback is that its computational complexity grows exponentially in the number of input features.
The present paper introduces groupShapley: a conceptually simple approach to alleviating this computational bottleneck.
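The complexity argument can be made concrete: exact Shapley values over g feature groups require summing over 2^(g-1) coalitions per group, rather than 2^(d-1) over d individual features. A minimal sketch, in which the grouping and the additive toy value function are assumptions for illustration:

```python
import math
from itertools import combinations

def group_shapley(value_fn, groups):
    """Exact Shapley values computed at the level of feature groups, so the
    coalition sum is over groups rather than individual features. A sketch
    of the idea, not the groupShapley implementation."""
    g = len(groups)
    phi = [0.0] * g
    for i in range(g):
        others = [j for j in range(g) if j != i]
        for k in range(g):
            coeff = 1.0 / (g * math.comb(g - 1, k))
            for S in combinations(others, k):
                feats = frozenset(f for j in S for f in groups[j])
                gain = value_fn(feats | frozenset(groups[i])) - value_fn(feats)
                phi[i] += coeff * gain
    return phi

# Additive toy game (illustrative assumption): each feature has a fixed worth.
worth = {0: 1.0, 1: 2.0, 2: 4.0}
v = lambda S: sum(worth[f] for f in S)
phi = group_shapley(v, [[0, 1], [2]])  # attributions for the two groups
```

In this additive game each group's attribution equals the total worth of its members (3.0 and 4.0), and by the efficiency axiom the attributions sum to the value of the full feature set.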
arXiv Detail & Related papers (2021-06-23T08:16:14Z)
- Fast Hierarchical Games for Image Explanations [78.16853337149871]
We present a model-agnostic explanation method for image classification based on a hierarchical extension of Shapley coefficients.
Unlike other Shapley-based explanation methods, h-Shap is scalable and can be computed without the need for approximation.
We compare our hierarchical approach with popular Shapley-based and non-Shapley-based methods on a synthetic dataset, a medical imaging scenario, and a general computer vision problem.
arXiv Detail & Related papers (2021-04-13T13:11:02Z)
- Multicollinearity Correction and Combined Feature Effect in Shapley Values [0.0]
Shapley values represent the importance of a feature for a particular row.
We present a unified framework to calculate Shapley values with correlated features.
arXiv Detail & Related papers (2020-11-03T12:28:42Z)
- Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models [6.423239719448169]
Shapley values are designed to attribute the difference between a model's prediction and an average baseline to the different features used as input to the model.
We show how these 'causal' Shapley values can be derived for general causal graphs without sacrificing any of their desirable properties.
arXiv Detail & Related papers (2020-11-03T11:11:36Z)
- Towards Efficient Data Valuation Based on the Shapley Value [65.4167993220998]
We study the problem of data valuation by utilizing the Shapley value.
The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value.
We propose a repertoire of efficient algorithms for approximating the Shapley value.
arXiv Detail & Related papers (2019-02-27T00:22:43Z)
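A standard member of this family of approximations is permutation sampling, which averages each player's marginal contribution over random orderings. The sketch below is a generic estimator, not the paper's specific algorithms, and the additive game is an illustrative assumption.

```python
import random

def permutation_shapley(value_fn, n, num_perms, seed=0):
    """Monte Carlo permutation estimator: for each random ordering, accumulate
    every player's marginal contribution at its position, then average over
    orderings. Converges to the exact Shapley values as num_perms grows."""
    rng = random.Random(seed)
    phi = [0.0] * n
    players = list(range(n))
    for _ in range(num_perms):
        rng.shuffle(players)
        coalition, prev = set(), value_fn(frozenset())
        for p in players:
            coalition.add(p)
            cur = value_fn(frozenset(coalition))
            phi[p] += cur - prev
            prev = cur
    return [x / num_perms for x in phi]

# Additive toy game (illustrative): marginal contributions are independent of
# ordering here, so the estimator recovers each player's worth exactly.
worth = {0: 1.0, 1: 2.0, 2: 3.0}
v = lambda S: sum(worth[f] for f in S)
est = permutation_shapley(v, 3, 10)
```

Each full permutation costs n + 1 value evaluations, so the estimator trades the exponential exact computation for a budget of num_perms * (n + 1) evaluations.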
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.