Multicollinearity Correction and Combined Feature Effect in Shapley
Values
- URL: http://arxiv.org/abs/2011.01661v1
- Date: Tue, 3 Nov 2020 12:28:42 GMT
- Title: Multicollinearity Correction and Combined Feature Effect in Shapley
Values
- Authors: Indranil Basu and Subhadip Maji
- Abstract summary: Shapley values represent the importance of a feature for a particular row.
We present a unified framework to calculate Shapley values with correlated features.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model interpretability is one of the most intriguing problems in most Machine Learning models, particularly those that are mathematically sophisticated. Computing Shapley values is arguably the best approach so far to find the importance of each feature in a model at the row level. In other words, Shapley values represent the importance of a feature for a particular row, especially for classification or regression problems. One of the biggest limitations of Shapley values is that their calculation assumes all features are uncorrelated (independent of each other), an assumption that is often incorrect. To address this problem, we present a unified framework to calculate Shapley values with correlated features. To be more specific, we apply an adjustment (a matrix formulation) to the features while calculating independent Shapley values for the rows, and we give a mathematical proof supporting the said adjustments. With these adjustments, the Shapley values (importances) of the features become independent of the correlations existing between them. We have also extended this adjustment concept to combinations of more than one feature. Because Shapley values are additive, the combined effect of two features is obtained by simply adding their individual Shapley values; this, again, is not correct if one or more of the features in the combination are correlated with features outside the combination. We address this problem as well, by extending the correlation adjustment for a single feature to the multiple features in the combination for which Shapley values are determined. Our implementation also shows that the method is computationally efficient compared to the original Shapley method.
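The abstract does not reproduce the matrix formulation itself, so the following is only a minimal sketch of the general idea under stated assumptions: exact Shapley values are computed for a toy model after a decorrelating transform of the features, with a Cholesky whitening used as a stand-in for the paper's adjustment. The function names, the toy model, and the whitening step are all illustrative, not the authors' method.

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at row x against a baseline row.

    The value function v(S) evaluates f with the features in S taken
    from x and the remaining features taken from the baseline.
    """
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):                      # subset sizes 0 .. d-1
            for S in combinations(others, k):
                w = factorial(k) * factorial(d - k - 1) / factorial(d)
                z_with, z_without = baseline.copy(), baseline.copy()
                z_with[list(S) + [i]] = x[list(S) + [i]]
                z_without[list(S)] = x[list(S)]
                phi[i] += w * (f(z_with) - f(z_without))
    return phi

# Illustrative decorrelation step (an assumption, not the paper's
# formulation): whiten the data with the inverse Cholesky factor of
# its covariance so the transformed features are uncorrelated.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))  # correlated features
L = np.linalg.cholesky(np.cov(X, rowvar=False))
X_white = (X - X.mean(axis=0)) @ np.linalg.inv(L).T      # covariance ~ identity

f = lambda z: 2.0 * z[0] + z[1] - 0.5 * z[2]             # toy model
phi = shapley_values(f, X_white[0], X_white.mean(axis=0))
print(phi)
# Additivity: the combined effect of features 0 and 1 is phi[0] + phi[1].
print("combined effect of features 0 and 1:", phi[0] + phi[1])
```

For a linear model this reproduces the familiar coefficient-times-deviation attributions, and the additivity noted in the abstract makes the combined effect of two features the sum of their individual values; the exact 2^d enumeration is only feasible for small d.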
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves a $n^{-2/3}$ dimension-free L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
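For context on this related entry, kernel ridge regression itself has a simple closed form; the blurb does not specify HAR's saturated spline-basis kernel, so the sketch below uses a generic RBF kernel purely for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def krr_fit_predict(X, y, X_new, lam=1e-2, gamma=1.0):
    """Kernel ridge regression: solve (K + lam*I) alpha = y,
    then predict with K(X_new, X) @ alpha."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return rbf_kernel(X_new, X, gamma) @ alpha

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=200)
print(krr_fit_predict(X, y, X[:5]))   # in-sample sanity check
```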
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
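The blurb does not spell out SimSHAP's estimator, so as a reference point, here is the standard Monte Carlo baseline that such unified sampling views build on: averaging each feature's marginal contribution over random feature orderings. All names are illustrative.

```python
import numpy as np

def sampled_shapley(f, x, baseline, n_perm=2000, seed=0):
    """Monte Carlo Shapley estimate: average each feature's marginal
    contribution over random permutations of the feature order."""
    rng = np.random.default_rng(seed)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        z = baseline.copy()
        prev = f(z)
        for i in order:
            z[i] = x[i]          # add feature i to the coalition
            cur = f(z)
            phi[i] += cur - prev
            prev = cur
    return phi / n_perm

f = lambda z: 2.0 * z[0] + z[1] - 0.5 * z[2]
x = np.array([1.0, -1.0, 2.0]); base = np.zeros(3)
print(sampled_shapley(f, x, base))   # ~ [2.0, -1.0, -1.0] for this linear model
```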
arXiv Detail & Related papers (2023-11-02T06:09:24Z) - Shapley Sets: Feature Attribution via Recursive Function Decomposition [6.85316573653194]
We propose an alternative attribution approach, Shapley Sets, which awards value to sets of features.
Shapley Sets decomposes the underlying model into non-separable variable groups.
We show theoretically and experimentally how Shapley Sets avoids pitfalls associated with Shapley value based alternatives.
arXiv Detail & Related papers (2023-07-04T15:30:09Z) - Nonlinear Feature Aggregation: Two Algorithms driven by Theory [45.3190496371625]
Real-world machine learning applications are characterized by a huge number of features, leading to computational and memory issues.
We propose a dimensionality reduction algorithm (NonLinCFA) which aggregates non-linear transformations of features with a generic aggregation function.
We also test the algorithms on synthetic and real-world datasets, performing regression and classification tasks and showing competitive performance.
arXiv Detail & Related papers (2023-06-19T19:57:33Z) - Efficient Shapley Values Estimation by Amortization for Text
Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately with up to 60 times speedup.
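One way to read "amortized" here: fit a cheap surrogate that maps inputs directly to precomputed Shapley vectors, so explanation at inference costs one forward pass. The sketch below trains a linear surrogate on exact Shapley targets for a toy linear model; it is a generic illustration of amortization, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
coef = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(1000, 3))
base = X.mean(axis=0)

# For a linear model with a fixed baseline, exact Shapley values reduce
# to coef * (x - baseline); use these as training targets.
Phi = coef * (X - base)

# Amortized explainer: one least-squares fit from inputs to Shapley vectors.
W, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], Phi, rcond=None)

x_new = rng.normal(size=3)
phi_pred = np.r_[x_new, 1.0] @ W        # one forward pass at test time
print(phi_pred, coef * (x_new - base))  # prediction vs. exact values
```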
arXiv Detail & Related papers (2023-05-31T16:19:13Z) - WeightedSHAP: analyzing and improving Shapley based feature attributions [17.340091573913316]
The Shapley value is a popular approach for measuring the influence of individual features.
We propose WeightedSHAP, which generalizes the Shapley value and learns which marginal contributions to focus on directly from data.
On several real-world datasets, we demonstrate that the influential features identified by WeightedSHAP are better able to recapitulate the model's predictions.
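For reference, the marginal contributions that WeightedSHAP reweights are the ones the classical Shapley value averages with fixed combinatorial weights; for a value function $v$ over $d$ features,

```latex
\phi_i(v) = \sum_{S \subseteq \{1,\dots,d\} \setminus \{i\}}
            \frac{|S|!\,(d-|S|-1)!}{d!}\,
            \bigl( v(S \cup \{i\}) - v(S) \bigr).
```

As summarized above, WeightedSHAP replaces these fixed weights with weights learned from data.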
arXiv Detail & Related papers (2022-09-27T14:34:07Z) - Faith-Shap: The Faithful Shapley Interaction Index [43.968337274203414]
A key attraction of Shapley values is that they uniquely satisfy a very natural set of axiomatic properties.
We show that by requiring the faithful interaction indices to satisfy interaction-extensions of the standard individual Shapley axioms, we obtain a unique Faithful Shapley Interaction index.
arXiv Detail & Related papers (2022-03-02T04:44:52Z) - Joint Shapley values: a measure of joint feature importance [6.169364905804678]
We introduce joint Shapley values, which directly extend the Shapley axioms.
Joint Shapley values measure a set of features' average effect on a model's prediction.
Results for games show that joint Shapley values present different insights from existing interaction indices.
arXiv Detail & Related papers (2021-07-23T17:22:37Z) - groupShapley: Efficient prediction explanation with Shapley values for
feature groups [2.320417845168326]
Shapley values have established themselves as one of the most appropriate and theoretically sound frameworks for explaining predictions from machine learning models.
The main drawback with Shapley values is that their computational complexity grows exponentially in the number of input features.
The present paper introduces groupShapley: a conceptually simple approach for dealing with the aforementioned bottlenecks.
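The blurb's idea can be sketched directly: treat each feature group as a single player, so exact computation enumerates 2^g group coalitions instead of 2^d feature subsets. A minimal illustration follows; the group definitions and value function are assumptions, not the paper's implementation.

```python
import numpy as np
from itertools import combinations
from math import factorial

def group_shapley(f, x, baseline, groups):
    """Exact Shapley values over feature *groups*: each group is one
    player, so only 2^len(groups) coalitions are evaluated."""
    g = len(groups)
    phi = np.zeros(g)
    for i in range(g):
        others = [j for j in range(g) if j != i]
        for k in range(g):
            for S in combinations(others, k):
                w = factorial(k) * factorial(g - k - 1) / factorial(g)
                feats = [f_ for j in S for f_ in groups[j]]
                z_with, z_without = baseline.copy(), baseline.copy()
                z_with[feats + groups[i]] = x[feats + groups[i]]
                z_without[feats] = x[feats]
                phi[i] += w * (f(z_with) - f(z_without))
    return phi

f = lambda z: 2.0 * z[0] + z[1] - 0.5 * z[2] + z[3]
x = np.array([1.0, -1.0, 2.0, 0.5]); base = np.zeros(4)
groups = [[0, 1], [2, 3]]            # two groups of two features each
print(group_shapley(f, x, base, groups))   # one value per group
```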
arXiv Detail & Related papers (2021-06-23T08:16:14Z) - Explaining the data or explaining a model? Shapley values that uncover
non-linear dependencies [0.0]
We introduce and demonstrate the use of the energy distance correlation, affine-invariant distance correlation, and Hilbert-Schmidt independence criterion as Shapley value characteristic functions.
arXiv Detail & Related papers (2020-07-12T15:04:59Z) - Towards Efficient Data Valuation Based on the Shapley Value [65.4167993220998]
We study the problem of data valuation by utilizing the Shapley value.
The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value.
We propose a repertoire of efficient algorithms for approximating the Shapley value.
arXiv Detail & Related papers (2019-02-27T00:22:43Z)