Explaining the data or explaining a model? Shapley values that uncover
non-linear dependencies
- URL: http://arxiv.org/abs/2007.06011v4
- Date: Sat, 6 Mar 2021 05:46:11 GMT
- Title: Explaining the data or explaining a model? Shapley values that uncover
non-linear dependencies
- Authors: Daniel Vidali Fryer, Inga Strümke, Hien Nguyen
- Abstract summary: We introduce and demonstrate the use of the energy distance correlation, affine-invariant distance correlation, and Hilbert-Schmidt independence criterion as Shapley value characteristic functions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Shapley values have become increasingly popular in the machine learning
literature thanks to their attractive axiomatisation, flexibility, and
uniqueness in satisfying certain notions of 'fairness'. The flexibility arises
from the myriad potential forms of the Shapley value 'game formulation'.
Amongst the consequences of this flexibility is that there are now many types
of Shapley values being discussed, with such variety being a source of
potential misunderstanding. To the best of our knowledge, all existing game
formulations in the machine learning and statistics literature fall into a
category which we name the model-dependent category of game formulations. In
this work, we consider an alternative and novel formulation which leads to the
first instance of what we call model-independent Shapley values. These Shapley
values use a (non-parametric) measure of non-linear dependence as the
characteristic function. The strength of these Shapley values is in their
ability to uncover and attribute non-linear dependencies amongst features. We
introduce and demonstrate the use of the energy distance correlation,
affine-invariant distance correlation, and Hilbert-Schmidt independence
criterion as Shapley value characteristic functions. In particular, we
demonstrate their potential value for exploratory data analysis and model
diagnostics. We conclude with an interesting expository application to a
classical medical survey data set.
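A minimal sketch of the idea described in the abstract (not the authors' code): use the sample distance correlation as the characteristic function v(S), with v of the empty set equal to zero, and compute exact Shapley values over feature subsets. The function names `dcor` and `shapley_dcor` are illustrative, and the biased V-statistic estimator is assumed for simplicity.

```python
from itertools import combinations
from math import factorial

import numpy as np

def _pairwise(Z):
    # Euclidean distance matrix over the rows of a 2-D array
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

def _center(D):
    # double-centre a distance matrix
    return D - D.mean(0) - D.mean(1)[:, None] + D.mean()

def dcor(X, y):
    """Biased sample distance correlation between feature block X and target y."""
    A = _center(_pairwise(X))
    B = _center(_pairwise(y.reshape(-1, 1)))
    dcov2 = (A * B).mean()                       # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

def shapley_dcor(X, y):
    """Exact model-independent Shapley values with v(S) = dcor(X_S, y)."""
    n, d = X.shape
    v = lambda S: dcor(X[:, list(S)], y) if S else 0.0
    phi = np.zeros(d)
    for i in range(d):
        rest = [j for j in range(d) if j != i]
        for S in (c for k in range(d) for c in combinations(rest, k)):
            # standard Shapley weight for a coalition of size |S|
            w = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
            phi[i] += w * (v(S + (i,)) - v(S))
    return phi
```

By the efficiency axiom, the values sum to dcor(X, y), so the total dependence between the features and the target is fully attributed across features; the exact computation is exponential in d, so this sketch is only practical for small feature sets.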
Related papers
- Shape Arithmetic Expressions: Advancing Scientific Discovery Beyond Closed-Form Equations [56.78271181959529]
Generalized Additive Models (GAMs) can capture non-linear relationships between variables and targets, but they cannot capture intricate feature interactions.
We propose Shape Expressions Arithmetic (SHAREs) that fuses GAM's flexible shape functions with the complex feature interactions found in mathematical expressions.
We also design a set of rules for constructing SHAREs that guarantee transparency of the found expressions beyond the standard constraints.
arXiv Detail & Related papers (2024-04-15T13:44:01Z)
- Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
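The subset-sampling view mentioned above can be illustrated with a classical Monte Carlo permutation estimator (a standard baseline, not SimSHAP itself; the function name is illustrative): each player's Shapley value is the average of its marginal contributions over random orderings.

```python
import random

def sample_shapley(v, d, n_samples=500, seed=0):
    """Estimate Shapley values of players 0..d-1 for value function v by
    averaging marginal contributions over random permutations."""
    rng = random.Random(seed)
    phi = [0.0] * d
    for _ in range(n_samples):
        perm = list(range(d))
        rng.shuffle(perm)
        coalition, prev = set(), v(frozenset())
        for i in perm:
            # marginal contribution of player i given the players before it
            coalition.add(i)
            cur = v(frozenset(coalition))
            phi[i] += cur - prev
            prev = cur
    return [p / n_samples for p in phi]
```

For an additive game, v(S) = sum of per-player weights, every permutation yields the same marginal contributions, so the estimate is exact; for general games the error shrinks as n_samples grows.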
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
- Efficient Shapley Values Estimation by Amortization for Text Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately, with up to a 60-fold speedup.
arXiv Detail & Related papers (2023-05-31T16:19:13Z)
- Learning from few examples with nonlinear feature maps [68.8204255655161]
We explore the phenomenon and reveal key relationships between the dimensionality of an AI model's feature space, the non-degeneracy of data distributions, and the model's generalisation capabilities.
The main thrust of our present analysis is on the influence of nonlinear feature transformations mapping original data into higher- and possibly infinite-dimensional spaces on the resulting model's generalisation capabilities.
arXiv Detail & Related papers (2022-03-31T10:36:50Z)
- Faith-Shap: The Faithful Shapley Interaction Index [43.968337274203414]
A key attraction of Shapley values is that they uniquely satisfy a very natural set of axiomatic properties.
We show that by requiring the faithful interaction indices to satisfy interaction-extensions of the standard individual Shapley axioms, we obtain a unique Faithful Shapley Interaction index.
arXiv Detail & Related papers (2022-03-02T04:44:52Z)
- Exact Shapley Values for Local and Model-True Explanations of Decision Tree Ensembles [0.0]
We consider the application of Shapley values for explaining decision tree ensembles.
We present a novel approach to Shapley value-based feature attribution that can be applied to random forests and boosted decision trees.
arXiv Detail & Related papers (2021-12-16T20:16:02Z)
- Explaining predictive models using Shapley values and non-parametric vine copulas [2.6774008509840996]
We propose two new approaches for modelling the dependence between the features.
The performance of the proposed methods is evaluated on simulated data sets and a real data set.
Experiments demonstrate that the vine copula approaches give more accurate approximations to the true Shapley values than their competitors.
arXiv Detail & Related papers (2021-02-12T09:43:28Z)
- Multicollinearity Correction and Combined Feature Effect in Shapley Values [0.0]
Shapley values represent the importance of a feature for a particular row.
We present a unified framework to calculate Shapley values with correlated features.
arXiv Detail & Related papers (2020-11-03T12:28:42Z)
- Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models [6.423239719448169]
Shapley values are designed to attribute the difference between a model's prediction and an average baseline to the different features used as input to the model.
We show how these 'causal' Shapley values can be derived for general causal graphs without sacrificing any of their desirable properties.
arXiv Detail & Related papers (2020-11-03T11:11:36Z)
- Predictive and Causal Implications of using Shapley Value for Model Interpretation [6.744385328015561]
We establish the relationship between the Shapley value and conditional independence, a key concept in both predictive and causal modelling.
Our results indicate that eliminating a variable with a high Shapley value from a model does not necessarily impair predictive performance.
More importantly, the Shapley value of a variable does not reflect its causal relationship with the target of interest.
arXiv Detail & Related papers (2020-08-12T01:08:08Z)
- Towards Efficient Data Valuation Based on the Shapley Value [65.4167993220998]
We study the problem of data valuation by utilizing the Shapley value.
The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value.
We propose a repertoire of efficient algorithms for approximating the Shapley value.
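The data-valuation setting treats training points as players and a model's held-out performance as the payoff. A toy illustration of the exact computation (all data, names, and the trivial "memoriser" model here are made up for the sketch; the proposed efficient approximations replace the exponential enumeration):

```python
from itertools import combinations
from math import factorial

# Hypothetical toy setup: four labelled training points and a two-point
# validation set; the "model" predicts the majority label seen for the
# same feature bucket in the selected training subset.
train = [(0, 0), (1, 1), (0, 0), (1, 1)]   # (feature bucket, label)
val = [(0, 0), (1, 1)]

def utility(S):
    """Validation accuracy of the trivial memoriser trained on subset S."""
    correct = 0
    for x, y in val:
        labels = [train[i][1] for i in S if train[i][0] == x]
        if labels and max(set(labels), key=labels.count) == y:
            correct += 1
    return correct / len(val)

def data_shapley(v, n):
    """Exact Shapley value of each data point under utility v."""
    phi = [0.0] * n
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(rest, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (v(frozenset(S) | {i}) - v(frozenset(S)))
    return phi
```

In this symmetric example each point ends up with the same value (0.25), and by efficiency the values sum to the full-data accuracy of 1.0; duplicated or harmful points would show up as lower or negative values.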
arXiv Detail & Related papers (2019-02-27T00:22:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.