A Multilinear Sampling Algorithm to Estimate Shapley Values
- URL: http://arxiv.org/abs/2010.12082v1
- Date: Thu, 22 Oct 2020 21:47:16 GMT
- Title: A Multilinear Sampling Algorithm to Estimate Shapley Values
- Authors: Ramin Okhrati and Aldo Lipani
- Abstract summary: We propose a new sampling method based on a multilinear extension technique as applied in game theory.
Our method is applicable to any machine learning model, in particular to multi-class classification and regression problems.
- Score: 4.771833920251869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Shapley values are great analytical tools in game theory to measure the
importance of a player in a game. Due to their axiomatic and desirable
properties such as efficiency, they have become popular for feature importance
analysis in data science and machine learning. However, the time complexity to
compute Shapley values based on the original formula is exponential, and as the
number of features increases, this becomes infeasible. Castro et al. [1]
developed a sampling algorithm to estimate Shapley values. In this work, we
propose a new sampling method based on a multilinear extension technique as
applied in game theory. The aim is to provide a more efficient (sampling)
method for estimating Shapley values. Our method is applicable to any machine
learning model, in particular to multi-class classification and regression
problems. We apply the method to estimate Shapley values for
multilayer perceptrons (MLPs) and through experimentation on two datasets, we
demonstrate that our method provides more accurate estimations of the Shapley
values by reducing the variance of the sampling statistics.
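To make the approach concrete, below is a minimal sketch of multilinear-extension sampling for Shapley values. It illustrates the general technique rather than the authors' exact algorithm: the function `multilinear_shapley`, the coalition value function `value_fn`, the grid size `n_q`, the per-grid-point sample count `samples_per_q`, and the toy example are all illustrative assumptions.

```python
import numpy as np

def multilinear_shapley(value_fn, n_features, n_q=11, samples_per_q=50, seed=0):
    """Estimate Shapley values via multilinear-extension sampling.

    phi_i = integral_0^1 E[ v(E_q + {i}) - v(E_q) ] dq, where E_q includes every
    feature j != i independently with probability q. The outer integral is
    approximated on a uniform grid of q values (trapezoidal rule) and the inner
    expectation by Monte Carlo sampling of random coalitions.

    value_fn: callable taking a boolean mask of shape (n_features,) and
    returning a scalar score for that coalition of features.
    """
    rng = np.random.default_rng(seed)
    qs = np.linspace(0.0, 1.0, n_q)
    # contributions[k, i] estimates E[v(coalition with i) - v(coalition without i)] at q = qs[k]
    contributions = np.zeros((n_q, n_features))

    for k, q in enumerate(qs):
        for _ in range(samples_per_q):
            mask = rng.random(n_features) < q  # random coalition at inclusion level q
            for i in range(n_features):
                with_i, without_i = mask.copy(), mask.copy()
                with_i[i], without_i[i] = True, False
                contributions[k, i] += value_fn(with_i) - value_fn(without_i)
        contributions[k] /= samples_per_q

    # Trapezoidal rule over q gives the Shapley value estimates.
    dq = qs[1] - qs[0]
    return dq * (contributions[0] / 2 + contributions[1:-1].sum(axis=0) + contributions[-1] / 2)


if __name__ == "__main__":
    # Toy additive value function: the exact Shapley values equal the weights,
    # which makes the estimates easy to sanity check.
    weights = np.array([0.5, 1.0, 2.0, 4.0])
    v = lambda mask: float(weights[mask].sum())
    print(multilinear_shapley(v, n_features=4))  # approx. [0.5, 1.0, 2.0, 4.0]
```

In practice, the value function would wrap a trained model (for example, an MLP) together with a convention for handling absent features, such as replacing them with baseline values.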
Related papers
- Improving the Sampling Strategy in KernelSHAP [0.8057006406834466]
The KernelSHAP framework enables approximating Shapley values using a sampled subset of weighted conditional expectations (a sketch of the standard Shapley kernel weights appears after this list).
We propose three main novel contributions: a stabilizing technique to reduce the variance of the weights in the current state-of-the-art strategy, a novel weighting scheme that corrects the Shapley kernel weights based on sampled subsets, and a straightforward strategy that includes the important subsets and integrates them with the corrected Shapley kernel weights.
arXiv Detail & Related papers (2024-10-07T10:02:31Z)
- Accelerated Shapley Value Approximation for Data Evaluation [3.707457963532597]
We show that the Shapley values of data points can be approximated more efficiently by leveraging structural properties of machine learning problems.
Our analysis suggests that models trained on small subsets are in fact more important in the context of data valuation.
arXiv Detail & Related papers (2023-11-09T13:15:36Z)
- Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
- An Efficient Shapley Value Computation for the Naive Bayes Classifier [0.0]
This article proposes an exact analytic expression of Shapley values in the case of the naive Bayes classifier.
Results show that our Shapley proposal for the naive Bayes classifier provides informative results with low algorithmic complexity.
arXiv Detail & Related papers (2023-07-31T14:39:10Z)
- Efficient Shapley Values Estimation by Amortization for Text Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately with a speedup of up to 60 times.
arXiv Detail & Related papers (2023-05-31T16:19:13Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
- Efficient computation and analysis of distributional Shapley values [15.322542729755998]
We derive the first analytic expressions for DShapley for the canonical problems of linear regression, binary classification, and non-parametric density estimation.
Our formulas are directly interpretable and provide quantitative insights into how the value varies for different types of data.
arXiv Detail & Related papers (2020-07-02T19:51:54Z)
- Towards Efficient Data Valuation Based on the Shapley Value [65.4167993220998]
We study the problem of data valuation by utilizing the Shapley value.
The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value.
We propose a repertoire of efficient algorithms for approximating the Shapley value.
arXiv Detail & Related papers (2019-02-27T00:22:43Z)
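As background for the kernel weights mentioned in the KernelSHAP entry above, the following is a small sketch of the standard Shapley kernel weighting from the original KernelSHAP formulation; it is not the corrected weighting proposed in that paper, and the large stand-in constant for the infinite end-point weights is an illustrative choice.

```python
from math import comb

def shapley_kernel_weight(m: int, s: int) -> float:
    """Standard Shapley kernel weight for a coalition of size s out of m features:
        w(s) = (m - 1) / (C(m, s) * s * (m - s)).
    The weight is infinite for s = 0 and s = m; a large constant stands in for it
    so a weighted regression fits the empty and full coalitions (almost) exactly.
    """
    if s == 0 or s == m:
        return 1e9  # stand-in for the infinite end-point weight
    return (m - 1) / (comb(m, s) * s * (m - s))


if __name__ == "__main__":
    m = 5
    for s in range(m + 1):
        print(s, shapley_kernel_weight(m, s))
```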
This list is automatically generated from the titles and abstracts of the papers on this site.