Absolute Shapley Value
- URL: http://arxiv.org/abs/2003.10076v1
- Date: Mon, 23 Mar 2020 04:26:30 GMT
- Title: Absolute Shapley Value
- Authors: Jinfei Liu
- Abstract summary: In cooperative game theory, the marginal contribution of each contributor to each coalition is a nonnegative value.
In machine learning model training, the contribution of each contributor (data) to each coalition can be a negative value.
In this paper, we investigate the problem of how to handle the negative marginal contribution when computing Shapley value.
- Score: 2.0711789781518752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Shapley value is a concept in cooperative game theory for measuring the
contribution of each participant, which was named in honor of Lloyd Shapley.
Shapley value has recently been applied in data marketplaces to allocate
compensation based on each contributor's contribution to the trained models.
Shapley value is the only value division scheme for compensation allocation
that satisfies three desirable criteria: group rationality, fairness, and
additivity. In cooperative game theory, the marginal contribution of each
contributor to each coalition is a nonnegative value. However, in machine
learning model training, the marginal contribution of each contributor (data
tuple) to each coalition (a set of data tuples) can be negative, i.e., the
accuracy of a model trained on a dataset with an additional data tuple can be
lower than the accuracy of a model trained on the dataset alone.
In this paper, we investigate the problem of how to handle the negative
marginal contribution when computing Shapley value. We explore three
philosophies: 1) taking the original value (Original Shapley Value); 2) taking
the larger of the original value and zero (Zero Shapley Value); and 3) taking
the absolute value of the original value (Absolute Shapley Value). Experiments
on the Iris dataset demonstrate that the definition of Absolute Shapley Value
significantly outperforms the other two definitions in terms of evaluating data
importance (the contribution of each data tuple to the trained model).
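The three ways of handling a negative marginal contribution can be illustrated with a small exact computation. The sketch below is not the paper's code: the player names and the toy accuracy table are made up, with tuple "c" standing in for a noisy data point whose addition can lower model accuracy.

```python
from itertools import permutations

def shapley_values(players, utility, mode="original"):
    """Exact Shapley values over all permutations of a small player set.

    mode selects how each marginal contribution u(S + {i}) - u(S) is treated:
      "original" - use it as-is                 (Original Shapley Value)
      "zero"     - clamp negatives to zero      (Zero Shapley Value)
      "absolute" - take its absolute value      (Absolute Shapley Value)
    """
    totals = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for perm in perms:
        coalition = frozenset()
        for p in perm:
            marginal = utility(coalition | {p}) - utility(coalition)
            if mode == "zero":
                marginal = max(marginal, 0.0)
            elif mode == "absolute":
                marginal = abs(marginal)
            totals[p] += marginal
            coalition = coalition | {p}
    return {p: totals[p] / len(perms) for p in players}

# Hypothetical utility: "accuracy" of a model trained on a coalition of data
# tuples. Adding tuple "c" to {a, b} lowers accuracy (0.8 -> 0.7), so c's
# marginal contribution is negative in some orderings.
ACC = {
    frozenset(): 0.0,
    frozenset("a"): 0.6, frozenset("b"): 0.5, frozenset("c"): 0.2,
    frozenset("ab"): 0.8, frozenset("ac"): 0.5, frozenset("bc"): 0.4,
    frozenset("abc"): 0.7,
}
util = lambda s: ACC[frozenset(s)]

for mode in ("original", "zero", "absolute"):
    print(mode, shapley_values("abc", util, mode))
```

Under "original", c's positive and negative marginals cancel to a value of zero; "zero" and "absolute" both assign c a positive value, differing in how strongly the negative marginals count.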
Related papers
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z) - Shapley Value on Probabilistic Classifiers [6.163093930860032]
In the context of machine learning (ML), data valuation methods aim to equitably measure the contribution of each data point to the utility of an ML model.
Traditional Shapley-based data valuation methods may not effectively distinguish between beneficial and detrimental training data points.
We propose Probabilistic Shapley (P-Shapley) value by constructing a probability-wise utility function.
arXiv Detail & Related papers (2023-06-12T15:09:13Z) - Efficient Shapley Values Estimation by Amortization for Text Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately with up to 60 times speedup.
arXiv Detail & Related papers (2023-05-31T16:19:13Z) - Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e., the empirical model error equals the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z) - On Generalization in Coreference Resolution [66.05112218880907]
We consolidate a set of 8 coreference resolution datasets targeting different domains to evaluate the off-the-shelf performance of models.
We then mix three datasets for training; even though their domain, annotation guidelines, and metadata differ, we propose a method for jointly training a single model.
We find that in a zero-shot setting, models trained on a single dataset transfer poorly while joint training yields improved overall performance.
arXiv Detail & Related papers (2021-09-20T16:33:22Z) - Joint Shapley values: a measure of joint feature importance [6.169364905804678]
We introduce joint Shapley values, which directly extend the Shapley axioms.
Joint Shapley values measure a set of features' average effect on a model's prediction.
Results for games show that joint Shapley values present different insights from existing interaction indices.
arXiv Detail & Related papers (2021-07-23T17:22:37Z) - The Shapley Value of Classifiers in Ensemble Games [7.23389716633927]
We introduce a new class of transferable utility cooperative games to measure the contribution of individual classifiers to an ensemble.
The players in ensemble games are pre-trained binary classifiers that collaborate in an ensemble to correctly label points from a dataset.
We design Troupe, a scalable algorithm that designates payoffs to individual models based on the Shapley values of those in the ensemble game.
arXiv Detail & Related papers (2021-01-06T17:40:23Z) - On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks [94.23884467360521]
We show that state-of-the-art models trained on QQP and WikiQA each have only 2.4% average precision when evaluated on realistically imbalanced test data.
By creating balanced training data with more informative negative examples, active learning greatly improves average precision to 32.5% on QQP and 20.1% on WikiQA.
arXiv Detail & Related papers (2020-10-10T21:56:27Z) - Predictive and Causal Implications of using Shapley Value for Model Interpretation [6.744385328015561]
We established the relationship between Shapley value and conditional independence, a key concept in both predictive and causal modeling.
Our results indicate that eliminating a variable with a high Shapley value from a model does not necessarily impair predictive performance.
More importantly, the Shapley value of a variable does not reflect its causal relationship with the target of interest.
arXiv Detail & Related papers (2020-08-12T01:08:08Z) - Towards Efficient Data Valuation Based on the Shapley Value [65.4167993220998]
We study the problem of data valuation by utilizing the Shapley value.
The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value.
We propose a repertoire of efficient algorithms for approximating the Shapley value.
arXiv Detail & Related papers (2019-02-27T00:22:43Z)
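Since exact Shapley computation enumerates all n! orderings, approximation schemes such as the one summarized above typically sample random permutations instead. The sketch below is a minimal illustration of that permutation-sampling idea, not the cited paper's algorithm; the weights and the additive utility function are made up so the true Shapley values are known.

```python
import random

def mc_shapley(players, utility, num_samples=2000, seed=0):
    """Monte Carlo approximation of Shapley values: average each player's
    marginal contribution over randomly sampled orderings instead of
    enumerating all n! permutations."""
    rng = random.Random(seed)
    players = list(players)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        rng.shuffle(players)
        coalition = frozenset()
        prev = utility(coalition)
        for p in players:
            coalition = coalition | {p}
            cur = utility(coalition)
            totals[p] += cur - prev
            prev = cur
    return {p: totals[p] / num_samples for p in players}

# Hypothetical additive utility: u(S) is the sum of member weights, so the
# exact Shapley value of each player equals its weight.
WEIGHTS = {"a": 0.4, "b": 0.3, "c": 0.1}
util = lambda s: sum(WEIGHTS[p] for p in s)
est = mc_shapley("abc", util, num_samples=500)
```

For this additive game every marginal contribution of a player equals its weight, so the estimate matches the exact value; for real model-training utilities the estimate converges as the number of sampled permutations grows.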
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.