Absolute Shapley Value
- URL: http://arxiv.org/abs/2003.10076v1
- Date: Mon, 23 Mar 2020 04:26:30 GMT
- Title: Absolute Shapley Value
- Authors: Jinfei Liu
- Abstract summary: In cooperative game theory, the marginal contribution of each contributor to each coalition is a nonnegative value.
In machine learning model training, the contribution of each contributor (data) to each coalition can be a negative value.
In this paper, we investigate the problem of how to handle the negative marginal contribution when computing Shapley value.
- Score: 2.0711789781518752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Shapley value is a concept in cooperative game theory for measuring the
contribution of each participant, which was named in honor of Lloyd Shapley.
Shapley value has recently been applied in data marketplaces to allocate
compensation based on each contributor's contribution to the trained models.
Shapley value is the only value division scheme for compensation allocation
that satisfies three desirable criteria: group rationality, fairness, and
additivity. In cooperative game theory, the marginal contribution of each
contributor to each coalition is a nonnegative value. However, in machine
learning model training, the marginal contribution of each contributor (data
tuple) to each coalition (a set of data tuples) can be negative, i.e., the
accuracy of a model trained on a dataset with an additional data tuple can be
lower than the accuracy of a model trained on the dataset alone.
In this paper, we investigate the problem of how to handle the negative
marginal contribution when computing Shapley value. We explore three
philosophies: 1) taking the original value (Original Shapley Value); 2) taking
the larger of the original value and zero (Zero Shapley Value); and 3) taking
the absolute value of the original value (Absolute Shapley Value). Experiments
on the Iris dataset demonstrate that the definition of Absolute Shapley Value
significantly outperforms the other two definitions in terms of evaluating data
importance (the contribution of each data tuple to the trained model).
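The three ways of handling a negative marginal contribution can be illustrated with a small exact computation. The sketch below is not the paper's code: the player names and the toy accuracy table are made up, with tuple "c" standing in for a noisy data point whose addition can lower model accuracy.

```python
from itertools import permutations

def shapley_values(players, utility, mode="original"):
    """Exact Shapley values over all permutations of a small player set.

    mode selects how each marginal contribution u(S + {i}) - u(S) is treated:
      "original" - use it as-is                 (Original Shapley Value)
      "zero"     - clamp negatives to zero      (Zero Shapley Value)
      "absolute" - take its absolute value      (Absolute Shapley Value)
    """
    totals = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for perm in perms:
        coalition = frozenset()
        for p in perm:
            marginal = utility(coalition | {p}) - utility(coalition)
            if mode == "zero":
                marginal = max(marginal, 0.0)
            elif mode == "absolute":
                marginal = abs(marginal)
            totals[p] += marginal
            coalition = coalition | {p}
    return {p: totals[p] / len(perms) for p in players}

# Hypothetical utility: "accuracy" of a model trained on a coalition of data
# tuples. Adding tuple "c" to {a, b} lowers accuracy (0.8 -> 0.7), so c's
# marginal contribution is negative in some orderings.
ACC = {
    frozenset(): 0.0,
    frozenset("a"): 0.6, frozenset("b"): 0.5, frozenset("c"): 0.2,
    frozenset("ab"): 0.8, frozenset("ac"): 0.5, frozenset("bc"): 0.4,
    frozenset("abc"): 0.7,
}
util = lambda s: ACC[frozenset(s)]

for mode in ("original", "zero", "absolute"):
    print(mode, shapley_values("abc", util, mode))
```

Under "original", c's positive and negative marginals cancel to a value of zero; "zero" and "absolute" both assign c a positive value, differing in how strongly the negative marginals count.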
Related papers
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z) - Shapley Value on Probabilistic Classifiers [6.163093930860032]
In the context of machine learning (ML), data valuation methods aim to equitably measure the contribution of each data point to the utility of an ML model.
Traditional Shapley-based data valuation methods may not effectively distinguish between beneficial and detrimental training data points.
We propose Probabilistic Shapley (P-Shapley) value by constructing a probability-wise utility function.
arXiv Detail & Related papers (2023-06-12T15:09:13Z) - Efficient Shapley Values Estimation by Amortization for Text Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately with up to 60 times speedup.
arXiv Detail & Related papers (2023-05-31T16:19:13Z) - Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e., the empirical model error equals the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z) - On Generalization in Coreference Resolution [66.05112218880907]
We consolidate a set of 8 coreference resolution datasets targeting different domains to evaluate the off-the-shelf performance of models.
We then mix three datasets for training; even though their domain, annotation guidelines, and metadata differ, we propose a method for jointly training a single model.
We find that in a zero-shot setting, models trained on a single dataset transfer poorly while joint training yields improved overall performance.
arXiv Detail & Related papers (2021-09-20T16:33:22Z) - Joint Shapley values: a measure of joint feature importance [6.169364905804678]
We introduce joint Shapley values, which directly extend the Shapley axioms.
Joint Shapley values measure a set of features' average effect on a model's prediction.
Results for games show that joint Shapley values present different insights from existing interaction indices.
arXiv Detail & Related papers (2021-07-23T17:22:37Z) - The Shapley Value of Classifiers in Ensemble Games [7.23389716633927]
We introduce a new class of transferable utility cooperative games to measure the contribution of individual classifiers to an ensemble.
The players in ensemble games are pre-trained binary classifiers that collaborate in an ensemble to correctly label points from a dataset.
We design Troupe, a scalable algorithm that designates payoffs to individual models based on the Shapley values of those in the ensemble game.
arXiv Detail & Related papers (2021-01-06T17:40:23Z) - On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks [94.23884467360521]
We show that state-of-the-art models trained on QQP and WikiQA each have only 2.4% average precision when evaluated on realistically imbalanced test data.
By creating balanced training data with more informative negative examples, active learning greatly improves average precision to 32.5% on QQP and 20.1% on WikiQA.
arXiv Detail & Related papers (2020-10-10T21:56:27Z) - Predictive and Causal Implications of using Shapley Value for Model Interpretation [6.744385328015561]
We established the relationship between Shapley value and conditional independence, a key concept in both predictive and causal modeling.
Our results indicate that eliminating a variable with a high Shapley value from a model does not necessarily impair predictive performance.
More importantly, the Shapley value of a variable does not reflect its causal relationship with the target of interest.
arXiv Detail & Related papers (2020-08-12T01:08:08Z) - Towards Efficient Data Valuation Based on the Shapley Value [65.4167993220998]
We study the problem of data valuation by utilizing the Shapley value.
The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value.
We propose a repertoire of efficient algorithms for approximating the Shapley value.
arXiv Detail & Related papers (2019-02-27T00:22:43Z)
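Since exact Shapley computation enumerates all n! orderings, approximation schemes such as the one summarized above typically sample random permutations instead. The sketch below is a minimal illustration of that permutation-sampling idea, not the cited paper's algorithm; the weights and the additive utility function are made up so the true Shapley values are known.

```python
import random

def mc_shapley(players, utility, num_samples=2000, seed=0):
    """Monte Carlo approximation of Shapley values: average each player's
    marginal contribution over randomly sampled orderings instead of
    enumerating all n! permutations."""
    rng = random.Random(seed)
    players = list(players)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        rng.shuffle(players)
        coalition = frozenset()
        prev = utility(coalition)
        for p in players:
            coalition = coalition | {p}
            cur = utility(coalition)
            totals[p] += cur - prev
            prev = cur
    return {p: totals[p] / num_samples for p in players}

# Hypothetical additive utility: u(S) is the sum of member weights, so the
# exact Shapley value of each player equals its weight.
WEIGHTS = {"a": 0.4, "b": 0.3, "c": 0.1}
util = lambda s: sum(WEIGHTS[p] for p in s)
est = mc_shapley("abc", util, num_samples=500)
```

For this additive game every marginal contribution of a player equals its weight, so the estimate matches the exact value; for real model-training utilities the estimate converges as the number of sampled permutations grows.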
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.