Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for
Machine Learning
- URL: http://arxiv.org/abs/2110.14049v1
- Date: Tue, 26 Oct 2021 22:03:55 GMT
- Authors: Yongchan Kwon, James Zou
- Abstract summary: We propose Beta Shapley, which is a substantial generalization of Data Shapley.
Beta Shapley unifies several popular data valuation methods and includes Data Shapley as a special case.
We demonstrate that Beta Shapley outperforms state-of-the-art data valuation methods on several downstream ML tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data Shapley has recently been proposed as a principled framework to quantify
the contribution of each individual datum in machine learning. It can effectively
identify helpful or harmful data points for a learning algorithm. In this
paper, we propose Beta Shapley, which is a substantial generalization of Data
Shapley. Beta Shapley arises naturally by relaxing the efficiency axiom of the
Shapley value, which is not critical for machine learning settings. Beta
Shapley unifies several popular data valuation methods and includes Data
Shapley as a special case. Moreover, we prove that Beta Shapley has several
desirable statistical properties and propose efficient algorithms to estimate
it. We demonstrate that Beta Shapley outperforms state-of-the-art data
valuation methods on several downstream ML tasks such as: 1) detecting
mislabeled training data; 2) learning with subsamples; and 3) identifying
points whose addition or removal has the largest positive or negative impact
on the model.
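The idea in the abstract can be sketched in a few lines: each point's value is a weighted average of its marginal contributions, grouped by the size of the subset it joins. This is a minimal sketch and not the authors' implementation. The Beta-function form of the weights is an assumption recalled from the paper rather than taken from this page, and `toy_utility`, `QUALITY`, `beta_weight`, and `semivalue` are hypothetical names chosen for illustration. The computation enumerates every subset exactly, so it is only practical for a handful of points.

```python
from itertools import combinations
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_weight(j, n, alpha, beta):
    """Weight on the average marginal contribution at cardinality j (1..n).

    Assumed form: n * C(n-1, j-1) * B(j + beta - 1, n - j + alpha) / B(alpha, beta).
    With alpha = beta = 1 every weight equals 1 (the uniform Shapley case).
    """
    log_ratio = log_beta(j + beta - 1, n - j + alpha) - log_beta(alpha, beta)
    return n * comb(n - 1, j - 1) * exp(log_ratio)


def semivalue(utility, n, alpha=1.0, beta=1.0):
    """Exact Beta(alpha, beta)-weighted value of each of n points."""
    values = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for j in range(1, n + 1):
            # All subsets of size j - 1 that do not contain point i.
            subsets = list(combinations(others, j - 1))
            avg_marginal = sum(
                utility(frozenset(s) | {i}) - utility(frozenset(s))
                for s in subsets
            ) / len(subsets)
            total += beta_weight(j, n, alpha, beta) * avg_marginal
        values.append(total / n)
    return values


# Hypothetical toy game: three helpful points and one harmful point (index 3).
QUALITY = [1.0, 1.0, 1.0, -0.5]


def toy_utility(subset):
    """Toy utility with diminishing returns; the empty set has utility 0."""
    subset = list(subset)
    return sum(QUALITY[i] for i in subset) / (1 + len(subset))


shapley_values = semivalue(toy_utility, 4, alpha=1.0, beta=1.0)
beta_values = semivalue(toy_utility, 4, alpha=4.0, beta=1.0)
print("Beta(1,1):", shapley_values)
print("Beta(4,1):", beta_values)
```

In this toy game the harmful point receives a negative value under both settings, because all of its marginal contributions are negative and all weights are positive. With alpha = beta = 1 the values sum to the utility of the full dataset, which is the efficiency axiom that other choices of (alpha, beta) relax.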
Related papers
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z) - Accelerated Shapley Value Approximation for Data Evaluation [3.707457963532597]
We show that the Shapley value of data points can be approximated more efficiently by leveraging structural properties of machine learning problems.
Our analysis suggests that models trained on small subsets are in fact more important in the context of data valuation.
arXiv Detail & Related papers (2023-11-09T13:15:36Z) - Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
arXiv Detail & Related papers (2023-11-02T06:09:24Z) - An Efficient Shapley Value Computation for the Naive Bayes Classifier [0.0]
This article proposes an exact analytic expression of Shapley values in the case of the naive Bayes classifier.
Results show that our Shapley proposal for the naive Bayes provides informative results with low algorithmic complexity.
arXiv Detail & Related papers (2023-07-31T14:39:10Z) - Shapley Value on Probabilistic Classifiers [6.163093930860032]
In the context of machine learning (ML), data valuation methods aim to equitably measure the contribution of each data point to the utility of an ML model.
Traditional Shapley-based data valuation methods may not effectively distinguish between beneficial and detrimental training data points.
We propose Probabilistic Shapley (P-Shapley) value by constructing a probability-wise utility function.
arXiv Detail & Related papers (2023-06-12T15:09:13Z) - Efficient Shapley Values Estimation by Amortization for Text
Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately with up to 60 times speedup.
arXiv Detail & Related papers (2023-05-31T16:19:13Z) - CS-Shapley: Class-wise Shapley Values for Data Valuation in
Classification [24.44357623723746]
We propose CS-Shapley, a Shapley value with a new value function that discriminates between training instances' in-class and out-of-class contributions.
Our results suggest Shapley-based data valuation is transferable for application across different models.
arXiv Detail & Related papers (2022-11-13T03:32:33Z) - Fast Hierarchical Games for Image Explanations [78.16853337149871]
We present a model-agnostic explanation method for image classification based on a hierarchical extension of Shapley coefficients.
Unlike other Shapley-based explanation methods, h-Shap is scalable and can be computed without the need for approximation.
We compare our hierarchical approach with popular Shapley-based and non-Shapley-based methods on a synthetic dataset, a medical imaging scenario, and a general computer vision problem.
arXiv Detail & Related papers (2021-04-13T13:11:02Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - A Multilinear Sampling Algorithm to Estimate Shapley Values [4.771833920251869]
We propose a new sampling method based on a multilinear extension technique as applied in game theory.
Our method is applicable to any machine learning model, in particular for either multi-class classifications or regression problems.
arXiv Detail & Related papers (2020-10-22T21:47:16Z) - Towards Efficient Data Valuation Based on the Shapley Value [65.4167993220998]
We study the problem of data valuation by utilizing the Shapley value.
The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value.
We propose a repertoire of efficient algorithms for approximating the Shapley value.
arXiv Detail & Related papers (2019-02-27T00:22:43Z)