An Efficient Shapley Value Computation for the Naive Bayes Classifier
- URL: http://arxiv.org/abs/2307.16718v1
- Date: Mon, 31 Jul 2023 14:39:10 GMT
- Title: An Efficient Shapley Value Computation for the Naive Bayes Classifier
- Authors: Vincent Lemaire, Fabrice Clérot and Marc Boullé
- Abstract summary: This article proposes an exact analytic expression of Shapley values in the case of the naive Bayes classifier.
Results show that our Shapley proposal for the naive Bayes provides informative results with low algorithmic complexity.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variable selection or importance measurement of input variables to a machine
learning model has become the focus of much research. It is no longer enough to
have a good model, one also must explain its decisions. This is why there are
so many intelligibility algorithms available today. Among them, Shapley value
estimation algorithms are intelligibility methods based on cooperative game
theory. In the case of the naive Bayes classifier, to our knowledge, there
is no "analytical" formulation of Shapley values. This article proposes an
exact analytic expression of Shapley values in the special case of the naive
Bayes classifier. We analytically compare this Shapley proposal to another
frequently used indicator, the Weight of Evidence (WoE) and provide an
empirical comparison of our proposal with (i) the WoE and (ii) KernelShap
results on real world datasets, discussing similar and dissimilar results. The
results show that our Shapley proposal for the naive Bayes classifier provides
informative results with low algorithmic complexity so that it can be used on
very large datasets with extremely low computation time.
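The low complexity claimed above follows from the structure of the model: the naive Bayes log-odds is an additive sum of per-feature terms, and for an additive score the Shapley value of a feature reduces to its own term minus that term's average over a reference sample, with no exponential enumeration of coalitions. The sketch below illustrates this idea on a categorical naive Bayes; it is a minimal illustration on the log-odds scale under that additivity assumption, not the paper's exact expression, and the function names and the Laplace-smoothing choice are ours.

```python
import math
from collections import Counter

def fit_naive_bayes(X, y, alpha=1.0):
    """Fit a categorical naive Bayes for binary labels {0, 1}: per-feature
    conditional log-likelihood ratios with Laplace smoothing alpha."""
    n1 = sum(y)
    n0 = len(y) - n1
    prior_lo = math.log(n1 / n0)  # log P(C1) / P(C0)
    tables = []
    for j in range(len(X[0])):
        vals = set(row[j] for row in X)
        c1 = Counter(row[j] for row, lab in zip(X, y) if lab == 1)
        c0 = Counter(row[j] for row, lab in zip(X, y) if lab == 0)
        k = len(vals)
        # Per-value Weight of Evidence: log P(x_j | C1) / P(x_j | C0)
        lo = {v: math.log((c1[v] + alpha) / (n1 + alpha * k))
                 - math.log((c0[v] + alpha) / (n0 + alpha * k)) for v in vals}
        tables.append(lo)
    return prior_lo, tables

def shapley_log_odds(x, X_ref, prior_lo, tables):
    """Shapley values on the log-odds scale. Because the log-odds is additive
    across features, feature j's value is exactly its WoE term minus that
    term's mean over a reference sample: O(n_features * n_ref) total."""
    phis = []
    for j, lo in enumerate(tables):
        woe = lo[x[j]]                                    # WoE of feature j
        baseline = sum(lo[row[j]] for row in X_ref) / len(X_ref)
        phis.append(woe - baseline)
    return phis
```

On this scale the contrast with WoE is direct: the WoE indicator is the raw term `lo[x[j]]`, while the Shapley value recenters it by its expectation over the reference sample, so the two differ only by a per-feature constant. The paper's analysis of the classifier's output probability (a non-linear function of the log-odds) is richer than this simplification.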
Related papers
- Accelerated Shapley Value Approximation for Data Evaluation [3.707457963532597]
We show that the Shapley value of data points can be approximated more efficiently by leveraging structural properties of machine learning problems.
Our analysis suggests that models trained on small subsets are in fact more important in the context of data valuation.
arXiv Detail & Related papers (2023-11-09T13:15:36Z)
- Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of SimSHAP, which significantly accelerates the computation of accurate Shapley values.
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
- Efficient Shapley Values Estimation by Amortization for Text Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley values accurately with up to 60 times speedup.
arXiv Detail & Related papers (2023-05-31T16:19:13Z)
- PDD-SHAP: Fast Approximations for Shapley Values using Functional Decomposition [2.0559497209595823]
We propose PDD-SHAP, an algorithm that uses an ANOVA-based functional decomposition model to approximate the black-box model being explained.
This allows us to calculate Shapley values orders of magnitude faster than existing methods for large datasets, significantly reducing the amortized cost of computing Shapley values.
arXiv Detail & Related papers (2022-08-26T11:49:54Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm appears to be more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
- The Shapley Value of Classifiers in Ensemble Games [7.23389716633927]
We introduce a new class of transferable utility cooperative games to answer this question.
The players in ensemble games are pre-trained binary classifiers that collaborate in an ensemble to correctly label points from a dataset.
We design Troupe, a scalable algorithm that assigns payoffs to individual models based on their Shapley value in the ensemble game.
arXiv Detail & Related papers (2021-01-06T17:40:23Z)
- A Multilinear Sampling Algorithm to Estimate Shapley Values [4.771833920251869]
We propose a new sampling method based on a multilinear extension technique as applied in game theory.
Our method is applicable to any machine learning model, in particular to both multi-class classification and regression problems.
arXiv Detail & Related papers (2020-10-22T21:47:16Z)
- Towards Efficient Data Valuation Based on the Shapley Value [65.4167993220998]
We study the problem of data valuation by utilizing the Shapley value.
The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value.
We propose a repertoire of efficient algorithms for approximating the Shapley value.
arXiv Detail & Related papers (2019-02-27T00:22:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.