An Efficient Shapley Value Computation for the Naive Bayes Classifier
- URL: http://arxiv.org/abs/2307.16718v1
- Date: Mon, 31 Jul 2023 14:39:10 GMT
- Title: An Efficient Shapley Value Computation for the Naive Bayes Classifier
- Authors: Vincent Lemaire, Fabrice Clérot and Marc Boullé
- Abstract summary: This article proposes an exact analytic expression of Shapley values in the case of the naive Bayes classifier.
Results show that our Shapley proposal for the naive Bayes provides informative results with low algorithmic complexity.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variable selection or importance measurement of input variables to a machine
learning model has become the focus of much research. It is no longer enough to
have a good model, one also must explain its decisions. This is why there are
so many intelligibility algorithms available today. Among them, Shapley value
estimation algorithms are intelligibility methods based on cooperative game
theory. In the case of the naive Bayes classifier, to our knowledge, there
is no "analytical" formulation of Shapley values. This article proposes an
exact analytic expression of Shapley values in the special case of the naive
Bayes classifier. We analytically compare this Shapley proposal to another
frequently used indicator, the Weight of Evidence (WoE) and provide an
empirical comparison of our proposal with (i) the WoE and (ii) KernelShap
results on real world datasets, discussing similar and dissimilar results. The
results show that our Shapley proposal for the naive Bayes classifier provides
informative results with low algorithmic complexity so that it can be used on
very large datasets with extremely low computation time.
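The low complexity claimed above follows from the structure of the model: the naive Bayes log-odds is an additive sum of per-feature terms, and for an additive score the Shapley value of a feature reduces to its own term minus that term's average over a reference sample, with no exponential enumeration of coalitions. The sketch below illustrates this idea on a categorical naive Bayes; it is a minimal illustration on the log-odds scale under that additivity assumption, not the paper's exact expression, and the function names and the Laplace-smoothing choice are ours.

```python
import math
from collections import Counter

def fit_naive_bayes(X, y, alpha=1.0):
    """Fit a categorical naive Bayes for binary labels {0, 1}: per-feature
    conditional log-likelihood ratios with Laplace smoothing alpha."""
    n1 = sum(y)
    n0 = len(y) - n1
    prior_lo = math.log(n1 / n0)  # log P(C1) / P(C0)
    tables = []
    for j in range(len(X[0])):
        vals = set(row[j] for row in X)
        c1 = Counter(row[j] for row, lab in zip(X, y) if lab == 1)
        c0 = Counter(row[j] for row, lab in zip(X, y) if lab == 0)
        k = len(vals)
        # Per-value Weight of Evidence: log P(x_j | C1) / P(x_j | C0)
        lo = {v: math.log((c1[v] + alpha) / (n1 + alpha * k))
                 - math.log((c0[v] + alpha) / (n0 + alpha * k)) for v in vals}
        tables.append(lo)
    return prior_lo, tables

def shapley_log_odds(x, X_ref, prior_lo, tables):
    """Shapley values on the log-odds scale. Because the log-odds is additive
    across features, feature j's value is exactly its WoE term minus that
    term's mean over a reference sample: O(n_features * n_ref) total."""
    phis = []
    for j, lo in enumerate(tables):
        woe = lo[x[j]]                                    # WoE of feature j
        baseline = sum(lo[row[j]] for row in X_ref) / len(X_ref)
        phis.append(woe - baseline)
    return phis
```

On this scale the contrast with WoE is direct: the WoE indicator is the raw term `lo[x[j]]`, while the Shapley value recenters it by its expectation over the reference sample, so the two differ only by a per-feature constant. The paper's analysis of the classifier's output probability (a non-linear function of the log-odds) is richer than this simplification.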
Related papers
- Accelerated Shapley Value Approximation for Data Evaluation [3.707457963532597]
We show that the Shapley value of data points can be approximated more efficiently by leveraging structural properties of machine learning problems.
Our analysis suggests that models trained on small subsets are in fact more important in the context of data valuation.
arXiv Detail & Related papers (2023-11-09T13:15:36Z)
- Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of SimSHAP, which significantly accelerates the computation of accurate Shapley values.
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
- Efficient Shapley Values Estimation by Amortization for Text Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley values accurately with up to 60 times speedup.
arXiv Detail & Related papers (2023-05-31T16:19:13Z)
- PDD-SHAP: Fast Approximations for Shapley Values using Functional Decomposition [2.0559497209595823]
We propose PDD-SHAP, an algorithm that uses an ANOVA-based functional decomposition model to approximate the black-box model being explained.
This allows us to calculate Shapley values orders of magnitude faster than existing methods for large datasets, significantly reducing the amortized cost of computing Shapley values.
arXiv Detail & Related papers (2022-08-26T11:49:54Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm appears to be more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
- The Shapley Value of Classifiers in Ensemble Games [7.23389716633927]
We introduce a new class of transferable utility cooperative games to answer this question.
The players in ensemble games are pre-trained binary classifiers that collaborate in an ensemble to correctly label points from a dataset.
We design Troupe, a scalable algorithm that assigns payoffs to individual models based on their Shapley value in the ensemble game.
arXiv Detail & Related papers (2021-01-06T17:40:23Z)
- A Multilinear Sampling Algorithm to Estimate Shapley Values [4.771833920251869]
We propose a new sampling method based on a multilinear extension technique as applied in game theory.
Our method is applicable to any machine learning model, in particular to both multi-class classification and regression problems.
arXiv Detail & Related papers (2020-10-22T21:47:16Z)
- Towards Efficient Data Valuation Based on the Shapley Value [65.4167993220998]
We study the problem of data valuation by utilizing the Shapley value.
The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value.
We propose a repertoire of efficient algorithms for approximating the Shapley value.
arXiv Detail & Related papers (2019-02-27T00:22:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.