Provably Stable Feature Rankings with SHAP and LIME
- URL: http://arxiv.org/abs/2401.15800v2
- Date: Mon, 3 Jun 2024 00:49:43 GMT
- Title: Provably Stable Feature Rankings with SHAP and LIME
- Authors: Jeremy Goldwasser, Giles Hooker
- Abstract summary: We devise attribution methods that ensure the most important features are ranked correctly with high probability.
We introduce efficient sampling algorithms for SHAP and LIME that guarantee the $K$ highest-ranked features have the proper ordering.
- Score: 3.8642937395065124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature attributions are ubiquitous tools for understanding the predictions of machine learning models. However, the calculation of popular methods for scoring input variables such as SHAP and LIME suffers from high instability due to random sampling. Leveraging ideas from multiple hypothesis testing, we devise attribution methods that ensure the most important features are ranked correctly with high probability. Given SHAP estimates from KernelSHAP or Shapley Sampling, we demonstrate how to retrospectively verify the number of stable rankings. Further, we introduce efficient sampling algorithms for SHAP and LIME that guarantee the $K$ highest-ranked features have the proper ordering. Finally, we show how to adapt these local feature attribution methods for the global importance setting.
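The retrospective check described in the abstract can be illustrated with a small sketch. This is not the authors' procedure: it assumes Shapley Sampling over random permutations, a normal approximation on paired differences, and a Bonferroni correction across adjacent comparisons. The names `shapley_samples`, `count_stable_ranks`, and `toy_model` are hypothetical and introduced only for illustration.

```python
import math
import random
from statistics import NormalDist, mean, variance


def shapley_samples(f, x, baseline, n_perms, rng):
    """Shapley Sampling: one marginal contribution per feature per permutation."""
    d = len(x)
    samples = [[] for _ in range(d)]
    for _ in range(n_perms):
        order = list(range(d))
        rng.shuffle(order)
        z = list(baseline)          # start from the baseline input
        prev = f(z)
        for j in order:             # switch features on in random order
            z[j] = x[j]
            cur = f(z)
            samples[j].append(cur - prev)
            prev = cur
    return samples


def count_stable_ranks(samples, alpha=0.05):
    """Rank features by |mean estimate| and count how many leading adjacent
    comparisons are significant (assumption: z-test on paired differences,
    two-sided Bonferroni correction over the d - 1 adjacent pairs)."""
    d = len(samples)
    n = len(samples[0])
    means = [mean(s) for s in samples]
    order = sorted(range(d), key=lambda j: -abs(means[j]))
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * (d - 1)))
    stable = 0
    for a, b in zip(order, order[1:]):
        sa = 1.0 if means[a] >= 0 else -1.0
        sb = 1.0 if means[b] >= 0 else -1.0
        # Paired differences: both features were sampled on the same permutations.
        diffs = [sa * u - sb * v for u, v in zip(samples[a], samples[b])]
        gap = mean(diffs)           # equals |mean_a| - |mean_b|
        se = math.sqrt(variance(diffs) / n)
        if not gap > z_crit * se:   # stop at the first comparison that fails
            break
        stable += 1
    return order, stable


def toy_model(z):
    # Hypothetical model with interactions, so the sampled contributions vary.
    return 4 * z[0] + 2 * z[1] * z[2] + 0.3 * z[0] * z[3]


rng = random.Random(0)
samples = shapley_samples(toy_model, [1, 1, 1, 1], [0, 0, 0, 0], 2000, rng)
order, stable = count_stable_ranks(samples)
print("estimated ranking:", order)
print("leading ranks verified:", stable)
```

In this toy model features 1 and 2 have identical true Shapley values, so the check should usually stop after verifying only the first rank, however many permutations are drawn.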
Related papers
- The Distributional Uncertainty of the SHAP score in Explainable Machine Learning [2.8136734847819778]
We propose a principled framework for reasoning on SHAP scores under unknown entity population distributions.
We study the basic problems of finding maxima and minima of this function, which allows us to determine tight ranges for the SHAP scores of all features.
arXiv Detail & Related papers (2024-01-23T13:04:02Z)
- Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of SimSHAP, which significantly accelerates the computation of accurate Shapley values.
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
- Boosting Fair Classifier Generalization through Adaptive Priority Reweighing [59.801444556074394]
A performance-promising fair algorithm with better generalizability is needed.
This paper proposes a novel adaptive reweighing method to eliminate the impact of the distribution shifts between training and test data on model generalizability.
arXiv Detail & Related papers (2023-09-15T13:04:55Z)
- SHAP@k: Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features [16.99004256148679]
We introduce the Top-k Identification Problem (TkIP), where the objective is to identify the k features with the highest SHAP values.
The goal of our work is to improve the sample efficiency of existing methods in the context of solving TkIP.
arXiv Detail & Related papers (2023-07-10T18:42:45Z)
- Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- An Imprecise SHAP as a Tool for Explaining the Class Probability Distributions under Limited Training Data [5.8010446129208155]
An imprecise SHAP is proposed for cases when the class probability distributions are imprecise and represented by sets of distributions.
The first idea behind the imprecise SHAP is a new approach for computing the marginal contribution of a feature.
The second idea is an attempt to consider a general approach to calculating and reducing interval-valued Shapley values.
arXiv Detail & Related papers (2021-06-16T20:30:26Z)
- A Multilinear Sampling Algorithm to Estimate Shapley Values [4.771833920251869]
We propose a new sampling method based on a multilinear extension technique as applied in game theory.
Our method is applicable to any machine learning model, in particular for either multi-class classifications or regression problems.
arXiv Detail & Related papers (2020-10-22T21:47:16Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value iteration-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing fairness and utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.