Antithetic Sampling for Top-k Shapley Identification
- URL: http://arxiv.org/abs/2504.02019v1
- Date: Wed, 02 Apr 2025 15:38:32 GMT
- Title: Antithetic Sampling for Top-k Shapley Identification
- Authors: Patrick Kolpaczki, Tim Nielen, Eyke Hüllermeier
- Abstract summary: The Shapley value's popularity in and outside of explainable AI stems from its axiomatic uniqueness. Most works investigate the uniform approximation of all features' Shapley values, needlessly consuming samples for insignificant features. In contrast, identifying the $k$ most important features can already be sufficiently insightful and yields the potential to leverage algorithmic opportunities connected to the field of multi-armed bandits.
- Score: 19.221081896134567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Additive feature explanations rely primarily on game-theoretic notions such as the Shapley value by viewing features as cooperating players. The Shapley value's popularity in and outside of explainable AI stems from its axiomatic uniqueness. However, its computational complexity severely limits practicability. Most works investigate the uniform approximation of all features' Shapley values, needlessly consuming samples for insignificant features. In contrast, identifying the $k$ most important features can already be sufficiently insightful and yields the potential to leverage algorithmic opportunities connected to the field of multi-armed bandits. We propose Comparable Marginal Contributions Sampling (CMCS), a method for the top-$k$ identification problem utilizing a new sampling scheme taking advantage of correlated observations. We conduct experiments to showcase the efficacy of our method compared to competitive baselines. Our empirical findings reveal that estimation quality for the approximate-all problem does not necessarily transfer to top-$k$ identification and vice versa.
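The abstract leaves the details of CMCS's correlated sampling scheme to the paper itself; the sketch below (a minimal illustration, not the authors' method) shows one standard antithetic design for Monte Carlo Shapley estimation: each feature's Shapley value is the average marginal contribution $v(S \cup \{i\}) - v(S)$ over random permutations, and pairing every sampled permutation with its reverse makes the two observed contributions per feature negatively correlated before the $k$ largest estimates are selected. The function name `antithetic_shapley_topk`, the callable `value_fn`, and the toy weighted game are assumptions introduced for illustration.

```python
import numpy as np

def antithetic_shapley_topk(value_fn, n_features, n_permutations, k, seed=None):
    """Monte Carlo Shapley estimation from antithetic permutation pairs.

    value_fn maps a boolean coalition mask of length n_features to the payoff v(S).
    Returns the indices of the k features with the largest estimated Shapley values
    together with the full estimate vector. Illustrative sketch, not the paper's CMCS.
    """
    rng = np.random.default_rng(seed)
    estimates = np.zeros(n_features)

    for _ in range(n_permutations):
        perm = rng.permutation(n_features)
        for order in (perm, perm[::-1]):         # antithetic pair: permutation and its reverse
            mask = np.zeros(n_features, dtype=bool)
            v_prev = value_fn(mask)              # v(empty coalition)
            for i in order:
                mask[i] = True
                v_curr = value_fn(mask)
                estimates[i] += v_curr - v_prev  # marginal contribution of feature i
                v_prev = v_curr

    estimates /= 2 * n_permutations              # two permutations evaluated per iteration
    top_k = np.argsort(estimates)[::-1][:k]
    return top_k, estimates

# Toy additive game for a quick check: v(S) is the sum of fixed feature weights.
weights = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
top2, est = antithetic_shapley_topk(lambda m: float(weights[m].sum()),
                                    n_features=5, n_permutations=200, k=2)
```

Pairing a permutation with its reverse means the two predecessor sets observed for feature $i$ are complements of each other within the remaining features, which is exactly the kind of negative correlation an antithetic scheme exploits to lower variance per evaluated coalition.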
Related papers
- Shapley Value Approximation Based on k-Additive Games [21.023521613326654]
The Shapley value is the prevalent solution for fair division problems in which a payout is to be divided among multiple agents.
Despite its popularity and axiomatic justification, the Shapley value suffers from a computational complexity that scales exponentially with the number of entities involved.
We propose SVA$k_{\text{ADD}}$, a novel approximation method that fits a $k$-additive surrogate game.
arXiv Detail & Related papers (2025-02-07T08:52:57Z) - Statistical Significance of Feature Importance Rankings [3.8642937395065124]
We devise techniques that ensure, with high-probability guarantees, that the features identified as most important are correct.
These assess the set of $K$ top-ranked features, as well as the order of its elements.
We then introduce two efficient sampling algorithms that identify the $K$ most important features, perhaps in order, with probability exceeding $1-\alpha$.
arXiv Detail & Related papers (2024-01-28T23:14:51Z) - Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
arXiv Detail & Related papers (2023-11-02T06:09:24Z) - DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation [23.646508094051768]
We consider the dataset valuation problem, that is, the problem of quantifying the incremental gain of adding an individual dataset to a collection of others.
The Shapley value is a natural tool to perform dataset valuation due to its formal axiomatic justification.
We propose a novel approximation, referred to as discrete uniform Shapley, which is expressed as an expectation under a discrete uniform distribution.
arXiv Detail & Related papers (2023-06-03T10:22:50Z) - Approximating the Shapley Value without Marginal Contributions [11.539320505465149]
The Shapley value, which is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, has recently been used intensively in explainable artificial intelligence.
We propose two parameter-free and domain-independent approximation algorithms based on a representation of the Shapley value detached from the notion of marginal contribution.
arXiv Detail & Related papers (2023-02-01T20:14:22Z) - Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z) - SHAP-XRT: The Shapley Value Meets Conditional Independence Testing [21.794110108580746]
We show that Shapley-based explanation methods and conditional independence testing are closely related.
We introduce the SHAPley EXplanation Randomization Test (SHAP-XRT), a testing procedure inspired by the Conditional Randomization Test (CRT) for a specific notion of local (i.e., on a sample) conditional independence.
We show that the Shapley value itself provides an upper bound to the expected $p$-value of a global (i.e., overall) null hypothesis.
arXiv Detail & Related papers (2022-07-14T16:28:54Z) - Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z) - Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure.
Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z) - The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)