Statistical Inference and Learning for Shapley Additive Explanations (SHAP)
- URL: http://arxiv.org/abs/2602.10532v1
- Date: Wed, 11 Feb 2026 05:01:47 GMT
- Title: Statistical Inference and Learning for Shapley Additive Explanations (SHAP)
- Authors: Justin Whitehouse, Ayush Sawarni, Vasilis Syrgkanis,
- Abstract summary: The SHAP (short for Shapley additive explanation) framework has become an essential tool for attributing importance to variables in predictive tasks.<n>Despite their ubiquity, there do not exist approaches for performing statistical inference on these quantities.<n>We show that, by treating the SHAP curve as a nuisance function that must be estimated from data, one can reliably constructally normal estimates of the $p$th powers of SHAP.
- Score: 20.663970002208846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The SHAP (short for Shapley additive explanation) framework has become an essential tool for attributing importance to variables in predictive tasks. In model-agnostic settings, SHAP uses the concept of Shapley values from cooperative game theory to fairly allocate credit to the features in a vector $X$ based on their contribution to an outcome $Y$. While the explanations offered by SHAP are local by nature, learners often need global measures of feature importance in order to improve model explainability and perform feature selection. The most common approach for converting these local explanations into global ones is to compute either the mean absolute SHAP or mean squared SHAP. However, despite their ubiquity, there do not exist approaches for performing statistical inference on these quantities. In this paper, we take a semi-parametric approach for calibrating confidence in estimates of the $p$th powers of Shapley additive explanations. We show that, by treating the SHAP curve as a nuisance function that must be estimated from data, one can reliably construct asymptotically normal estimates of the $p$th powers of SHAP. When $p \geq 2$, we show a de-biased estimator that combines U-statistics with Neyman orthogonal scores for functionals of nested regressions is asymptotically normal. When $1 \leq p < 2$ (and the hence target parameter is not twice differentiable), we construct de-biased U-statistics for a smoothed alternative. In particular, we show how to carefully tune the temperature parameter of the smoothing function in order to obtain inference for the true, unsmoothed $p$th power. We complement these results by presenting a Neyman orthogonal loss that can be used to learn the SHAP curve via empirical risk minimization and discussing excess risk guarantees for commonly used function classes.
Related papers
- Statistical Inference for Explainable Boosting Machines [5.01181440341076]
Explainable boosting machines (EBMs) are popular "glass-box" models with visualizations of each feature's effect.<n>We provide an alternative using recent advances in statistical inference for gradient boosting, deriving methods for statistical inference as well as end-to-end theoretical guarantees.
arXiv Detail & Related papers (2026-01-26T17:51:09Z) - Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.<n>The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.<n>The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z) - Online non-parametric likelihood-ratio estimation by Pearson-divergence
functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t sim p, x'_t sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z) - Generalized Differentiable RANSAC [95.95627475224231]
$nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z) - On the Identifiability and Estimation of Causal Location-Scale Noise
Models [122.65417012597754]
We study the class of location-scale or heteroscedastic noise models (LSNMs)
We show the causal direction is identifiable up to some pathological cases.
We propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.
arXiv Detail & Related papers (2022-10-13T17:18:59Z) - $p$-Generalized Probit Regression and Scalable Maximum Likelihood
Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z) - Consistent Sufficient Explanations and Minimal Local Rules for
explaining regression and classification models [0.0]
We extend the notion of probabilistic Sufficient Explanations (P-SE)
The crux of P-SE is to compute the conditional probability of maintaining the same prediction.
We deal with non-binary features, without learning the distribution of $X$ nor having the model for making predictions.
arXiv Detail & Related papers (2021-11-08T17:27:52Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Unbiased Estimation Equation under $f$-Separable Bregman Distortion
Measures [0.3553493344868413]
We discuss unbiased estimation equations in a class of objective function using a monotonically increasing function $f$ and Bregman divergence.
The choice of the function $f$ gives desirable properties such as robustness against outliers.
In this study, we clarify the combination of Bregman divergence, statistical model, and function $f$ in which the bias correction term vanishes.
arXiv Detail & Related papers (2020-10-23T10:33:55Z) - Estimation in Tensor Ising Models [5.161531917413708]
We consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes.
In particular, we show the $sqrt N$-consistency of the MPL estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model.
We derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model.
arXiv Detail & Related papers (2020-08-29T00:06:58Z) - Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the universality error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.