shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python
- URL: http://arxiv.org/abs/2504.01842v1
- Date: Wed, 02 Apr 2025 15:47:30 GMT
- Title: shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python
- Authors: Martin Jullum, Lars Henry Berge Olsen, Jon Lachmann, Annabelle Redelmeier,
- Abstract summary: shapr is a versatile tool for generating Shapley value explanations for machine learning and statistical regression models in both R and Python. We introduce the shaprpy Python library, which brings core capabilities of shapr to the Python ecosystem.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces the shapr package, a versatile tool for generating Shapley value explanations for machine learning and statistical regression models in both R and Python. The package emphasizes conditional Shapley value estimates, providing a comprehensive range of approaches for accurately capturing feature dependencies, which is crucial for correct model interpretation and lacking in similar software. In addition to regular tabular data, the shapr R-package includes specialized functionality for explaining time series forecasts. The package offers a minimal set of user functions with sensible defaults for most use cases while providing extensive flexibility for advanced users to fine-tune computations. Additional features include parallelized computations, iterative estimation with convergence detection, and rich visualization tools. shapr also extends its functionality to compute causal and asymmetric Shapley values when causal information is available. In addition, we introduce the shaprpy Python library, which brings core capabilities of shapr to the Python ecosystem. Overall, the package aims to enhance the interpretability of predictive models within a powerful and user-friendly framework.
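To make concrete what a Shapley value explanation is, the following is a minimal, package-independent sketch that computes exact Shapley values for one prediction by enumerating all feature coalitions. It replaces out-of-coalition features with a fixed background vector, i.e. an independence assumption; shapr's contribution is precisely to estimate *conditional* expectations instead. All names here are illustrative, not the shapr or shaprpy API.

```python
from itertools import combinations
from math import comb

def shapley_values(predict, x, background):
    """Exact Shapley values for one instance by subset enumeration.

    `predict` maps a feature vector (list) to a number; `background`
    supplies values for features outside the coalition. Using a fixed
    background assumes feature independence -- shapr instead estimates
    conditional expectations to respect feature dependence.
    """
    d = len(x)
    phi = [0.0] * d
    features = list(range(d))
    for j in features:
        others = [f for f in features if f != j]
        for size in range(d):
            # Classic Shapley weight: |S|! (d - |S| - 1)! / d!
            weight = 1.0 / (d * comb(d - 1, size))
            for S in combinations(others, size):
                with_j = [x[i] if i in S or i == j else background[i] for i in features]
                without_j = [x[i] if i in S else background[i] for i in features]
                phi[j] += weight * (predict(with_j) - predict(without_j))
    return phi

# A linear model makes the result easy to verify by hand:
# under independence, phi_j = w_j * (x_j - background_j).
w = [2.0, -1.0, 0.5]
model = lambda v: sum(wi * vi for wi, vi in zip(w, v))
phi = shapley_values(model, x=[1.0, 2.0, 3.0], background=[0.0, 0.0, 0.0])
```

The exponential cost of this enumeration is why practical tools like shapr combine coalition sampling with careful estimation of the conditional expectations.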
Related papers
- Functional relevance based on the continuous Shapley value [0.0]
This work focuses on interpretability of predictive models based on functional data. We propose an interpretability method based on the Shapley value for continuous games. The method is illustrated through a set of experiments with simulated and real data sets.
arXiv Detail & Related papers (2024-11-27T18:20:00Z) - RobPy: a Python Package for Robust Statistical Methods [1.2233362977312945]
RobPy offers a wide range of robust methods in Python, built upon established libraries including NumPy, SciPy, and scikit-learn.
This paper presents the structure of the RobPy package, demonstrates its functionality through examples, and compares its features to existing implementations in other statistical software.
arXiv Detail & Related papers (2024-11-04T10:27:30Z) - Improving the Sampling Strategy in KernelSHAP [0.8057006406834466]
The KernelSHAP framework enables us to approximate the Shapley values using a sampled subset of weighted conditional expectations.
We propose three main novel contributions: a stabilizing technique that reduces the variance of the weights in the current state-of-the-art strategy, a novel weighting scheme that corrects the Shapley kernel weights based on the sampled subsets, and a straightforward strategy that includes the important subsets and integrates them with the corrected Shapley kernel weights.
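As a minimal sketch of the sampling step these contributions refine: KernelSHAP draws coalitions with probability proportional to the Shapley kernel weight of their size. The function names below are illustrative, not the paper's implementation, and the sampler shown is the plain baseline scheme, not the corrected one the paper proposes.

```python
import random
from math import comb

def shapley_kernel_weight(d, s):
    """Shapley kernel weight for one coalition of size s out of d features.

    Undefined for the empty and full coalitions (s = 0 or s = d), which
    KernelSHAP handles separately as exact constraints.
    """
    return (d - 1) / (comb(d, s) * s * (d - s))

def sample_coalitions(d, n_samples, seed=0):
    """Draw coalitions with probability proportional to the total kernel
    weight of their size -- a simplified baseline sampler."""
    rng = random.Random(seed)
    sizes = list(range(1, d))
    # Total weight per size = per-coalition weight * number of coalitions
    # of that size; this simplifies to (d - 1) / (s * (d - s)).
    size_weights = [shapley_kernel_weight(d, s) * comb(d, s) for s in sizes]
    coalitions = []
    for _ in range(n_samples):
        s = rng.choices(sizes, weights=size_weights)[0]
        coalitions.append(tuple(sorted(rng.sample(range(d), s))))
    return coalitions

cs = sample_coalitions(d=5, n_samples=10)
```

Because small and large coalition sizes carry most of the kernel weight, naive sampling concentrates there, which is what motivates the variance-reduction and weight-correction techniques described above.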
arXiv Detail & Related papers (2024-10-07T10:02:31Z) - Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves a dimension-free $n^{-2/3}$ L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions [38.87540833773233]
We propose a notion of robustness on the sign of the instance score.
We introduce an efficient fine-tuning-free approximation of the Shapley value for instance attribution.
arXiv Detail & Related papers (2024-06-07T03:29:57Z) - Interpreting Deep Neural Networks with the Package innsight [0.951828574518325]
innsight is the first R package implementing feature attribution methods for neural networks in general.
It operates independently of the deep learning library, allowing the interpretation of models from any R package.
Internally, innsight benefits from the torch package's fast and efficient array calculations.
arXiv Detail & Related papers (2023-06-19T10:12:32Z) - Efficient Shapley Values Estimation by Amortization for Text Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately with up to 60 times speedup.
arXiv Detail & Related papers (2023-05-31T16:19:13Z) - Learning Summary Statistics for Bayesian Inference with Autoencoders [58.720142291102135]
We use the inner dimension of deep neural network based Autoencoders as summary statistics.
To create an incentive for the encoder to encode all the parameter-related information but not the noise, we give the decoder access to explicit or implicit information that has been used to generate the training data.
arXiv Detail & Related papers (2022-01-28T12:00:31Z) - Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation.
The scikit-dimension package provides a uniform implementation of most of the known ID estimators based on the scikit-learn application programming interface.
We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z) - Particle-Gibbs Sampling For Bayesian Feature Allocation Models [77.57285768500225]
Most widely used MCMC strategies rely on an element-wise Gibbs update of the feature allocation matrix.
We have developed a Gibbs sampler that can update an entire row of the feature allocation matrix in a single move.
However, this sampler is impractical for models with a large number of features, as its computational complexity scales exponentially in the number of features.
We therefore develop a Particle Gibbs sampler that targets the same distribution as the row-wise Gibbs updates but whose computational complexity grows only linearly in the number of features.
arXiv Detail & Related papers (2020-01-25T22:11:51Z) - Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.