shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python
- URL: http://arxiv.org/abs/2504.01842v1
- Date: Wed, 02 Apr 2025 15:47:30 GMT
- Title: shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python
- Authors: Martin Jullum, Lars Henry Berge Olsen, Jon Lachmann, Annabelle Redelmeier,
- Abstract summary: shapr is a versatile tool for generating Shapley value explanations for machine learning and statistical regression models in both R and Python. We introduce the shaprpy Python library, which brings core capabilities of shapr to the Python ecosystem.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces the shapr package, a versatile tool for generating Shapley value explanations for machine learning and statistical regression models in both R and Python. The package emphasizes conditional Shapley value estimates, providing a comprehensive range of approaches for accurately capturing feature dependencies, which is crucial for correct model interpretation and lacking in similar software. In addition to regular tabular data, the shapr R-package includes specialized functionality for explaining time series forecasts. The package offers a minimal set of user functions with sensible defaults for most use cases while providing extensive flexibility for advanced users to fine-tune computations. Additional features include parallelized computations, iterative estimation with convergence detection, and rich visualization tools. shapr also extends its functionality to compute causal and asymmetric Shapley values when causal information is available. In addition, we introduce the shaprpy Python library, which brings core capabilities of shapr to the Python ecosystem. Overall, the package aims to enhance the interpretability of predictive models within a powerful and user-friendly framework.
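To make concrete what a Shapley value explanation is, the following is a minimal, package-independent sketch that computes exact Shapley values for one prediction by enumerating all feature coalitions. It replaces out-of-coalition features with a fixed background vector, i.e. an independence assumption; shapr's contribution is precisely to estimate *conditional* expectations instead. All names here are illustrative, not the shapr or shaprpy API.

```python
from itertools import combinations
from math import comb

def shapley_values(predict, x, background):
    """Exact Shapley values for one instance by subset enumeration.

    `predict` maps a feature vector (list) to a number; `background`
    supplies values for features outside the coalition. Using a fixed
    background assumes feature independence -- shapr instead estimates
    conditional expectations to respect feature dependence.
    """
    d = len(x)
    phi = [0.0] * d
    features = list(range(d))
    for j in features:
        others = [f for f in features if f != j]
        for size in range(d):
            # Classic Shapley weight: |S|! (d - |S| - 1)! / d!
            weight = 1.0 / (d * comb(d - 1, size))
            for S in combinations(others, size):
                with_j = [x[i] if i in S or i == j else background[i] for i in features]
                without_j = [x[i] if i in S else background[i] for i in features]
                phi[j] += weight * (predict(with_j) - predict(without_j))
    return phi

# A linear model makes the result easy to verify by hand:
# under independence, phi_j = w_j * (x_j - background_j).
w = [2.0, -1.0, 0.5]
model = lambda v: sum(wi * vi for wi, vi in zip(w, v))
phi = shapley_values(model, x=[1.0, 2.0, 3.0], background=[0.0, 0.0, 0.0])
```

The exponential cost of this enumeration is why practical tools like shapr combine coalition sampling with careful estimation of the conditional expectations.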
Related papers
- Functional relevance based on the continuous Shapley value [0.0]
This work focuses on interpretability of predictive models based on functional data. We propose an interpretability method based on the Shapley value for continuous games. The method is illustrated through a set of experiments with simulated and real data sets.
arXiv Detail & Related papers (2024-11-27T18:20:00Z) - RobPy: a Python Package for Robust Statistical Methods [1.2233362977312945]
RobPy offers a wide range of robust methods in Python, built upon established libraries including NumPy, SciPy, and scikit-learn.
This paper presents the structure of the RobPy package, demonstrates its functionality through examples, and compares its features to existing implementations in other statistical software.
arXiv Detail & Related papers (2024-11-04T10:27:30Z) - Improving the Sampling Strategy in KernelSHAP [0.8057006406834466]
The KernelSHAP framework enables us to approximate the Shapley values using a sampled subset of weighted conditional expectations.
We propose three main novel contributions: a stabilizing technique that reduces the variance of the weights in the current state-of-the-art strategy, a novel weighting scheme that corrects the Shapley kernel weights based on the sampled subsets, and a straightforward strategy that includes the important subsets and integrates them with the corrected Shapley kernel weights.
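As a minimal sketch of the sampling step these contributions refine: KernelSHAP draws coalitions with probability proportional to the Shapley kernel weight of their size. The function names below are illustrative, not the paper's implementation, and the sampler shown is the plain baseline scheme, not the corrected one the paper proposes.

```python
import random
from math import comb

def shapley_kernel_weight(d, s):
    """Shapley kernel weight for one coalition of size s out of d features.

    Undefined for the empty and full coalitions (s = 0 or s = d), which
    KernelSHAP handles separately as exact constraints.
    """
    return (d - 1) / (comb(d, s) * s * (d - s))

def sample_coalitions(d, n_samples, seed=0):
    """Draw coalitions with probability proportional to the total kernel
    weight of their size -- a simplified baseline sampler."""
    rng = random.Random(seed)
    sizes = list(range(1, d))
    # Total weight per size = per-coalition weight * number of coalitions
    # of that size; this simplifies to (d - 1) / (s * (d - s)).
    size_weights = [shapley_kernel_weight(d, s) * comb(d, s) for s in sizes]
    coalitions = []
    for _ in range(n_samples):
        s = rng.choices(sizes, weights=size_weights)[0]
        coalitions.append(tuple(sorted(rng.sample(range(d), s))))
    return coalitions

cs = sample_coalitions(d=5, n_samples=10)
```

Because small and large coalition sizes carry most of the kernel weight, naive sampling concentrates there, which is what motivates the variance-reduction and weight-correction techniques described above.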
arXiv Detail & Related papers (2024-10-07T10:02:31Z) - Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves a dimension-free $n^{-2/3}$ L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions [38.87540833773233]
We propose a notion of robustness on the sign of the instance score.
We introduce an efficient fine-tuning-free approximation of the Shapley value for instance attribution.
arXiv Detail & Related papers (2024-06-07T03:29:57Z) - Interpreting Deep Neural Networks with the Package innsight [0.951828574518325]
innsight is the first R package implementing feature attribution methods for neural networks in general.
It operates independently of the deep learning library, allowing the interpretation of models from any R package.
Internally, innsight benefits from the torch package's fast and efficient array calculations.
arXiv Detail & Related papers (2023-06-19T10:12:32Z) - Efficient Shapley Values Estimation by Amortization for Text Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately with up to 60 times speedup.
arXiv Detail & Related papers (2023-05-31T16:19:13Z) - Learning Summary Statistics for Bayesian Inference with Autoencoders [58.720142291102135]
We use the inner dimension of deep neural network based Autoencoders as summary statistics.
To create an incentive for the encoder to encode all the parameter-related information but not the noise, we give the decoder access to explicit or implicit information that has been used to generate the training data.
arXiv Detail & Related papers (2022-01-28T12:00:31Z) - Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation.
The scikit-dimension package provides a uniform implementation of most of the known ID estimators based on the scikit-learn application programming interface.
We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z) - Particle-Gibbs Sampling For Bayesian Feature Allocation Models [77.57285768500225]
Most widely used MCMC strategies rely on an element-wise Gibbs update of the feature allocation matrix.
We have developed a Gibbs sampler that can update an entire row of the feature allocation matrix in a single move.
However, this sampler is impractical for models with a large number of features, as its computational complexity scales exponentially in the number of features.
We therefore develop a Particle Gibbs sampler that targets the same distribution as the row-wise Gibbs updates but whose computational complexity grows only linearly in the number of features.
arXiv Detail & Related papers (2020-01-25T22:11:51Z) - Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.