shapiq: Shapley Interactions for Machine Learning
- URL: http://arxiv.org/abs/2410.01649v1
- Date: Wed, 2 Oct 2024 15:16:53 GMT
- Title: shapiq: Shapley Interactions for Machine Learning
- Authors: Maximilian Muschalik, Hubert Baniecki, Fabian Fumagalli, Patrick Kolpaczki, Barbara Hammer, Eyke Hüllermeier
- Abstract summary: We introduce shapiq, an open-source Python package that unifies state-of-the-art algorithms to efficiently compute Shapley Values (SVs) and Shapley Interactions (SIs).
For practitioners, shapiq is able to explain and visualize any-order feature interactions in predictions of models, including vision transformers, language models, as well as XGBoost and LightGBM with TreeSHAP-IQ.
- Score: 21.939393765684827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Originally rooted in game theory, the Shapley Value (SV) has recently become an important tool in machine learning research. Perhaps most notably, it is used for feature attribution and data valuation in explainable artificial intelligence. Shapley Interactions (SIs) naturally extend the SV and address its limitations by assigning joint contributions to groups of entities, which enhance understanding of black box machine learning models. Due to the exponential complexity of computing SVs and SIs, various methods have been proposed that exploit structural assumptions or yield probabilistic estimates given limited resources. In this work, we introduce shapiq, an open-source Python package that unifies state-of-the-art algorithms to efficiently compute SVs and any-order SIs in an application-agnostic framework. Moreover, it includes a benchmarking suite containing 11 machine learning applications of SIs with pre-computed games and ground-truth values to systematically assess computational performance across domains. For practitioners, shapiq is able to explain and visualize any-order feature interactions in predictions of models, including vision transformers, language models, as well as XGBoost and LightGBM with TreeSHAP-IQ. With shapiq, we extend shap beyond feature attributions and consolidate the application of SVs and SIs in machine learning that facilitates future research. The source code and documentation are available at https://github.com/mmschlk/shapiq.
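To make the "exponential complexity" mentioned in the abstract concrete, here is a minimal sketch in plain Python that computes exact Shapley Values and one common pairwise interaction measure (the Shapley interaction index) by enumerating every coalition. It does not use shapiq or reproduce its API; the function names and the three-player `toy_game` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def powerset(items):
    """Yield every subset of `items` as a tuple (2^len(items) subsets in total)."""
    items = list(items)
    for r in range(len(items) + 1):
        yield from combinations(items, r)


def shapley_values(n, value):
    """Exact Shapley Values for players 0..n-1 of the game `value`."""
    result = []
    for i in range(n):
        others = [p for p in range(n) if p != i]
        phi = 0.0
        for subset in powerset(others):
            s = len(subset)
            # Standard Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(s) * factorial(n - s - 1) / factorial(n)
            coalition = frozenset(subset)
            phi += weight * (value(coalition | {i}) - value(coalition))
        result.append(phi)
    return result


def shapley_interaction(n, value, i, j):
    """Pairwise Shapley interaction index of players i and j."""
    others = [p for p in range(n) if p not in (i, j)]
    total = 0.0
    for subset in powerset(others):
        s = len(subset)
        # Interaction-index weight: |S|! (n - |S| - 2)! / (n - 1)!
        weight = factorial(s) * factorial(n - s - 2) / factorial(n - 1)
        c = frozenset(subset)
        # Discrete second-order difference: joint effect minus individual effects
        delta = value(c | {i, j}) - value(c | {i}) - value(c | {j}) + value(c)
        total += weight * delta
    return total


def toy_game(coalition):
    """Hypothetical game: additive payoffs 1, 2, 3 plus a synergy of 4
    whenever players 0 and 1 are both present."""
    base = sum({0: 1.0, 1: 2.0, 2: 3.0}[p] for p in coalition)
    return base + (4.0 if {0, 1} <= coalition else 0.0)


svs = shapley_values(3, toy_game)
pair_01 = shapley_interaction(3, toy_game, 0, 1)
pair_02 = shapley_interaction(3, toy_game, 0, 2)
```

In this toy game the Shapley Values split the synergy evenly between players 0 and 1, so the individual attributions alone do not reveal that the extra payoff comes from the pair; the interaction index attributes it to the pair (0, 1) and assigns zero to the pair (0, 2). The enumeration visits on the order of 2^n coalitions per player, which is why approximation and structure-exploiting methods are needed for realistic feature counts.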
Related papers
- Exact Computation of Any-Order Shapley Interactions for Graph Neural Networks [53.10674067060148]
Shapley Interactions (SIs) quantify node contributions and interactions among multiple nodes.
By exploiting the GNN architecture, we show that the structure of interactions in node embeddings is preserved for graph prediction.
We introduce GraphSHAP-IQ, an efficient approach to compute any-order SIs exactly.
arXiv Detail & Related papers (2025-01-28T13:37:44Z)
- Medical artificial intelligence toolbox (MAIT): an explainable machine learning framework for binary classification, survival modelling, and regression analyses [0.0]
Medical Artificial Intelligence Toolbox (MAIT) is an explainable, open-source Python pipeline for developing and evaluating binary classification, regression, and survival models.
MAIT addresses key challenges (e.g., high dimensionality, class imbalance, mixed variable types, and missingness) while promoting transparency in reporting.
We provide detailed tutorials on GitHub, using four open-access data sets, to demonstrate how MAIT can be used to improve implementation and interpretation of ML models in medical research.
arXiv Detail & Related papers (2025-01-08T14:51:36Z)
- NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals [58.83169560132308]
We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of very large neural networks.
NNsight is an open-source system that extends PyTorch to introduce deferred remote execution.
NDIF is a scalable inference service that executes NNsight requests, allowing users to share GPU resources and pretrained models.
arXiv Detail & Related papers (2024-07-18T17:59:01Z)
- Coupling Machine Learning with Ontology for Robotics Applications [0.0]
The lack of availability of prior knowledge in dynamic scenarios is without doubt a major barrier for scalable machine intelligence.
My view of the interaction between the two tiers of intelligence is based on the idea that when knowledge is not readily available at the knowledge base tier, more knowledge can be extracted from the other tier.
arXiv Detail & Related papers (2024-06-08T23:38:03Z)
- Zero-knowledge Proof Meets Machine Learning in Verifiability: A Survey [19.70499936572449]
High-quality models rely not only on efficient optimization algorithms but also on the training and learning processes built upon vast amounts of data and computational power.
Due to various challenges such as limited computational resources and data privacy concerns, users in need of models often cannot train machine learning models locally.
This paper presents a comprehensive survey of zero-knowledge proof-based verifiable machine learning (ZKP-VML) technology.
arXiv Detail & Related papers (2023-10-23T12:15:23Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
- A Theory of Usable Information Under Computational Constraints [103.5901638681034]
We propose a new framework for reasoning about information in complex systems.
Our foundation is based on a variational extension of Shannon's information theory.
We show that by incorporating computational constraints, $\mathcal{V}$-information can be reliably estimated from data.
arXiv Detail & Related papers (2020-02-25T06:09:30Z)
- PHOTONAI -- A Python API for Rapid Machine Learning Model Development [2.414341608751139]
PHOTONAI is a high-level Python API designed to simplify and accelerate machine learning model development.
It functions as a unifying framework allowing the user to easily access and combine algorithms from different toolboxes into custom algorithm sequences.
arXiv Detail & Related papers (2020-02-13T10:33:05Z)