Related papers: llmSHAP: A Principled Approach to LLM Explainability

llmSHAP: A Principled Approach to LLM Explainability

URL: http://arxiv.org/abs/2511.01311v1
Date: Mon, 03 Nov 2025 07:54:47 GMT
Title: llmSHAP: A Principled Approach to LLM Explainability
Authors: Filip Naudot, Tobias Sundqvist, Timotheus Kampik,
Abstract summary: Feature attribution methods make machine learning-based inference explainable by determining how much one or several features have contributed to a model's output.<n>A particularly popular attribution method is based on the Shapley value from cooperative game theory, a measure that guarantees the satisfaction of several desirable principles.<n>We apply the Shapley value to feature attribution in large language model (LLM)-based decision support systems, where inference is, by design, (non-deterministic)
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Feature attribution methods help make machine learning-based inference explainable by determining how much one or several features have contributed to a model's output. A particularly popular attribution method is based on the Shapley value from cooperative game theory, a measure that guarantees the satisfaction of several desirable principles, assuming deterministic inference. We apply the Shapley value to feature attribution in large language model (LLM)-based decision support systems, where inference is, by design, stochastic (non-deterministic). We then demonstrate when we can and cannot guarantee Shapley value principle satisfaction across different implementation variants applied to LLM-based decision support, and analyze how the stochastic nature of LLMs affects these guarantees. We also highlight trade-offs between explainable inference speed, agreement with exact Shapley value attributions, and principle attainment.

Related papers

Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs [51.378834857406325]
Mechanistic interpretability seeks to mitigate the issues through extracts from large language models.<n>Sparse autoencoders (SAEs) have emerged as a popular approach for extracting interpretable and monosemantic concepts.<n>We show that SAEs suffer from a fundamental theoretical ambiguity: the well-defined correspondence between LLM representations and human-interpretable concepts remains unclear.
arXiv Detail & Related papers (2026-01-28T09:27:05Z)
Latent Chain-of-Thought for Visual Reasoning [53.541579327424046]
Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs)<n>We reformulate reasoning in LVLMs as posterior inference and propose a scalable training algorithm based on amortized variational inference.<n>We empirically demonstrate that the proposed method enhances the state-of-the-art LVLMs on seven reasoning benchmarks.
arXiv Detail & Related papers (2025-10-27T23:10:06Z)
Fair Document Valuation in LLM Summaries via Shapley Values [0.0]
Large Language Models (LLMs) are increasingly used in systems that retrieve and summarize content from multiple sources.<n>These systems obscure the individual contributions of original content creators, raising concerns about credit attribution and compensation.<n>We propose a Shapley value-based framework for fair document valuation.
arXiv Detail & Related papers (2025-05-28T15:14:21Z)
Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.<n>The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.<n>The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
Improving the Weighting Strategy in KernelSHAP [0.8057006406834466]
In Explainable AI (XAI) Shapley values are a popular framework for explaining predictions made by complex machine learning models.<n>We propose a novel modification of KernelSHAP which replaces the deterministic weights with ones to reduce the variance of the resulting Shapley value approximations.<n>Our methods can reduce the required number of contribution function evaluations by $5%$ to $50%$ while preserving the same accuracy of the approximated Shapley values.
arXiv Detail & Related papers (2024-10-07T10:02:31Z)
Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode. We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
Probabilistic Shapley Value Modeling and Inference [4.025747321359554]
Probability Shapley inference (PSI) is a novel framework to model and infer sufficient statistics of feature attributions in flexible predictive models.<n>We introduce a masking-based neural network architecture, with a modular training and inference procedure.<n>We evaluate PSI on synthetic and real-world datasets, showing that it achieves competitive predictive performance compared to strong baselines.
arXiv Detail & Related papers (2024-02-06T18:09:05Z)
Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences. Our method is especially suitable for problems with well-specified likelihoods. We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks. We study the problem from a model-based Bayesian reinforcement learning perspective. We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
Exact Shapley Values for Local and Model-True Explanations of Decision Tree Ensembles [0.0]
We consider the application of Shapley values for explaining decision tree ensembles. We present a novel approach to Shapley value-based feature attribution that can be applied to random forests and boosted decision trees.
arXiv Detail & Related papers (2021-12-16T20:16:02Z)
RKHS-SHAP: Shapley Values for Kernel Methods [17.52161019964009]
We propose an attribution method for kernel machines that can efficiently compute both emphInterventional and emphObservational Shapley values We show theoretically that our method is robust with respect to local perturbations - a key yet often overlooked desideratum for interpretability.
arXiv Detail & Related papers (2021-10-18T10:35:36Z)
Joint Shapley values: a measure of joint feature importance [6.169364905804678]
We introduce joint Shapley values, which directly extend the Shapley axioms. Joint Shapley values measure a set of features' average effect on a model's prediction. Results for games show that joint Shapley values present different insights from existing interaction indices.
arXiv Detail & Related papers (2021-07-23T17:22:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.