Related papers: Explaining Probabilistic Models with Distributional Values

Explaining Probabilistic Models with Distributional Values

URL: http://arxiv.org/abs/2402.09947v3
Date: Fri, 25 Oct 2024 10:53:54 GMT
Title: Explaining Probabilistic Models with Distributional Values
Authors: Luca Franceschi, Michele Donini, Cédric Archambeau, Matthias Seeger,
Abstract summary: Research indicates that game-theoretic explanations may mislead or be hard to interpret. We argue that often there is a critical mismatch between what one wishes to explain and what current methods such as SHAP explain. This paper addresses such gap for probabilistic models by generalising cooperative games and value operators.
Score: 12.26389108393613
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A large branch of explainable machine learning is grounded in cooperative game theory. However, research indicates that game-theoretic explanations may mislead or be hard to interpret. We argue that often there is a critical mismatch between what one wishes to explain (e.g. the output of a classifier) and what current methods such as SHAP explain (e.g. the scalar probability of a class). This paper addresses such gap for probabilistic models by generalising cooperative games and value operators. We introduce the distributional values, random variables that track changes in the model output (e.g. flipping of the predicted class) and derive their analytic expressions for games with Gaussian, Bernoulli and Categorical payoffs. We further establish several characterising properties, and show that our framework provides fine-grained and insightful explanations with case studies on vision and language models.

Related papers

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors [61.92704516732144]
We show that the most robust features for correctness prediction are those that play a distinctive causal role in the model's behavior.<n>We propose two methods that leverage causal mechanisms to predict the correctness of model outputs.
arXiv Detail & Related papers (2025-05-17T00:31:39Z)
Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation [0.9558392439655016]
The ability to interpret Machine Learning (ML) models is becoming increasingly essential. Recent work has demonstrated that it is possible to formally assess interpretability by studying the computational complexity of explaining the decisions of various models.
arXiv Detail & Related papers (2024-08-07T17:20:52Z)
Inference via Interpolation: Contrastive Representations Provably Enable Planning and Inference [110.47649327040392]
Given time series data, how can we answer questions like "what will happen in the future?" and "how did we get here?" We show how these questions can have compact, closed form solutions in terms of learned representations.
arXiv Detail & Related papers (2024-03-06T22:27:30Z)
Understanding Post-hoc Explainers: The Case of Anchors [6.681943980068051]
We present a theoretical analysis of a rule-based interpretability method that highlights a small set of words to explain a text's decision. After formalizing its algorithm and providing useful insights, we demonstrate mathematically that Anchors produces meaningful results.
arXiv Detail & Related papers (2023-03-15T17:56:34Z)
Visual Abductive Reasoning [85.17040703205608]
Abductive reasoning seeks the likeliest possible explanation for partial observations. We propose a new task and dataset, Visual Abductive Reasoning ( VAR), for examining abductive reasoning ability of machine intelligence in everyday visual situations.
arXiv Detail & Related papers (2022-03-26T10:17:03Z)
Do not explain without context: addressing the blind spot of model explanations [2.280298858971133]
This paper highlights a blind spot which is often overlooked when monitoring and auditing machine learning models. We discuss that many model explanations depend directly or indirectly on the choice of the referenced data distribution. We showcase examples where small changes in the distribution lead to drastic changes in the explanations, such as a change in trend or, alarmingly, a conclusion.
arXiv Detail & Related papers (2021-05-28T12:48:40Z)
Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction. We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss. Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
Why do classifier accuracies show linear trends under distribution shift? [58.40438263312526]
accuracies of models on one data distribution are approximately linear functions of the accuracies on another distribution. We assume the probability that two models agree in their predictions is higher than what we can infer from their accuracy levels alone. We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z)
Dependency Decomposition and a Reject Option for Explainable Models [4.94950858749529]
Recent deep learning models perform extremely well in various inference tasks. Recent advances offer methods to visualize features, describe attribution of the input. We present the first analysis of dependencies regarding the probability distribution over the desired image classification outputs.
arXiv Detail & Related papers (2020-12-11T17:39:33Z)
The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models. We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article a new kind of interpretable machine learning method is presented. It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts. Basically, real data points (or specific points of interest) are used and the changes of the prediction after slightly raising or decreasing specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.