Sufficient and Necessary Explanations (and What Lies in Between)
- URL: http://arxiv.org/abs/2409.20427v2
- Date: Tue, 15 Oct 2024 14:04:35 GMT
- Title: Sufficient and Necessary Explanations (and What Lies in Between)
- Authors: Beepul Bharti, Paul Yi, Jeremias Sulam
- Abstract summary: We study two precise notions of feature importance for general machine learning models: sufficiency and necessity.
We propose a unified notion of importance that circumvents the limitations of either notion alone by exploring a continuum along a necessity-sufficiency axis.
- Score: 6.9035001722324685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As complex machine learning models continue to find applications in high-stakes decision-making scenarios, it is crucial that we can explain and understand their predictions. Post-hoc explanation methods provide useful insights by identifying important features in an input $\mathbf{x}$ with respect to the model output $f(\mathbf{x})$. In this work, we formalize and study two precise notions of feature importance for general machine learning models: sufficiency and necessity. We demonstrate how these two types of explanations, albeit intuitive and simple, can fall short in providing a complete picture of which features a model finds important. To this end, we propose a unified notion of importance that circumvents these limitations by exploring a continuum along a necessity-sufficiency axis. Our unified notion, we show, has strong ties to other popular definitions of feature importance, like those based on conditional independence and game-theoretic quantities like Shapley values. Crucially, we demonstrate how a unified perspective allows us to detect important features that could be missed by either of the previous approaches alone.
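To make the two notions concrete, here is a minimal sketch (not the authors' formal definitions) of sufficiency- and necessity-style scores for a feature subset S: a subset is roughly sufficient if fixing $\mathbf{x}_S$ and resampling the remaining features preserves the prediction, and roughly necessary if resampling $\mathbf{x}_S$ while fixing the rest changes it. The toy AND model, data, and resampling-from-a-background-set scheme are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model and data (illustrative assumptions, not from the paper):
# the model predicts 1 iff features 0 AND 1 are both positive.
def f(X):
    return ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(float)

X_bg = rng.normal(size=(5000, 4))        # background / reference samples
x = np.array([1.0, 1.0, 1.0, -1.0])      # instance to explain
fx = f(x[None])[0]

def scores(S, n=5000):
    """Sufficiency- and necessity-style scores for feature subset S,
    estimated by resampling the complementary features from X_bg."""
    S = np.asarray(S)
    idx = rng.integers(len(X_bg), size=n)

    # Sufficiency: keep x_S fixed, resample the rest -- does f(x) survive?
    X_suf = X_bg[idx].copy()
    X_suf[:, S] = x[S]
    sufficiency = np.mean(f(X_suf) == fx)

    # Necessity: resample x_S, keep the rest fixed -- does f(x) change?
    X_nec = np.tile(x, (n, 1))
    X_nec[:, S] = X_bg[idx][:, S]
    necessity = np.mean(f(X_nec) != fx)
    return round(sufficiency, 2), round(necessity, 2)

for S in ([0], [0, 1], [2, 3]):
    print(S, scores(S))
# [0]    -> (~0.50, ~0.50)  partially sufficient, partially necessary
# [0, 1] -> (~1.00, ~0.75)  fully sufficient and largely necessary
# [2, 3] -> (~0.25, 0.00)   irrelevant: low sufficiency, zero necessity
```

On this toy model, only the pair {0, 1} scores highly on both axes, while each single feature sits somewhere in between, which is the kind of intermediate behavior a necessity-sufficiency continuum is meant to capture.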
Related papers
- Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation [0.9558392439655016]
The ability to interpret Machine Learning (ML) models is becoming increasingly essential.
Recent work has demonstrated that it is possible to formally assess interpretability by studying the computational complexity of explaining the decisions of various models.
arXiv Detail & Related papers (2024-08-07T17:20:52Z) - Explaining the Model and Feature Dependencies by Decomposition of the Shapley Value [3.0655581300025996]
Shapley values have become one of the go-to methods to explain complex models to end-users.
One downside is that they require model outputs when some features are missing.
This, however, introduces a non-trivial choice: do we condition on the unknown features or not?
We propose a new algorithmic approach to combine both explanations, removing the burden of choice and enhancing the explanatory power of Shapley values.
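The choice described above can be made concrete with the two standard value functions used to handle missing features: the marginal (interventional) one breaks feature dependence, while the conditional one respects it. The sketch below is a generic illustration of that choice on a toy correlated-Gaussian setup, not the decomposition proposed in the paper; the model, data, and nearest-neighbour conditioning are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup (assumption): two strongly correlated features,
# but the model only ever reads feature 0.
cov = np.array([[1.0, 0.95], [0.95, 1.0]])
X = rng.multivariate_normal(np.zeros(2), cov, size=20000)
f = lambda X: X[:, 0]                    # model ignores feature 1
x = np.array([1.0, 1.0])                 # instance to explain

def v_marginal(S, x, X):
    """Coalition value with missing features drawn from their marginal."""
    Z = X.copy()
    Z[:, S] = x[S]
    return f(Z).mean()

def v_conditional(S, x, X, bandwidth=0.05):
    """Coalition value with missing features drawn conditionally on x_S
    (crude nearest-neighbour approximation of the conditional law)."""
    if len(S) == 0:
        return f(X).mean()
    mask = np.all(np.abs(X[:, S] - x[S]) < bandwidth, axis=1)
    Z = X[mask].copy()
    Z[:, S] = x[S]
    return f(Z).mean()

S = np.array([1])   # coalition containing only feature 1
print("marginal    v({1}) =", round(v_marginal(S, x, X), 2))     # ~0.00
print("conditional v({1}) =", round(v_conditional(S, x, X), 2))  # ~0.95
# Marginal: feature 1 looks worthless (the model never reads it).
# Conditional: feature 1 proxies feature 0, so it carries credit.
```

With strongly correlated features, the two value functions assign very different credit to a feature the model never reads, which is exactly the burden of choice the summary above refers to.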
arXiv Detail & Related papers (2023-06-19T12:20:23Z) - Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z) - Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small [68.879023473838]
We present an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI).
To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model.
arXiv Detail & Related papers (2022-11-01T17:08:44Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
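As a rough illustration of what a diversity-enforcing counterfactual objective can look like (a hedged sketch, not the paper's method), the snippet below jointly optimizes several latent perturbations with a term that pushes the classifier's prediction to flip, a proximity term, and a term that rewards pairwise distance among the candidates. The toy classifier, latent code, and loss weights are assumptions.

```python
import torch

torch.manual_seed(0)
d, K = 8, 4
classifier = torch.nn.Linear(d, 1)        # frozen toy classifier (assumption)
for p in classifier.parameters():
    p.requires_grad_(False)

z = torch.randn(d)                        # latent code of the input
delta = torch.zeros(K, d, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.1)

for step in range(200):
    z_cf = z + delta                                   # K counterfactual codes
    # Flip term: push the logit negative (assumes the input was predicted positive).
    flip = torch.nn.functional.softplus(classifier(z_cf)).mean()
    proximity = delta.norm(dim=1).mean()               # stay close to the input
    pairwise = torch.cdist(delta, delta)               # K x K distances
    diversity = -pairwise.sum() / (K * (K - 1))        # spread the candidates apart
    loss = flip + 0.1 * proximity + 0.05 * diversity
    opt.zero_grad()
    loss.backward()
    opt.step()
```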
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Fundamental Limits and Tradeoffs in Invariant Representation Learning [99.2368462915979]
Many machine learning applications involve learning representations that achieve two competing goals.
A minimax game-theoretic formulation captures a fundamental tradeoff between accuracy and invariance.
We provide an information-theoretic analysis of this general and important problem under both classification and regression settings.
arXiv Detail & Related papers (2020-12-19T15:24:04Z) - Bayesian Importance of Features (BIF) [11.312036995195594]
We use the Dirichlet distribution to define the importance of input features and learn it via approximate Bayesian inference.
The learned importance has a probabilistic interpretation and provides the relative significance of each input feature to a model's output.
We show the effectiveness of our method on a variety of synthetic and real datasets.
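A hedged sketch of the general idea, not the BIF implementation: place a variational Dirichlet over per-feature importance, reweight the inputs by sampled importance vectors, and fit the Dirichlet parameters so the reweighted inputs still reproduce the (fixed) model's predictions, with a KL term toward a uniform prior. The model, data, and objective weights below are assumptions.

```python
import torch
from torch.distributions import Dirichlet, kl_divergence

torch.manual_seed(0)
n, d = 512, 5
X = torch.randn(n, d)
w_true = torch.tensor([1.0, 2.0, 0.0, 0.0, 0.0])   # the model only uses features 0, 1
model = lambda Z: Z @ w_true                        # fixed model to be explained
y = (model(X) > 0).float()

log_alpha = torch.zeros(d, requires_grad=True)      # variational Dirichlet parameters
prior = Dirichlet(torch.ones(d))
opt = torch.optim.Adam([log_alpha], lr=0.05)

for step in range(500):
    q = Dirichlet(log_alpha.exp())
    phi = q.rsample((16,))                          # importance samples, each sums to 1
    logits = model(phi.unsqueeze(1) * X * d)        # features reweighted by importance
    nll = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, y.expand_as(logits))
    loss = nll + kl_divergence(q, prior) / n        # ELBO-style objective
    opt.zero_grad()
    loss.backward()
    opt.step()

print(Dirichlet(log_alpha.detach().exp()).mean)     # mean importance per feature
# In this toy setup the mass tends to shift toward features 0 and 1.
```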
arXiv Detail & Related papers (2020-10-26T19:55:58Z) - The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
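The difference is easy to see on a trivial model. The sketch below (an illustrative example, not necessarily the paper's) explains x = (1, 1, 0) under an OR model: either of the first two features alone forms a minimal sufficient subset, while exact Shapley values split the credit between them.

```python
import itertools
import math

# Trivial OR model over binary features, explained at x = (1, 1, 0).
f = lambda z: float(z[0] or z[1])
x = (1, 1, 0)
baseline = (0, 0, 0)          # "missing" features are set to 0
d = len(x)

def value(S):
    """Model output when only the features in S are revealed (rest at baseline)."""
    z = [x[i] if i in S else baseline[i] for i in range(d)]
    return f(z)

# Minimal sufficient subsets: smallest S with value(S) == f(x).
target = f(x)
suff = [S for r in range(d + 1) for S in itertools.combinations(range(d), r)
        if value(S) == target]
minimal = [S for S in suff if not any(set(T) < set(S) for T in suff)]
print("minimal sufficient subsets:", minimal)        # [(0,), (1,)]

# Exact Shapley values for the same value function.
def shapley(i):
    others = [j for j in range(d) if j != i]
    total = 0.0
    for r in range(d):
        for S in itertools.combinations(others, r):
            w = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            total += w * (value(set(S) | {i}) - value(S))
    return total

print("Shapley values:", [round(shapley(i), 3) for i in range(d)])  # [0.5, 0.5, 0.0]
# Either feature 0 or 1 alone is sufficient, yet Shapley splits the credit,
# so the two explainer families answer different questions.
```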
arXiv Detail & Related papers (2020-09-23T09:45:23Z) - Problems with Shapley-value-based explanations as feature importance measures [12.08945475767566]
Game-theoretic formulations of feature importance have become popular as a way to "explain" machine learning models.
We show that mathematical problems arise when Shapley values are used for feature importance.
We argue that Shapley values do not provide explanations which suit human-centric goals of explainability.
arXiv Detail & Related papers (2020-02-25T18:51:14Z) - Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)