Efficient SAGE Estimation via Causal Structure Learning
- URL: http://arxiv.org/abs/2304.03113v1
- Date: Thu, 6 Apr 2023 14:40:32 GMT
- Title: Efficient SAGE Estimation via Causal Structure Learning
- Authors: Christoph Luther, Gunnar König, Moritz Grosse-Wentrup
- Abstract summary: $d$-SAGE is a method that accelerates the approximation of the Shapley Additive Global Importance (SAGE) value.
$d$-SAGE is motivated by the observation that conditional independencies between a feature and the model target imply zero surplus contributions.
We demonstrate that $d$-SAGE enables the efficient and accurate estimation of SAGE values.
- Score: 0.7243632426715939
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Shapley Additive Global Importance (SAGE) value is a theoretically
appealing interpretability method that fairly attributes global importance to a
model's features. However, its exact calculation requires the computation of
the feature's surplus performance contributions over an exponential number of
feature sets. This is computationally expensive, particularly because
estimating the surplus contributions requires sampling from conditional
distributions. Thus, SAGE approximation algorithms only take a fraction of the
feature sets into account. We propose $d$-SAGE, a method that accelerates SAGE
approximation. $d$-SAGE is motivated by the observation that conditional
independencies (CIs) between a feature and the model target imply zero surplus
contributions, such that their computation can be skipped. To identify CIs, we
leverage causal structure learning (CSL) to infer a graph that encodes
(conditional) independencies in the data as $d$-separations. This is
computationally more efficient because the expense of the one-time graph
inference and the $d$-separation queries is negligible compared to the expense
of surplus contribution evaluations. Empirically, we demonstrate that $d$-SAGE
enables the efficient and accurate estimation of SAGE values.
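For concreteness, the SAGE value of feature $j$ is the Shapley value of a performance-based cooperative game (standard Shapley notation, with $v(S)$ the model's predictive performance when only the features in $S \subseteq D$, $|D| = d$, are available):

$$\phi_j = \frac{1}{d} \sum_{S \subseteq D \setminus \{j\}} \binom{d-1}{|S|}^{-1} \big( v(S \cup \{j\}) - v(S) \big),$$

so exact computation touches all $2^{d-1}$ coalitions, and every surplus term $v(S \cup \{j\}) - v(S)$ requires sampling from a conditional distribution. Below is a minimal sketch of the skipping step, assuming a DAG `G` already inferred by some CSL algorithm and a user-supplied `surplus` estimator (both names are illustrative; the d-separation query uses networkx, where it is called `is_d_separator` in recent releases and `d_separated` in older ones):

```python
import networkx as nx

def surplus_contribution(G, j, S, target, surplus):
    """Surplus contribution of feature j on top of coalition S.

    If the learned graph d-separates j from the target given S, the
    implied conditional independence means the surplus contribution
    is zero, so the costly conditional-sampling estimate is skipped.
    """
    if nx.is_d_separator(G, {j}, {target}, set(S)):
        return 0.0            # CI => zero surplus contribution; skip
    return surplus(j, S)      # otherwise, estimate as usual
```

The one-time graph inference and the per-coalition d-separation queries are cheap next to the conditional sampling they replace, which is exactly the trade-off the abstract describes.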
Related papers
- A hierarchical decomposition for explaining ML performance discrepancies [6.603088808962966]
Machine learning algorithms can often differ in performance across domains. Understanding $\textit{why}$ their performance differs is crucial for determining what types of interventions are most effective at closing the performance gaps.
We introduce a nonparametric hierarchical framework that provides both aggregate and detailed decompositions for explaining why the performance of an ML algorithm differs across domains without requiring causal knowledge.
arXiv Detail & Related papers (2024-02-22T03:41:05Z)
- Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning [1.4293924404819704]
We shed new light on the traditional nearest neighbors algorithm from the perspective of information theory.
We propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model.
Our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection.
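The information-theoretic quantity at work here is surprisal, $-\log \hat{p}(x)$. A minimal sketch of a surprisal score built on a $k$-NN density estimate, usable for anomaly detection (the estimator and its use are illustrative assumptions, not the paper's exact construction):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_surprisal(X_train, X_query, k=10):
    """Surprisal -log p(x) from a k-NN density estimate:
    p(x) is proportional to k / (n * r_k(x)^d), where r_k is the
    distance to the k-th nearest neighbor (volume constant dropped)."""
    n, d = X_train.shape
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    r_k = nn.kneighbors(X_query)[0][:, -1]   # distance to k-th neighbor
    log_p = np.log(k) - np.log(n) - d * np.log(r_k + 1e-12)
    return -log_p                            # high surprisal = surprising point
```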
arXiv Detail & Related papers (2023-11-17T00:35:38Z)
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
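As background for the kernel-regression building block above, here is a plain kernel ridge regressor with an RBF kernel (a generic sketch; KBASS couples this idea with spike-and-slab priors and EP-EM inference, which are not reproduced here):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def kernel_ridge_predict(X, y, X_new, lam=1e-3, lengthscale=1.0):
    """Estimate the target function via kernel ridge regression:
    f(x*) = k(x*, X) @ (K + lam * I)^{-1} y."""
    K = rbf_kernel(X, X, lengthscale)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return rbf_kernel(X_new, X, lengthscale) @ alpha
```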
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation [23.646508094051768]
We consider the dataset valuation problem, that is, the problem of quantifying the incremental gain a new dataset brings to a relevant pre-existing one.
The Shapley value is a natural tool to perform dataset valuation due to its formal axiomatic justification.
We propose a novel approximation, referred to as discrete uniform Shapley, which is expressed as an expectation under a discrete uniform distribution.
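The Shapley value can be rewritten as an expectation over a coalition size drawn from a discrete uniform distribution, which suggests simple Monte Carlo estimators; a minimal sketch (the `utility` function is an assumption, and this generic estimator is not the paper's closed-form proxy):

```python
import random

def shapley_mc(i, players, utility, n_samples=200, rng=random.Random(0)):
    """Monte Carlo Shapley estimate for dataset i: draw a coalition size
    k uniformly from {0, ..., n-1}, then a random size-k coalition S of
    the other datasets, and average utility(S + {i}) - utility(S)."""
    others = [p for p in players if p != i]
    total = 0.0
    for _ in range(n_samples):
        k = rng.randrange(len(others) + 1)   # k ~ Uniform{0, ..., n-1}
        S = set(rng.sample(others, k))
        total += utility(S | {i}) - utility(S)
    return total / n_samples
```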
arXiv Detail & Related papers (2023-06-03T10:22:50Z)
- A $k$-additive Choquet integral-based approach to approximate the SHAP values for local interpretability in machine learning [8.637110868126546]
This paper aims to provide interpretability for machine learning models based on Shapley values.
A SHAP-based method called Kernel SHAP adopts an efficient strategy that approximates such values with less computational effort.
The obtained results attest that our proposal needs fewer computations on coalitions of attributes to approximate the SHAP values.
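For reference, Kernel SHAP obtains the Shapley values as the solution of a weighted least-squares problem over coalitions $S$, with the well-known weighting kernel

$$\pi(S) = \frac{d-1}{\binom{d}{|S|}\,|S|\,(d-|S|)}, \qquad 0 < |S| < d,$$

and the $k$-additive Choquet-integral approach targets the same values while evaluating fewer coalitions.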
arXiv Detail & Related papers (2022-11-03T22:34:50Z)
- Efficient Non-Local Contrastive Attention for Image Super-Resolution [48.093500219958834]
Non-Local Attention (NLA) brings significant improvement for Single Image Super-Resolution (SISR) by leveraging intrinsic feature correlation in natural images.
We propose a novel Efficient Non-Local Contrastive Attention (ENLCA) to perform long-range visual modeling and leverage more relevant non-local features.
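For background, vanilla non-local attention lets every spatial position aggregate features from all others; a minimal single-head sketch over a flattened spatial grid (shapes and projection matrices are illustrative; ENLCA's contrastive and efficiency components are not reproduced here):

```python
import numpy as np

def non_local_attention(X, Wq, Wk, Wv):
    """X: (N, C) features at N spatial positions; Wq/Wk/Wv: (C, C)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    logits = Q @ K.T / np.sqrt(K.shape[1])        # (N, N) similarities
    A = np.exp(logits - logits.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)             # softmax over all positions
    return A @ V                                  # long-range aggregation
```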
arXiv Detail & Related papers (2022-01-11T05:59:09Z)
- Multi-task Learning of Order-Consistent Causal Graphs [59.9575145128345]
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs).
Under a multi-task learning setting, we propose an $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models.
We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order.
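A plausible form of such a joint estimator, written as a group-lasso ($l_1/l_2$) penalized MLE over the $K$ weighted adjacency matrices $B^{(k)}$ (our paraphrase; the exact objective is in the paper):

$$\min_{B^{(1)},\dots,B^{(K)}} \; \sum_{k=1}^{K} \ell\big(B^{(k)}; X^{(k)}\big) \;+\; \lambda \sum_{i \neq j} \big\| \big(B^{(1)}_{ij}, \dots, B^{(K)}_{ij}\big) \big\|_2,$$

where $\ell$ is the negative Gaussian log-likelihood of the $k$-th linear SEM; the $l_2$ norm couples each edge across tasks, encouraging a shared sparsity pattern and, with it, a common causal order.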
arXiv Detail & Related papers (2021-11-03T22:10:18Z)
- Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z)
- An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction [84.49035467829819]
We show that it is possible to better manage the trade-off between rationale conciseness and end-task performance by optimizing a bound on the Information Bottleneck (IB) objective.
Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale.
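The usual relaxation of the IB objective in this setting trades end-task accuracy against the information the mask retains (our notation: $z$ is the binary sentence mask, $r(z)$ a sparsity-inducing prior, $\beta$ the conciseness knob):

$$\mathcal{L} = \mathbb{E}_{p(z \mid x)}\big[-\log p(y \mid x \odot z)\big] + \beta \, \mathrm{KL}\big(p(z \mid x) \,\|\, r(z)\big),$$

so larger $\beta$ yields sparser masks, i.e. more concise rationales.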
arXiv Detail & Related papers (2020-05-01T23:26:41Z)
- Interpretable feature subset selection: A Shapley value based approach [1.511944009967492]
We introduce the notion of a classification game, a cooperative game with features as players and a hinge-loss-based characteristic function.
Our major contribution is ($\star$) to show that, for any dataset, a threshold of 0 on the SVEA value identifies a feature subset whose joint interactions for label prediction are significant.
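A minimal sketch of this classification game, assuming the characteristic function is the negative hinge loss of a linear SVM trained on each feature subset and the Shapley values (SVEA) are estimated by permutation sampling (the conventions below, e.g. the empty-coalition value, are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import hinge_loss

def char_value(S, X, y):
    """Characteristic function: negative hinge loss using features in S."""
    if not S:
        return -1.0  # assumed worst-case value for the empty coalition
    cols = sorted(S)
    clf = LinearSVC().fit(X[:, cols], y)
    return -hinge_loss(y, clf.decision_function(X[:, cols]))

def svea(X, y, n_perm=50, rng=np.random.default_rng(0)):
    """Permutation-sampling Shapley estimate of the classification game."""
    d = X.shape[1]
    phi = np.zeros(d)
    for _ in range(n_perm):
        S, prev = set(), char_value(set(), X, y)
        for j in rng.permutation(d):
            cur = char_value(S | {j}, X, y)
            phi[j] += cur - prev
            S.add(j)
            prev = cur
    return phi / n_perm  # features with phi > 0 form the selected subset
```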
arXiv Detail & Related papers (2020-01-12T16:27:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.