Model-agnostic Feature Importance and Effects with Dependent Features --
A Conditional Subgroup Approach
- URL: http://arxiv.org/abs/2006.04628v2
- Date: Mon, 21 Jun 2021 07:59:39 GMT
- Title: Model-agnostic Feature Importance and Effects with Dependent Features --
A Conditional Subgroup Approach
- Authors: Christoph Molnar, Gunnar König, Bernd Bischl, and Giuseppe Casalicchio
- Abstract summary: We propose a new sampling mechanism for the conditional distribution based on permutations in conditional subgroups.
As these subgroups are constructed using decision trees (transformation trees), the conditioning becomes inherently interpretable.
We show that PFI and PDP based on conditional subgroups often outperform methods such as conditional PFI based on knockoffs.
- Score: 0.7349727826230864
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The interpretation of feature importance in machine learning models is
challenging when features are dependent. Permutation feature importance (PFI)
ignores such dependencies, which can cause misleading interpretations due to
extrapolation. A possible remedy is more advanced conditional PFI approaches
that enable the assessment of feature importance conditional on all other
features. Given this shift in perspective, and to enable correct
interpretations, it is important that the conditioning is transparent and
humanly comprehensible. In this paper, we propose a new sampling mechanism
for the conditional distribution based on permutations in conditional
subgroups. As these subgroups are constructed using decision trees
(transformation trees), the conditioning becomes inherently interpretable. This
not only provides a simple and effective estimator of conditional PFI, but also
local PFI estimates within the subgroups. In addition, we apply the conditional
subgroups approach to partial dependence plots (PDP), a popular method for
describing feature effects that can also suffer from extrapolation when
features are dependent and interactions are present in the model. We show that
PFI and PDP based on conditional subgroups often outperform methods such as
conditional PFI based on knockoffs, or accumulated local effect plots.
Furthermore, our approach allows for a more fine-grained interpretation of
feature effects and importance within the conditional subgroups.
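To make the mechanism concrete, here is a minimal sketch of conditional-subgroup PFI in Python. It assumes numeric features and a fitted scikit-learn-style regressor `model`; the paper constructs the subgroups with transformation trees, for which a shallow CART regressing the feature of interest on the remaining features stands in here, and the function name and defaults are illustrative rather than the authors' API.

```python
# Minimal sketch of conditional-subgroup PFI (illustrative, not the paper's code).
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

def conditional_subgroup_pfi(model, X, y, j, max_leaf_nodes=8, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    X_rest = np.delete(X, j, axis=1)

    # Step 1: partition the data into interpretable subgroups in which the
    # distribution of x_j given the other features is roughly constant
    # (here: leaves of a shallow CART; the paper uses transformation trees).
    tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes).fit(X_rest, X[:, j])
    leaves = tree.apply(X_rest)

    base_loss = mean_squared_error(y, model.predict(X))
    permuted_losses = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        # Step 2: permute x_j only within each subgroup, which preserves the
        # dependence of x_j on the other features and avoids extrapolation.
        for leaf in np.unique(leaves):
            idx = np.flatnonzero(leaves == leaf)
            X_perm[idx, j] = rng.permutation(X[idx, j])
        permuted_losses.append(mean_squared_error(y, model.predict(X_perm)))

    # Conditional PFI: the rise in loss caused by within-subgroup permutation.
    # Computing the same difference per leaf yields the local PFI estimates
    # mentioned in the abstract.
    return float(np.mean(permuted_losses) - base_loss)
```

The same leaf partition can be reused for the PDP variant: computing a partial dependence curve within each leaf rather than over the full data restricts the evaluation to realistic feature combinations.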
Related papers
- An Algorithm for Identifying Interpretable Subgroups With Elevated Treatment Effects [0.0]
We introduce an algorithm for identifying interpretable subgroups with elevated treatment effects, given an estimate of individual or conditional average treatment effects (CATE). Subgroups are characterized by "rule sets": easy-to-understand statements of the form (Condition A AND Condition B) OR (Condition C).
arXiv Detail & Related papers (2025-07-13T05:01:48Z)
- Conditional Feature Importance with Generative Modeling Using Adversarial Random Forests [1.0208529247755187]
In explainable artificial intelligence (XAI), conditional feature importance assesses the impact of a feature on a prediction model's performance.
Recent advancements in generative modeling can facilitate measuring conditional feature importance.
This paper proposes cARFi, a method for measuring conditional feature importance through feature values sampled from ARF-estimated conditional distributions.
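The summary suggests a generic loop into which any conditional sampler can be plugged. Below is a hedged sketch of that loop; `sample_conditional` is a hypothetical stand-in for drawing x_j from an ARF-estimated conditional distribution and is not the paper's actual interface.

```python
# Hedged sketch: conditional feature importance with a plug-in generative sampler.
import numpy as np
from sklearn.metrics import mean_squared_error

def conditional_importance(model, X, y, j, sample_conditional, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    base_loss = mean_squared_error(y, model.predict(X))
    losses = []
    for _ in range(n_repeats):
        X_tilde = X.copy()
        # Unlike marginal PFI (a plain permutation of column j), the sampler
        # replaces x_j with values that respect its dependence on x_-j.
        X_tilde[:, j] = sample_conditional(X, j, rng)
        losses.append(mean_squared_error(y, model.predict(X_tilde)))
    return float(np.mean(losses) - base_loss)
```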
arXiv Detail & Related papers (2025-01-19T21:34:54Z)
- Structural Entropy Guided Probabilistic Coding [52.01765333755793]
We propose a novel structural entropy-guided probabilistic coding model, named SEPC.
We incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss.
Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC.
arXiv Detail & Related papers (2024-12-12T00:37:53Z)
- Hierarchical Bias-Driven Stratification for Interpretable Causal Effect Estimation [1.6874375111244329]
BICauseTree is an interpretable balancing method that identifies clusters where natural experiments occur locally.
We evaluate the method's performance using synthetic and realistic datasets, explore its bias-interpretability tradeoff, and show that it is comparable with existing approaches.
arXiv Detail & Related papers (2024-01-31T10:58:13Z) - Variable Importance in High-Dimensional Settings Requires Grouping [19.095605415846187]
Conditional Permutation Importance (CPI) bypasses the limitations of Permutation Importance (PI) in high-dimensional settings.
Grouping variables statistically, via clustering or prior knowledge, recovers some of the lost statistical power.
We show that the approach extended with stacking controls the type-I error even with highly-correlated groups.
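One simple way to treat a group as a single unit is sketched below: permute all columns of the group with one shared row permutation, so correlations within the group survive while its link to the target is broken. The paper's CPI variant additionally conditions on the remaining features; this marginal version only illustrates the grouping idea, and the helper name is hypothetical.

```python
# Hedged sketch of grouped permutation importance (marginal version).
import numpy as np
from sklearn.metrics import mean_squared_error

def grouped_permutation_importance(model, X, y, group, seed=0):
    rng = np.random.default_rng(seed)
    X_perm = X.copy()
    perm = rng.permutation(X.shape[0])
    # One shared row permutation for every feature in the group preserves
    # the within-group correlation structure.
    X_perm[:, group] = X[perm][:, group]
    base_loss = mean_squared_error(y, model.predict(X))
    return float(mean_squared_error(y, model.predict(X_perm)) - base_loss)
```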
arXiv Detail & Related papers (2023-12-18T00:21:47Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- Commutative Lie Group VAE for Disentanglement Learning [96.32813624341833]
We view disentanglement learning as discovering an underlying structure that equivariantly reflects the factorized variations shown in data.
A simple model named Commutative Lie Group VAE is introduced to realize the group-based disentanglement learning.
Experiments show that our model can effectively learn disentangled representations without supervision, and can achieve state-of-the-art performance without extra constraints.
arXiv Detail & Related papers (2021-06-07T07:03:14Z)
- Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)
- Transforming Feature Space to Interpret Machine Learning Models [91.62936410696409]
This contribution proposes a novel approach that interprets machine-learning models through the lens of feature space transformations.
It can be used to enhance unconditional as well as conditional post-hoc diagnostic tools.
A case study on remote-sensing landcover classification with 46 features is used to demonstrate the potential of the proposed approach.
arXiv Detail & Related papers (2021-04-09T10:48:11Z)
- GroupifyVAE: from Group-based Definition to VAE-based Unsupervised Representation Disentanglement [91.9003001845855]
VAE-based unsupervised disentanglement cannot be achieved without introducing additional inductive biases.
We address VAE-based unsupervised disentanglement by leveraging the constraints derived from the Group Theory based definition as the non-probabilistic inductive bias.
We train 1800 models covering the most prominent VAE-based models on five datasets to verify the effectiveness of our method.
arXiv Detail & Related papers (2021-02-20T09:49:51Z)
- Relative Feature Importance [1.4474137122906163]
Relative Feature Importance (RFI) is a generalization of Permutation Feature Importance (PFI) and Conditional Feature Importance (CFI).
RFI allows for a more nuanced feature importance computation beyond the PFI versus CFI dichotomy.
We derive general interpretation rules for RFI based on a detailed theoretical analysis of the implications of relative feature relevance.
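A minimal sketch of the RFI idea, reusing the CART-based subgroup approximation from the first example: x_j is permuted conditional on an arbitrary list of feature indices G, where G = [] recovers marginal PFI and G = all other features approximates CFI. Helper names and defaults are illustrative, not the paper's implementation.

```python
# Hedged sketch of Relative Feature Importance via within-stratum permutation.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

def relative_feature_importance(model, X, y, j, G, max_leaf_nodes=8, seed=0):
    rng = np.random.default_rng(seed)
    X_perm = X.copy()
    if len(G) == 0:
        # Empty conditioning set: ordinary permutation feature importance.
        X_perm[:, j] = rng.permutation(X[:, j])
    else:
        # Permute x_j within strata that are homogeneous in the G-features,
        # approximated here by the leaves of a shallow CART.
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes).fit(X[:, G], X[:, j])
        leaves = tree.apply(X[:, G])
        for leaf in np.unique(leaves):
            idx = np.flatnonzero(leaves == leaf)
            X_perm[idx, j] = rng.permutation(X[idx, j])
    base_loss = mean_squared_error(y, model.predict(X))
    return float(mean_squared_error(y, model.predict(X_perm)) - base_loss)
```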
arXiv Detail & Related papers (2020-07-16T12:20:22Z)
- Controlling for sparsity in sparse factor analysis models: adaptive latent feature sharing for piecewise linear dimensionality reduction [2.896192909215469]
We propose a simple and tractable parametric feature allocation model which can address key limitations of current latent feature decomposition techniques.
We derive a novel adaptive factor analysis (aFA), as well as an adaptive probabilistic principal component analysis (aPPCA), capable of flexible structure discovery and dimensionality reduction.
We show that aPPCA and aFA can infer interpretable high-level features, both when applied to raw MNIST and when used to interpret autoencoder features.
arXiv Detail & Related papers (2020-06-22T16:09:11Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)