Model-agnostic Feature Importance and Effects with Dependent Features --
A Conditional Subgroup Approach
- URL: http://arxiv.org/abs/2006.04628v2
- Date: Mon, 21 Jun 2021 07:59:39 GMT
- Title: Model-agnostic Feature Importance and Effects with Dependent Features --
A Conditional Subgroup Approach
- Authors: Christoph Molnar, Gunnar König, Bernd Bischl, and Giuseppe Casalicchio
- Abstract summary: We propose a new sampling mechanism for the conditional distribution based on permutations in conditional subgroups.
As these subgroups are constructed using decision trees (transformation trees), the conditioning becomes inherently interpretable.
We show that PFI and PDP based on conditional subgroups often outperform methods such as conditional PFI based on knockoffs.
- Score: 0.7349727826230864
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The interpretation of feature importance in machine learning models is
challenging when features are dependent. Permutation feature importance (PFI)
ignores such dependencies, which can cause misleading interpretations due to
extrapolation. A possible remedy is more advanced conditional PFI approaches
that enable the assessment of feature importance conditional on all other
features. Given this shift in perspective, and to enable correct
interpretations, it is important that the conditioning is transparent and
humanly comprehensible. In this paper, we propose a new sampling mechanism
for the conditional distribution based on permutations in conditional
subgroups. As these subgroups are constructed using decision trees
(transformation trees), the conditioning becomes inherently interpretable. This
not only provides a simple and effective estimator of conditional PFI, but also
local PFI estimates within the subgroups. In addition, we apply the conditional
subgroups approach to partial dependence plots (PDP), a popular method for
describing feature effects that can also suffer from extrapolation when
features are dependent and interactions are present in the model. We show that
PFI and PDP based on conditional subgroups often outperform methods such as
conditional PFI based on knockoffs, or accumulated local effect plots.
Furthermore, our approach allows for a more fine-grained interpretation of
feature effects and importance within the conditional subgroups.
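To make the mechanism concrete, here is a minimal sketch of conditional-subgroup PFI in Python. It assumes numeric features and a fitted scikit-learn-style regressor `model`; the paper constructs the subgroups with transformation trees, for which a shallow CART regressing the feature of interest on the remaining features stands in here, and the function name and defaults are illustrative rather than the authors' API.

```python
# Minimal sketch of conditional-subgroup PFI (illustrative, not the paper's code).
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

def conditional_subgroup_pfi(model, X, y, j, max_leaf_nodes=8, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    X_rest = np.delete(X, j, axis=1)

    # Step 1: partition the data into interpretable subgroups in which the
    # distribution of x_j given the other features is roughly constant
    # (here: leaves of a shallow CART; the paper uses transformation trees).
    tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes).fit(X_rest, X[:, j])
    leaves = tree.apply(X_rest)

    base_loss = mean_squared_error(y, model.predict(X))
    permuted_losses = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        # Step 2: permute x_j only within each subgroup, which preserves the
        # dependence of x_j on the other features and avoids extrapolation.
        for leaf in np.unique(leaves):
            idx = np.flatnonzero(leaves == leaf)
            X_perm[idx, j] = rng.permutation(X[idx, j])
        permuted_losses.append(mean_squared_error(y, model.predict(X_perm)))

    # Conditional PFI: the rise in loss caused by within-subgroup permutation.
    # Computing the same difference per leaf yields the local PFI estimates
    # mentioned in the abstract.
    return float(np.mean(permuted_losses) - base_loss)
```

The same leaf partition can be reused for the PDP variant: computing a partial dependence curve within each leaf rather than over the full data restricts the evaluation to realistic feature combinations.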
Related papers
- An Algorithm for Identifying Interpretable Subgroups With Elevated Treatment Effects [0.0]
We introduce an algorithm for identifying interpretable subgroups with elevated treatment effects, given an estimate of individual or conditional average treatment effects (CATE). Subgroups are characterized by "rule sets": easy-to-understand statements of the form (Condition A AND Condition B) OR (Condition C).
arXiv Detail & Related papers (2025-07-13T05:01:48Z)
- Conditional Feature Importance with Generative Modeling Using Adversarial Random Forests [1.0208529247755187]
In explainable artificial intelligence (XAI), conditional feature importance assesses the impact of a feature on a prediction model's performance.
Recent advancements in generative modeling can facilitate measuring conditional feature importance.
This paper proposes cARFi, a method for measuring conditional feature importance through feature values sampled from ARF-estimated conditional distributions.
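The summary suggests a generic loop into which any conditional sampler can be plugged. Below is a hedged sketch of that loop; `sample_conditional` is a hypothetical stand-in for drawing x_j from an ARF-estimated conditional distribution and is not the paper's actual interface.

```python
# Hedged sketch: conditional feature importance with a plug-in generative sampler.
import numpy as np
from sklearn.metrics import mean_squared_error

def conditional_importance(model, X, y, j, sample_conditional, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    base_loss = mean_squared_error(y, model.predict(X))
    losses = []
    for _ in range(n_repeats):
        X_tilde = X.copy()
        # Unlike marginal PFI (a plain permutation of column j), the sampler
        # replaces x_j with values that respect its dependence on x_-j.
        X_tilde[:, j] = sample_conditional(X, j, rng)
        losses.append(mean_squared_error(y, model.predict(X_tilde)))
    return float(np.mean(losses) - base_loss)
```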
arXiv Detail & Related papers (2025-01-19T21:34:54Z)
- Structural Entropy Guided Probabilistic Coding [52.01765333755793]
We propose a novel structural entropy-guided probabilistic coding model, named SEPC.
We incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss.
Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC.
arXiv Detail & Related papers (2024-12-12T00:37:53Z)
- Hierarchical Bias-Driven Stratification for Interpretable Causal Effect Estimation [1.6874375111244329]
BICauseTree is an interpretable balancing method that identifies clusters where natural experiments occur locally.
We evaluate the method's performance using synthetic and realistic datasets, explore its bias-interpretability tradeoff, and show that it is comparable with existing approaches.
arXiv Detail & Related papers (2024-01-31T10:58:13Z) - Variable Importance in High-Dimensional Settings Requires Grouping [19.095605415846187]
Conditional Permutation Importance (CPI) bypasses the limitations of Permutation Importance (PI) in high-dimensional settings.
Grouping variables statistically, via clustering or prior knowledge, recovers some of the lost statistical power.
We show that the approach extended with stacking controls the type-I error even with highly-correlated groups.
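One simple way to treat a group as a single unit is sketched below: permute all columns of the group with one shared row permutation, so correlations within the group survive while its link to the target is broken. The paper's CPI variant additionally conditions on the remaining features; this marginal version only illustrates the grouping idea, and the helper name is hypothetical.

```python
# Hedged sketch of grouped permutation importance (marginal version).
import numpy as np
from sklearn.metrics import mean_squared_error

def grouped_permutation_importance(model, X, y, group, seed=0):
    rng = np.random.default_rng(seed)
    X_perm = X.copy()
    perm = rng.permutation(X.shape[0])
    # One shared row permutation for every feature in the group preserves
    # the within-group correlation structure.
    X_perm[:, group] = X[perm][:, group]
    base_loss = mean_squared_error(y, model.predict(X))
    return float(mean_squared_error(y, model.predict(X_perm)) - base_loss)
```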
arXiv Detail & Related papers (2023-12-18T00:21:47Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- Commutative Lie Group VAE for Disentanglement Learning [96.32813624341833]
We view disentanglement learning as discovering an underlying structure that equivariantly reflects the factorized variations shown in data.
A simple model named Commutative Lie Group VAE is introduced to realize the group-based disentanglement learning.
Experiments show that our model can effectively learn disentangled representations without supervision, and can achieve state-of-the-art performance without extra constraints.
arXiv Detail & Related papers (2021-06-07T07:03:14Z)
- Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)
- Transforming Feature Space to Interpret Machine Learning Models [91.62936410696409]
This contribution proposes a novel approach that interprets machine-learning models through the lens of feature space transformations.
It can be used to enhance unconditional as well as conditional post-hoc diagnostic tools.
A case study on remote-sensing landcover classification with 46 features is used to demonstrate the potential of the proposed approach.
arXiv Detail & Related papers (2021-04-09T10:48:11Z)
- GroupifyVAE: from Group-based Definition to VAE-based Unsupervised Representation Disentanglement [91.9003001845855]
VAE-based unsupervised disentanglement cannot be achieved without introducing additional inductive biases.
We address VAE-based unsupervised disentanglement by leveraging the constraints derived from the Group Theory based definition as the non-probabilistic inductive bias.
We train 1800 models covering the most prominent VAE-based models on five datasets to verify the effectiveness of our method.
arXiv Detail & Related papers (2021-02-20T09:49:51Z)
- Relative Feature Importance [1.4474137122906163]
Relative Feature Importance (RFI) is a generalization of Permutation Feature Importance (PFI) and Conditional Feature Importance (CFI).
RFI allows for a more nuanced feature importance computation beyond the PFI versus CFI dichotomy.
We derive general interpretation rules for RFI based on a detailed theoretical analysis of the implications of relative feature relevance.
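A minimal sketch of the RFI idea, reusing the CART-based subgroup approximation from the first example: x_j is permuted conditional on an arbitrary list of feature indices G, where G = [] recovers marginal PFI and G = all other features approximates CFI. Helper names and defaults are illustrative, not the paper's implementation.

```python
# Hedged sketch of Relative Feature Importance via within-stratum permutation.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

def relative_feature_importance(model, X, y, j, G, max_leaf_nodes=8, seed=0):
    rng = np.random.default_rng(seed)
    X_perm = X.copy()
    if len(G) == 0:
        # Empty conditioning set: ordinary permutation feature importance.
        X_perm[:, j] = rng.permutation(X[:, j])
    else:
        # Permute x_j within strata that are homogeneous in the G-features,
        # approximated here by the leaves of a shallow CART.
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes).fit(X[:, G], X[:, j])
        leaves = tree.apply(X[:, G])
        for leaf in np.unique(leaves):
            idx = np.flatnonzero(leaves == leaf)
            X_perm[idx, j] = rng.permutation(X[idx, j])
    base_loss = mean_squared_error(y, model.predict(X))
    return float(mean_squared_error(y, model.predict(X_perm)) - base_loss)
```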
arXiv Detail & Related papers (2020-07-16T12:20:22Z)
- Controlling for sparsity in sparse factor analysis models: adaptive latent feature sharing for piecewise linear dimensionality reduction [2.896192909215469]
We propose a simple and tractable parametric feature allocation model which can address key limitations of current latent feature decomposition techniques.
We derive a novel adaptive factor analysis (aFA), as well as an adaptive probabilistic principal component analysis (aPPCA), capable of flexible structure discovery and dimensionality reduction.
We show that aPPCA and aFA can infer interpretable high-level features, both when applied to raw MNIST and when used to interpret autoencoder features.
arXiv Detail & Related papers (2020-06-22T16:09:11Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)