TRIP: A Nonparametric Test to Diagnose Biased Feature Importance Scores
- URL: http://arxiv.org/abs/2507.07276v1
- Date: Wed, 09 Jul 2025 20:49:10 GMT
- Title: TRIP: A Nonparametric Test to Diagnose Biased Feature Importance Scores
- Authors: Aaron Foote, Danny Krizanc
- Abstract summary: TRIP is a test requiring minimal assumptions that is able to detect unreliable permutation feature importance scores. Testing on simulated data and applications shows that the test reliably detects when such scores are unreliable.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Along with accurate prediction, understanding the contribution of each feature to the making of the prediction, i.e., the importance of the feature, is a desirable and arguably necessary component of a machine learning model. For a complex model such as a random forest, such importances are not innate -- as they are, e.g., with linear regression. Efficient methods have been created to provide such capabilities, with one of the most popular among them being permutation feature importance due to its efficiency, model-agnostic nature, and perceived intuitiveness. However, permutation feature importance has been shown to be misleading in the presence of dependent features as a result of the creation of unrealistic observations when permuting the dependent features. In this work, we develop TRIP (Test for Reliable Interpretation via Permutation), a test requiring minimal assumptions that is able to detect unreliable permutation feature importance scores that are the result of model extrapolation. To build on this, we demonstrate how the test can be complemented in order to allow its use in high dimensional settings. Through testing on simulated data and applications, our results show that the test can be used to reliably detect when permutation feature importance scores are unreliable.
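The extrapolation problem described above is easy to reproduce. Below is a minimal sketch of plain permutation feature importance with two strongly dependent features; it illustrates the failure mode TRIP is designed to detect, not the TRIP test itself, and the data-generating setup is an illustrative assumption.

```python
# Minimal sketch of permutation feature importance (PFI) under dependence.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)      # x2 is nearly a copy of x1
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=n)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
base = mean_squared_error(y, model.predict(X))

for j, name in enumerate(["x1", "x2"]):
    Xp = X.copy()
    # Permuting one feature breaks the x1-x2 dependence: rows like
    # (x1 = 2, x2 = -2) lie far outside the training support, so the
    # model extrapolates and the resulting score can be misleading.
    Xp[:, j] = rng.permutation(Xp[:, j])
    print(name, mean_squared_error(y, model.predict(Xp)) - base)
```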
Related papers
- AICO: Feature Significance Tests for Supervised Learning [0.5142666700569699]
This paper develops model- and distribution-agnostic significance tests to assess the influence of input features in any regression or classification algorithm. We construct a uniformly most powerful, randomized sign test for the median score difference, yielding exact p-values for assessing feature significance and confidence intervals. Experiments on synthetic tasks validate its statistical and computational advantages, and applications to real-world data illustrate its practical utility.
arXiv Detail & Related papers (2025-06-29T21:15:40Z)
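The kind of per-observation sign test AICO builds on can be sketched as follows; this is a plain binomial sign test on score differences, not the paper's uniformly most powerful randomized version, and masking a feature with its mean is an illustrative baseline assumption.

```python
# Hedged sketch: sign test on per-sample score differences for one feature.
import numpy as np
from scipy.stats import binomtest
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 2 * X[:, 0] + rng.normal(size=500)
model = RandomForestRegressor(random_state=0).fit(X, y)

j = 0                                   # feature under test
X_masked = X.copy()
X_masked[:, j] = X[:, j].mean()         # assumed baseline: mask with the mean

# Per-sample improvement in squared error from having the feature available.
delta = (y - model.predict(X_masked)) ** 2 - (y - model.predict(X)) ** 2
k = int(np.sum(delta > 0))              # samples where the feature helped
m = int(np.sum(delta != 0))             # drop exact ties
print(binomtest(k, m, p=0.5, alternative="greater").pvalue)
```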
- Conditional Feature Importance with Generative Modeling Using Adversarial Random Forests [1.0208529247755187]
In explainable artificial intelligence (XAI), conditional feature importance assesses the impact of a feature on a prediction model's performance. Recent advancements in generative modeling can facilitate measuring conditional feature importance. This paper proposes cARFi, a method for measuring conditional feature importance through feature values sampled from ARF-estimated conditional distributions.
arXiv Detail & Related papers (2025-01-19T21:34:54Z)
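The underlying idea can be sketched generically: resample the feature from an estimate of its conditional distribution given the remaining features, so the evaluation data stays on the data manifold. A k-nearest-neighbour sampler stands in below for the adversarial random forest that cARFi actually uses; everything here is an illustrative assumption, not the paper's code.

```python
# Sketch of conditional (rather than marginal) feature importance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=n)
model = RandomForestRegressor(random_state=0).fit(X, y)
base = mean_squared_error(y, model.predict(X))

j = 1
rest = np.delete(X, j, axis=1)          # conditioning variables X_{-j}
nn = NearestNeighbors(n_neighbors=10).fit(rest)
_, idx = nn.kneighbors(rest)            # neighbours in X_{-j} space
picks = idx[np.arange(n), rng.integers(0, 10, size=n)]

Xc = X.copy()
Xc[:, j] = X[picks, j]                  # conditional draw: stays on-manifold
print("conditional importance:", mean_squared_error(y, model.predict(Xc)) - base)
```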
- Statistical Test for Auto Feature Engineering by Selective Inference [12.703556860454565]
Auto Feature Engineering (AFE) plays a crucial role in developing practical machine learning pipelines.
We propose a new statistical test for features generated by AFE algorithms, based on a framework called selective inference.
The proposed test can quantify the statistical significance of the generated features in the form of $p$-values, enabling theoretically guaranteed control of the risk of false findings.
arXiv Detail & Related papers (2024-10-13T12:26:51Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
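Transfer entropy, the quantity this approach ranks candidate features by, admits a simple plug-in estimate for discretised series. The histogram estimator and binary toy series below are illustrative assumptions, not the paper's estimator.

```python
# Plug-in estimate of transfer entropy TE(X -> Y) for discrete series:
# TE(X->Y) = sum p(y1, y0, x0) * log[ p(y1 | y0, x0) / p(y1 | y0) ]
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))
    pairs_yx = Counter(zip(y[:-1], x[:-1]))
    pairs_yy = Counter(zip(y[1:], y[:-1]))
    singles = Counter(y[:-1])
    m = len(y) - 1
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_joint = c / m
        p_full = c / pairs_yx[(y0, x0)]            # p(y1 | y0, x0)
        p_marg = pairs_yy[(y1, y0)] / singles[y0]  # p(y1 | y0)
        te += p_joint * np.log(p_full / p_marg)
    return te

rng = np.random.default_rng(3)
x = rng.integers(0, 2, size=5000)
y = np.roll(x, 1)                       # y copies x with lag 1
print(transfer_entropy(x, y))           # large: x drives y (about log 2)
print(transfer_entropy(y, x))           # near zero: no feedback
```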
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Agree to Disagree: Diversity through Disagreement for Better Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data while encouraging disagreement on out-of-distribution data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
arXiv Detail & Related papers (2022-02-09T12:03:02Z)
- Conformal Prediction Under Feedback Covariate Shift for Biomolecular Design [56.86533144730384]
We introduce a method to quantify predictive uncertainty in settings where the training and test data are statistically dependent. As a motivating use case, we demonstrate with several real data sets how our method quantifies uncertainty for the predicted fitness of designed proteins.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Comparing interpretability and explainability for feature selection [0.6015898117103068]
We investigate the performance of variable importance as a feature selection method across various black-box and interpretable machine learning methods.
The results show that regardless of whether we use the native variable importance method or SHAP, XGBoost fails to clearly distinguish between relevant and irrelevant features.
arXiv Detail & Related papers (2021-05-11T20:01:23Z)
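The comparison behind this finding can be sketched directly: XGBoost's built-in importance next to mean absolute SHAP values on data where only one feature is relevant. The synthetic setup below is an illustrative assumption, not the paper's benchmark.

```python
# Sketch: native XGBoost importance vs. mean |SHAP| on a known ground truth.
import numpy as np
import shap
from xgboost import XGBRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 5))          # only feature 0 is relevant
y = 3 * X[:, 0] + rng.normal(size=1000)

model = XGBRegressor(n_estimators=200).fit(X, y)
print("native importance:", model.feature_importances_)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape (n_samples, n_features)
print("mean |SHAP|:", np.abs(shap_values).mean(axis=0))
```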
- Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately [83.68135652247496]
A natural remedy is to remove spurious features from the model.
We show that removal of spurious features can decrease accuracy due to inductive biases.
We also show that robust self-training can remove spurious features without affecting the overall accuracy.
arXiv Detail & Related papers (2020-12-07T23:08:59Z)
- Towards a More Reliable Interpretation of Machine Learning Outputs for Safety-Critical Systems using Feature Importance Fusion [0.0]
We introduce a novel fusion metric and compare it to the state-of-the-art.
Our approach is tested on synthetic data, where the ground truth is known.
Results show that our feature importance ensemble framework produces 15% less feature importance error overall than existing methods.
arXiv Detail & Related papers (2020-09-11T15:51:52Z)
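Importance fusion of this kind can be sketched by normalising per-model importances and averaging across heterogeneous learners; the equal-weight mean below is an illustrative stand-in, not necessarily the paper's fusion metric.

```python
# Sketch: fuse feature importances from several different model families.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 4))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(size=500)

models = [
    RandomForestRegressor(random_state=0).fit(X, y),
    GradientBoostingRegressor(random_state=0).fit(X, y),
    Lasso(alpha=0.01).fit(X, y),
]

def importance(m):
    raw = np.abs(m.coef_) if hasattr(m, "coef_") else m.feature_importances_
    return raw / raw.sum()              # normalise so models are comparable

fused = np.mean([importance(m) for m in models], axis=0)
print("fused importance:", fused)
```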
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)