Nonparametric Feature Impact and Importance
- URL: http://arxiv.org/abs/2006.04750v1
- Date: Mon, 8 Jun 2020 17:07:35 GMT
- Title: Nonparametric Feature Impact and Importance
- Authors: Terence Parr, James D. Wilson, Jeff Hamrick
- Abstract summary: We give mathematical definitions of feature impact and importance, derived from partial dependence curves, that operate directly on the data.
To assess quality, we show that features ranked by these definitions are competitive with existing feature selection techniques.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Practitioners use feature importance to rank and eliminate weak predictors
during model development in an effort to simplify models and improve
generality. Unfortunately, they also routinely conflate such feature importance
measures with feature impact, the isolated effect of an explanatory variable on
the response variable. This can lead to real-world consequences when importance
is inappropriately interpreted as impact for business or medical insight
purposes. The dominant approach for computing importances is through
interrogation of a fitted model, which works well for feature selection, but
gives distorted measures of feature impact. The same method applied to the same
data set can yield different feature importances, depending on the model,
leading us to conclude that impact should be computed directly from the data.
While there are nonparametric feature selection algorithms, they typically
provide feature rankings, rather than measures of impact or importance. They
also typically focus on single-variable associations with the response. In this
paper, we give mathematical definitions of feature impact and importance,
derived from partial dependence curves, that operate directly on the data. To
assess quality, we show that features ranked by these definitions are
competitive with existing feature selection techniques using three real data
sets for predictive tasks.
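Since the paper's definitions of impact and importance are derived from partial dependence curves computed directly on the data, a minimal sketch may help fix ideas. The snippet below is an illustration, not the authors' algorithm: it estimates a partial-dependence-style curve for a single feature by quantile binning and scores impact as the curve's mean absolute deviation from its own mean. The names `empirical_pd_curve` and `feature_impact`, the binning scheme, and the deviation score are assumptions for this sketch; in particular, the marginal average used here does not separate a feature's effect from correlated predictors, which the paper's definitions address.

```python
import numpy as np

def empirical_pd_curve(x, y, nbins=10):
    """Average the response within quantile bins of one feature.
    Illustrative only: this marginal average does not isolate the
    feature's effect from correlated predictors, which the paper's
    partial-dependence-based definitions are designed to do."""
    edges = np.quantile(x, np.linspace(0, 1, nbins + 1))
    centers, means = [], []
    for i in range(nbins):
        lo, hi = edges[i], edges[i + 1]
        # Include the right edge in the last bin so no point is dropped.
        mask = (x >= lo) & ((x < hi) if i < nbins - 1 else (x <= hi))
        if mask.any():
            centers.append(x[mask].mean())
            means.append(y[mask].mean())
    return np.array(centers), np.array(means)

def feature_impact(x, y, nbins=10):
    """Score impact as the curve's mean absolute deviation from its
    own mean: a flat curve (no effect on y) scores near zero."""
    _, curve = empirical_pd_curve(x, y, nbins)
    return np.mean(np.abs(curve - curve.mean()))

# Toy demo: y depends strongly on x0, moderately on x1, not at all on x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=2000)
for j in range(3):
    print(f"x{j} impact: {feature_impact(X[:, j], y):.3f}")
```

On this toy data the scores recover the generating order (x0 > x1 > x2); x2's score is small but not exactly zero, because the bin means of an irrelevant feature still fluctuate with sampling noise.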
Related papers
- Word Matters: What Influences Domain Adaptation in Summarization? (arXiv, 2024-06-21)
This paper investigates the fine-grained factors affecting domain adaptation performance.
We propose quantifying dataset learning difficulty as the learning difficulty of generative summarization.
Our experiments conclude that, when considering dataset learning difficulty, the cross-domain overlap and the performance gain in summarization tasks exhibit an approximate linear relationship.
- Causal Feature Selection via Transfer Entropy (arXiv, 2023-10-17)
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
- Stubborn Lexical Bias in Data and Models (arXiv, 2023-06-03)
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
- A Notion of Feature Importance by Decorrelation and Detection of Trends by Random Forest Regression (arXiv, 2023-03-02)
We introduce a novel notion of feature importance based on the well-studied Gram-Schmidt decorrelation method (a minimal sketch of the decorrelation idea appears after this list).
We propose two estimators for identifying trends in the data using random forest regression.
- Striving for data-model efficiency: Identifying data externalities on group performance (arXiv, 2022-11-11)
Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance.
We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population.
Our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning.
- The Invariant Ground Truth of Affect (arXiv, 2022-10-14)
The ground truth of affect is conventionally attributed to affect labels, which inadvertently include biases inherent to the subjective nature of emotion and its labeling.
This paper reframes the ways one may obtain a reliable ground truth of affect by transferring aspects of causation theory to affective computing.
We employ causation inspired methods for detecting outliers in affective corpora and building affect models that are robust across participants and tasks.
- Feature Selection for Discovering Distributional Treatment Effect Modifiers (arXiv, 2022-06-01)
We propose a framework for finding features relevant to the difference in treatment effects.
We derive a feature importance measure that quantifies how strongly the feature attributes influence the discrepancy between potential outcome distributions.
We then develop a feature selection algorithm that can control the type I error rate to the desired level.
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties (arXiv, 2021-12-15)
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
- Information Theoretic Measures for Fairness-aware Feature Selection (arXiv, 2021-06-01)
We develop a framework for fairness-aware feature selection, based on information theoretic measures for the accuracy and discriminatory impacts of features.
Specifically, our goal is to design a fairness utility score for each feature which quantifies how this feature influences accurate as well as nondiscriminatory decisions.
- Influence Functions in Deep Learning Are Fragile (arXiv, 2020-06-25)
Influence functions approximate the effect of individual training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important for obtaining high-quality influence estimates.
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions (arXiv, 2020-05-14)
Influence functions explain the decisions of a model by identifying influential training examples.
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
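For intuition on the Gram-Schmidt decorrelation idea referenced in the list above, here is a minimal sketch, assuming a simple variance-explained reading of importance. It illustrates the general decorrelation idea, not that paper's estimator; the function name `gram_schmidt_importance` and the variance-share scoring are my own assumptions.

```python
import numpy as np

def gram_schmidt_importance(X, y):
    """Illustrative Gram-Schmidt-style importance (not the cited paper's
    estimator): orthogonalize each feature against the ones before it,
    then credit each feature only with the share of response variance
    its decorrelated component explains."""
    Xc = X - X.mean(axis=0)          # center features
    yc = y - y.mean()                # center response
    total_ss = yc @ yc               # total sum of squares of the response
    importances = []
    basis = []                       # orthonormal directions built so far
    for j in range(Xc.shape[1]):
        v = Xc[:, j].copy()
        for q in basis:              # remove components along earlier features
            v -= (v @ q) * q
        norm = np.linalg.norm(v)
        if norm < 1e-12:             # feature fully explained by earlier ones
            importances.append(0.0)
            continue
        q = v / norm
        basis.append(q)
        # Squared projection of y onto this unit direction, as a variance share.
        importances.append((yc @ q) ** 2 / total_ss)
    return np.array(importances)

# Toy demo: x2 nearly duplicates x0, so its decorrelated component is tiny.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=500)
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=500)
print(gram_schmidt_importance(X, y).round(3))
```

On this toy data, x0 and x1 split the explained variance roughly 4:1 and the near-duplicate x2 is credited with almost nothing. One caveat the sketch makes visible: sequential orthogonalization is order-dependent, so features listed earlier can absorb variance shared with later ones.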
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.