Factor Importance Ranking and Selection using Total Indices
- URL: http://arxiv.org/abs/2401.00800v2
- Date: Fri, 12 Jan 2024 02:38:40 GMT
- Title: Factor Importance Ranking and Selection using Total Indices
- Authors: Chaofan Huang, V. Roshan Joseph
- Abstract summary: A factor importance measure ought to characterize the feature's predictive potential without relying on a specific prediction algorithm.
We present the equivalence between predictiveness potential and total Sobol' indices from global sensitivity analysis.
We introduce a novel consistent estimator that can be computed directly from noisy data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Factor importance measures the impact of each feature on output prediction accuracy. Many existing works focus on model-based importance, but a feature that is important for one learning algorithm may hold little significance in another model. Hence, a factor importance measure ought to characterize the feature's predictive potential without relying on a specific prediction algorithm. Such algorithm-agnostic importance is termed intrinsic importance in Williamson et al. (2023), but their estimator again requires model fitting. To bypass the modeling step, we present the equivalence between predictiveness potential and total Sobol' indices from global sensitivity analysis, and introduce a novel consistent estimator that can be computed directly from noisy data. Integrating it with forward selection and backward elimination gives rise to FIRST: Factor Importance Ranking and Selection using Total (Sobol') indices. Extensive simulations demonstrate the effectiveness of FIRST on regression and binary classification problems and its clear advantage over state-of-the-art methods.
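To make the abstract's central claim concrete: for features X and response Y, the total Sobol' index of feature j is T_j = E[Var(Y | X_{-j})] / Var(Y), the share of variability that can no longer be explained once X_j is dropped, which is what the paper equates with predictiveness potential. Because the conditional variance can be approximated by the local variance of Y among nearest neighbors in the remaining coordinates, T_j can be estimated from data with no model fitting. The sketch below is a minimal illustration of that idea, not the authors' exact estimator; the helper names, the neighbor count k=10, the greedy stopping rule, and the use of scikit-learn's NearestNeighbors are all assumptions for illustration.

```python
# Minimal sketch of the FIRST idea: estimate total Sobol' indices
# directly from data via nearest neighbors, then select features greedily.
# Illustrative only -- not the paper's exact (noise-adjusted) estimator.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_conditional_variance(Z, y, k=10):
    """Approximate E[Var(Y | Z)]: average the sample variance of y over
    each point's k nearest neighbors in Z-space."""
    _, idx = NearestNeighbors(n_neighbors=k).fit(Z).kneighbors(Z)
    return y[idx].var(axis=1, ddof=1).mean()

def total_sobol(X, y, j, k=10):
    """T_j = E[Var(Y | X_{-j})] / Var(Y): the share of variability that
    becomes unexplainable once feature j is dropped."""
    return nn_conditional_variance(np.delete(X, j, axis=1), y, k) / y.var(ddof=1)

def forward_select(X, y, k=10):
    """FIRST-style greedy forward selection (sketch): repeatedly add the
    feature that most increases the variance explained by the selected
    set, stopping when no candidate improves the score."""
    p = X.shape[1]
    selected, best = [], 0.0
    while len(selected) < p:
        gains = {
            j: 1.0 - nn_conditional_variance(X[:, selected + [j]], y, k) / y.var(ddof=1) - best
            for j in range(p) if j not in selected
        }
        j_star = max(gains, key=gains.get)
        if gains[j_star] <= 0:  # no remaining feature adds predictive value
            break
        selected.append(j_star)
        best += gains[j_star]
    return selected

# Toy check: only the first two features drive y.
rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 5))
y = X[:, 0] + np.sin(2 * np.pi * X[:, 1]) + 0.1 * rng.normal(size=2000)
print([round(total_sobol(X, y, j), 3) for j in range(5)])  # indices 0, 1 dominate
print(forward_select(X, y))                                # e.g. [1, 0]
```

Note that on noisy data the naive ratio above is contaminated by the noise variance in both numerator and denominator; this is the kind of issue the paper's consistent estimator for noisy data is designed to handle.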
Related papers
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs).
We derive novel metrics with high-probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z)
- Deep Probability Segmentation: Are segmentation models probability estimators? [0.7646713951724011]
We apply Calibrated Probability Estimation to segmentation tasks to evaluate its impact on model calibration.
Results indicate that while CaPE improves calibration, its effect is less pronounced compared to classification tasks.
We also investigate the influence of dataset size and bin optimization on the effectiveness of calibration.
arXiv Detail & Related papers (2024-09-19T07:52:19Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
arXiv Detail & Related papers (2023-07-06T15:19:53Z)
- Inferring feature importance with uncertainties in high-dimensional data [0.0]
We present a Shapley value based framework for inferring the importance of individual features, including uncertainty in the estimator.
We build upon the recently published feature importance measure of SAGE and introduce sub-SAGE which can be estimated without resampling for tree-based models.
arXiv Detail & Related papers (2021-09-02T11:57:34Z)
- Double Robust Representation Learning for Counterfactual Prediction [68.78210173955001]
We propose a novel scalable method to learn double-robust representations for counterfactual predictions.
We make robust and efficient counterfactual predictions for both individual and average treatment effects.
The algorithm shows competitive performance with the state-of-the-art on real-world and synthetic data.
arXiv Detail & Related papers (2020-10-15T16:39:26Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Reachable Sets of Classifiers and Regression Models: (Non-)Robustness Analysis and Robust Training [1.0878040851638]
We analyze and enhance robustness properties of both classifiers and regression models.
Specifically, we verify (non-)robustness, propose a robust training procedure, and show that our approach outperforms adversarial attacks.
We also provide techniques to distinguish between reliable and non-reliable predictions for unlabeled inputs, to quantify the influence of each feature on a prediction, and to compute a feature ranking.
arXiv Detail & Related papers (2020-07-28T10:58:06Z)
- Nonparametric Feature Impact and Importance [0.6123324869194193]
We give mathematical definitions of feature impact and importance, derived from partial dependence curves, that operate directly on the data (a minimal partial-dependence sketch follows this list).
To assess quality, we show that features ranked by these definitions are competitive with existing feature selection techniques.
arXiv Detail & Related papers (2020-06-08T17:07:35Z)
- A general framework for inference on algorithm-agnostic variable importance [3.441021278275805]
We propose a framework for nonparametric inference on interpretable, algorithm-agnostic variable importance.
We show that our proposal has good operating characteristics, and we illustrate it with data from a study of an antibody against HIV-1 infection.
arXiv Detail & Related papers (2020-04-07T20:09:21Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
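As noted in the "Nonparametric Feature Impact and Importance" entry above, feature impact can be read off partial dependence curves: PD_j(v) is the average prediction with feature j clamped at v, and a flat curve means the feature has no marginal effect. The sketch below is a minimal illustration of that idea; for brevity it computes the curve through a fitted model, whereas the paper's definitions operate directly on the data, and the function names, the impact score (mean absolute deviation of the curve), and the RandomForestRegressor stand-in are assumptions for illustration.

```python
# Minimal sketch: feature impact from partial dependence curves.
# Model-based for brevity; the paper's definitions are computed
# directly from the data without a fitted model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def partial_dependence_curve(model, X, j, grid):
    """PD_j(v): average model prediction with feature j clamped at v."""
    curve = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v  # clamp feature j at the grid value v
        curve.append(model.predict(Xv).mean())
    return np.array(curve)

def feature_impact(model, X, j, n_grid=20):
    """Impact score: mean absolute deviation of the PD curve from its
    own average -- zero for a feature with no marginal effect."""
    grid = np.quantile(X[:, j], np.linspace(0.05, 0.95, n_grid))
    curve = partial_dependence_curve(model, X, j, grid)
    return np.abs(curve - curve.mean()).mean()

# Toy check: only features 0 and 2 drive y, so their impact dominates.
rng = np.random.default_rng(1)
X = rng.uniform(size=(1000, 4))
y = 3 * X[:, 0] + np.sin(2 * np.pi * X[:, 2]) + 0.1 * rng.normal(size=1000)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print([round(feature_impact(model, X, j), 3) for j in range(4)])
```

The quantile grid (rather than a uniform grid) keeps the clamped values inside the data's support, which avoids asking the model to extrapolate.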