FACT: High-Dimensional Random Forests Inference
- URL: http://arxiv.org/abs/2207.01678v2
- Date: Mon, 13 Nov 2023 04:08:57 GMT
- Title: FACT: High-Dimensional Random Forests Inference
- Authors: Chien-Ming Chi, Yingying Fan, Jinchi Lv
- Abstract summary: Quantifying the usefulness of individual features in random forests can greatly enhance their interpretability.
Existing studies have shown that some popular feature importance measures for random forests suffer from bias.
We propose the self-normalized feature-residual correlation test (FACT), a framework for evaluating the significance of a given feature.
- Score: 4.941630596191806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantifying the usefulness of individual features in random forests
can greatly enhance their interpretability. Existing studies have shown that
some popular feature importance measures for random forests suffer from bias,
and comprehensive size and power analyses are lacking for most of these
methods. In this paper, we approach the problem via hypothesis testing and
suggest the self-normalized feature-residual correlation test (FACT), a
framework with a bias-resistance property for evaluating the significance of a
given feature in the random forests model, where the null hypothesis is that
the feature is conditionally independent of the response given all other
features. This endeavor in random forests inference is empowered by recent
developments on high-dimensional random forests consistency. Under a fairly
general high-dimensional nonparametric model setting with dependent features,
we formally establish that FACT provides a theoretically justified feature
importance test with controlled type I error and an appealing power property.
The theoretical results and finite-sample advantages of the newly suggested
method are illustrated with several simulation examples and an economic
forecasting application.
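The null hypothesis above suggests a residual-on-residual construction. Below is a minimal sketch of a feature-residual correlation test in this spirit; the function name, estimator settings, and the normal-approximation calibration are illustrative assumptions, not the authors' exact FACT procedure.

```python
# Sketch of a feature-residual correlation test for H0: feature j is
# conditionally independent of the response given the remaining features.
# This is an illustrative simplification, not the paper's exact procedure.
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor

def feature_residual_correlation_test(X, y, j, seed=0):
    n = X.shape[0]
    X_rest = np.delete(X, j, axis=1)

    # Regress the response on all features except feature j.
    rf_y = RandomForestRegressor(n_estimators=300, random_state=seed)
    resid_y = y - rf_y.fit(X_rest, y).predict(X_rest)

    # Regress feature j on the remaining features, so that dependence
    # among the features is (approximately) projected out as well.
    rf_x = RandomForestRegressor(n_estimators=300, random_state=seed + 1)
    resid_x = X[:, j] - rf_x.fit(X_rest, X[:, j]).predict(X_rest)

    # Self-normalized correlation of the two residual series; under H0
    # the statistic is approximately standard normal.
    prod = resid_x * resid_y
    stat = np.sqrt(n) * prod.mean() / prod.std(ddof=1)
    return stat, 2 * norm.sf(abs(stat))

# Toy check: feature 0 drives the response, feature 5 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 2 * X[:, 0] + rng.normal(size=500)
print(feature_residual_correlation_test(X, y, j=0))  # tiny p-value expected
print(feature_residual_correlation_test(X, y, j=5))  # non-significant
```

One caveat: the in-sample residuals used here can bias the test through overfitting; a faithful implementation would rely on sample splitting or out-of-bag residuals, which is part of what the paper's theory is designed to control.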
Related papers
- Identifiable Latent Neural Causal Models [82.14087963690561]
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that do contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the high-level dichotomy between bias reduction and variance reduction prevalent in statistics is insufficient for understanding tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
- Simultaneous inference for generalized linear models with unmeasured confounders [0.0]
We propose a unified statistical estimation and inference framework that harnesses structures and integrates linear projections into three key stages.
We show effective Type-I error control of $z$-tests as sample and response sizes approach infinity.
arXiv Detail & Related papers (2023-09-13T18:53:11Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Measuring Implicit Bias Using SHAP Feature Importance and Fuzzy Cognitive Maps [1.9739269019020032]
In this paper, we integrate the concepts of feature importance with implicit bias in the context of pattern classification.
The amount of bias towards protected features might differ depending on whether the features are numerically or categorically encoded.
arXiv Detail & Related papers (2023-05-16T12:31:36Z)
- Bayesian Hierarchical Models for Counterfactual Estimation [12.159830463756341]
We propose a probabilistic paradigm to estimate a diverse set of counterfactuals.
We treat the perturbations as random variables endowed with prior distribution functions.
A gradient-based sampler with superior convergence characteristics efficiently computes the posterior samples.
arXiv Detail & Related papers (2023-01-21T00:21:11Z)
- Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our method is simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z)
- What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work? [1.1050303097572156]
We show that both methods can be understood in terms of the same parameters and confounding assumptions under L2 loss.
In the randomized setting, both approaches performed comparably to the new blended versions in a benchmark study.
arXiv Detail & Related papers (2022-06-21T12:45:07Z)
- Uncertainty Modeling for Out-of-Distribution Generalization [56.957731893992495]
Common methods often treat the feature statistics as deterministic values measured from the learned features.
We argue that these statistics can instead be properly manipulated to improve the generalization ability of deep learning models.
We improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training; a minimal sketch of this idea appears after this list.
arXiv Detail & Related papers (2022-02-08T16:09:12Z)
- Towards Robust Classification with Deep Generative Forests [13.096855747795303]
Decision Trees and Random Forests are among the most widely used machine learning models.
Being primarily discriminative models, they lack principled methods to manipulate the uncertainty of predictions.
We exploit Generative Forests (GeFs) to extend Random Forests to generative models representing the full joint distribution over the feature space.
arXiv Detail & Related papers (2020-07-11T08:57:52Z)
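For the "Uncertainty Modeling for Out-of-Distribution Generalization" entry above, here is a minimal NumPy sketch of treating feature statistics as random variables rather than deterministic values; the perturbation scheme, function name, and tensor shapes are illustrative assumptions rather than the authors' implementation.

```python
# Sketch: perturb per-instance feature statistics (mean/std) with noise
# whose scale is estimated from their variation across the mini-batch,
# then re-normalize the features with the synthesized statistics.
import numpy as np

def perturb_feature_statistics(feats, rng, eps=1e-6):
    # feats: (batch, channels, height, width) feature maps.
    mu = feats.mean(axis=(2, 3), keepdims=True)        # per-instance means
    sig = feats.std(axis=(2, 3), keepdims=True) + eps  # per-instance stds

    # Uncertainty of the statistics, estimated across the batch dimension.
    sig_mu = mu.std(axis=0, keepdims=True)
    sig_sig = sig.std(axis=0, keepdims=True)

    # Synthesize new statistics with Gaussian perturbations.
    beta = mu + rng.standard_normal(mu.shape) * sig_mu
    gamma = sig + rng.standard_normal(sig.shape) * sig_sig

    # Normalize with the original statistics, re-scale with the new ones.
    return gamma * (feats - mu) / sig + beta

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16, 4, 4))
augmented = perturb_feature_statistics(feats, rng)
```

At training time such a transform would act as feature-level augmentation that simulates domain shift; at test time it would be disabled.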