Exogenous Randomness Empowering Random Forests
- URL: http://arxiv.org/abs/2411.07554v1
- Date: Tue, 12 Nov 2024 05:06:10 GMT
- Title: Exogenous Randomness Empowering Random Forests
- Authors: Tianxing Mei, Yingying Fan, Jinchi Lv
- Abstract summary: We develop non-asymptotic expansions for the mean squared error (MSE) for both individual trees and forests.
Our findings unveil that feature subsampling reduces both the bias and variance of random forests compared to individual trees.
Our results reveal an intriguing phenomenon: the presence of noise features can act as a "blessing" in enhancing the performance of random forests.
- Score: 4.396860522241306
- Abstract: We offer theoretical and empirical insights into the impact of exogenous randomness on the effectiveness of random forests with tree-building rules independent of training data. We formally introduce the concept of exogenous randomness and identify two types of commonly existing randomness: Type I from feature subsampling, and Type II from tie-breaking in tree-building processes. We develop non-asymptotic expansions for the mean squared error (MSE) for both individual trees and forests and establish sufficient and necessary conditions for their consistency. In the special example of the linear regression model with independent features, our MSE expansions are more explicit, providing deeper insight into the mechanisms of random forests. This also allows us to derive an upper bound on the MSE with explicit consistency rates for trees and forests. Guided by our theoretical findings, we conduct simulations to further explore how exogenous randomness enhances random forest performance. Our findings unveil that feature subsampling reduces both the bias and variance of random forests compared to individual trees, serving as an adaptive mechanism to balance bias and variance. Furthermore, our results reveal an intriguing phenomenon: the presence of noise features can act as a "blessing" in enhancing the performance of random forests thanks to feature subsampling.
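As a rough empirical companion to the abstract, the sketch below contrasts a single regression tree with a feature-subsampling random forest on a sparse linear model padded with noise features. It uses scikit-learn, whose trees are data-adaptive rather than built independently of the training data as the paper's theory assumes, and all data-generating parameters are illustrative assumptions, not the authors' settings, so this is only a qualitative illustration of Type I randomness.

```python
# Hypothetical simulation sketch: single tree vs. feature-subsampling forest
# on a sparse linear model with many pure-noise features. All parameters
# below are illustrative assumptions, not the paper's settings.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, s, p_noise = 2000, 5, 45                 # 5 signal + 45 noise features
X = rng.uniform(-1, 1, size=(n, s + p_noise))
beta = rng.normal(size=s)
y = X[:, :s] @ beta + rng.normal(scale=0.5, size=n)
X_tr, X_te, y_tr, y_te = X[:1000], X[1000:], y[:1000], y[1000:]

tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
forest = RandomForestRegressor(
    n_estimators=500,
    max_features="sqrt",                    # Type I randomness: feature subsampling
    random_state=0,
).fit(X_tr, y_tr)

print("single tree MSE:", mean_squared_error(y_te, tree.predict(X_te)))
print("forest MSE:     ", mean_squared_error(y_te, forest.predict(X_te)))
```

With 45 of the 50 features being pure noise, the forest should typically post a much lower test MSE than the single tree, qualitatively matching the feature-subsampling and "blessing of noise features" findings above.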
Related papers
- Ensembles of Probabilistic Regression Trees [46.53457774230618]
Tree-based ensemble methods have been successfully used for regression problems in many applications and research studies.
We study ensemble versions of probabilistic regression trees that provide smooth approximations of the objective function by assigning each observation to each region with respect to a probability distribution.
arXiv Detail & Related papers (2024-06-20T06:51:51Z)
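To make the probabilistic-regression-trees idea above concrete, here is a minimal sketch of a single soft split: rather than assigning an observation to exactly one region, it is weighted into both child regions via a sigmoid around the threshold. The sigmoid form and bandwidth are my assumptions for illustration, not the paper's exact probability model.

```python
# Minimal soft-split sketch (illustrative assumption, not the paper's model):
# each observation belongs to both child regions with sigmoid weights,
# giving a smooth approximation to the usual hard-threshold prediction.
import numpy as np

def soft_split_predict(x, threshold, left_value, right_value, bandwidth=0.1):
    """Weighted prediction from one probabilistic split on a scalar feature."""
    p_right = 1.0 / (1.0 + np.exp(-(x - threshold) / bandwidth))
    return (1.0 - p_right) * left_value + p_right * right_value

x = np.linspace(-1.0, 1.0, 5)
print(soft_split_predict(x, threshold=0.0, left_value=-1.0, right_value=1.0))
```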
- Identifiable Latent Neural Causal Models [82.14087963690561]
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that do contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z)
- Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests [16.55139316146852]
We study the often overlooked phenomenon, first noted in Breiman (2001), that random forests appear to reduce bias compared to bagging.
arXiv Detail & Related papers (2024-02-20T02:36:26Z)
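A quick way to probe the bias-reduction claim above is to compare bagging (every feature considered at every split) with a random forest that subsamples features, holding everything else fixed. The sketch below does this with scikit-learn on assumed toy data; it is a rough illustration, not the paper's case study, and the size of the effect depends on the design.

```python
# Bagging vs. random forest under identical settings (assumed toy data):
# in scikit-learn, bagging is a forest with max_features=1.0 (all features),
# so the only difference below is the feature randomization at each split.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
n, p = 2000, 20
X = rng.uniform(-1, 1, size=(n, p))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.3, size=n)
X_tr, X_te, y_tr, y_te = X[:1000], X[1000:], y[:1000], y[1000:]

for max_features, name in [(1.0, "bagging:       "), (0.3, "random forest: ")]:
    model = RandomForestRegressor(
        n_estimators=300, max_features=max_features, random_state=0
    ).fit(X_tr, y_tr)
    print(name, mean_squared_error(y_te, model.predict(X_te)))
```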
- Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
- On the Pointwise Behavior of Recursive Partitioning and Its Implications for Heterogeneous Causal Effect Estimation [8.394633341978007]
Decision tree learning is increasingly being used for pointwise inference.
We show that adaptive decision trees can fail to achieve polynomial rates of convergence in the uniform norm with non-vanishing probability.
We show that random forests can remedy the situation, turning poorly performing trees into nearly optimal procedures.
arXiv Detail & Related papers (2022-11-19T21:28:30Z)
- Contextual Decision Trees [62.997667081978825]
We propose a multi-armed contextual bandit recommendation framework for feature-based selection of a single shallow tree of the learned ensemble.
The trained system, which works on top of the Random Forest, dynamically identifies a base predictor that is responsible for providing the final output.
arXiv Detail & Related papers (2022-07-13T17:05:08Z)
- FACT: High-Dimensional Random Forests Inference [4.941630596191806]
Quantifying the usefulness of individual features in random forests learning can greatly enhance its interpretability.
Existing studies have shown that some popularly used feature importance measures for random forests suffer from the bias issue.
We propose a framework of the self-normalized feature-residual correlation test (FACT) for evaluating the significance of a given feature.
arXiv Detail & Related papers (2022-07-04T19:05:08Z)
- What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work? [1.1050303097572156]
We show that both methods can be understood in terms of the same parameters and confounding assumptions under L2 loss.
In the randomized setting, both approaches perform similarly to the new blended versions in a benchmark study.
arXiv Detail & Related papers (2022-06-21T12:45:07Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1], [3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
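For the soft-routing idea in "Growing Deep Forests Efficiently" above, the sketch below shows a depth-2 probabilistic tree: each internal node routes an input to both children with sigmoid probabilities, and the output is the probability-weighted average over the leaves. The node parameterization and depth are arbitrary assumptions, and the paper's learned cross-layer connectivity is not modeled here.

```python
# Soft-routing sketch for a depth-2 probabilistic tree (illustrative only):
# each internal node emits P(go right) via a sigmoid, and the prediction is
# the path-probability-weighted average of the four leaf values.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_predict(x, w, b, leaf_values):
    """x: (d,); w: (3, d) node weights; b: (3,) biases; leaf_values: (4,)."""
    p_root = sigmoid(w[0] @ x + b[0])       # P(right) at the root
    p_left = sigmoid(w[1] @ x + b[1])       # P(right) at the left child
    p_right = sigmoid(w[2] @ x + b[2])      # P(right) at the right child
    path_probs = np.array([
        (1 - p_root) * (1 - p_left),        # leaf: left, left
        (1 - p_root) * p_left,              # leaf: left, right
        p_root * (1 - p_right),             # leaf: right, left
        p_root * p_right,                   # leaf: right, right
    ])
    return path_probs @ leaf_values         # path probabilities sum to 1

rng = np.random.default_rng(0)
x = rng.normal(size=4)
w, b = rng.normal(size=(3, 4)), rng.normal(size=3)
print(soft_tree_predict(x, w, b, np.array([0.0, 1.0, 2.0, 3.0])))
```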
- The Hidden Uncertainty in a Neural Network's Activations [105.4223982696279]
The distribution of a neural network's latent representations has been successfully used to detect out-of-distribution (OOD) data.
This work investigates whether this distribution correlates with a model's epistemic uncertainty, thus indicating its ability to generalise to novel inputs.
arXiv Detail & Related papers (2020-12-05T17:30:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.