Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests
- URL: http://arxiv.org/abs/2402.12668v2
- Date: Sun, 4 Aug 2024 18:07:51 GMT
- Title: Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests
- Authors: Brian Liu, Rahul Mazumder
- Abstract summary: We study the often overlooked phenomenon, first noted in Breiman (2001), that random forests appear to reduce bias compared to bagging.
- Score: 16.55139316146852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the often overlooked phenomenon, first noted in Breiman (2001), that random forests appear to reduce bias compared to bagging. Motivated by an interesting paper by Mentch and Zhou (2020), in which the authors argue that random forests reduce effective degrees of freedom and only outperform bagging ensembles in low signal-to-noise ratio (SNR) settings, we explore how random forests can uncover patterns in the data missed by bagging. We empirically demonstrate that in the presence of such patterns, random forests reduce bias along with variance and increasingly outperform bagging ensembles when SNR is high. Our observations offer insights into the real-world success of random forests across a range of SNRs and enhance our understanding of the difference between random forests and bagging ensembles with respect to the randomization injected into each split. Our investigations also yield practical insights into the importance of tuning mtry in random forests.
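The mtry parameter referenced in the abstract controls how many candidate features each split considers. A minimal scikit-learn sketch of the bagging-versus-forest comparison (ours, not the authors' code): in scikit-learn, mtry corresponds to max_features, and setting it to the full feature count recovers bagging.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
y = X[:, 0] + 2.0 * (X[:, 1] > 0) + 0.5 * rng.normal(size=n)  # fairly high SNR

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for mtry in (p, p // 3, int(np.sqrt(p))):  # mtry = p recovers bagging
    forest = RandomForestRegressor(n_estimators=300, max_features=mtry,
                                   random_state=0).fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, forest.predict(X_te))
    print(f"mtry={mtry:2d}  test MSE={mse:.3f}")
```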
Related papers
- Ensembles of Probabilistic Regression Trees [46.53457774230618]
Tree-based ensemble methods have been successfully used for regression problems in many applications and research studies.
We study ensemble versions of probabilistic regression trees that provide smooth approximations of the objective function by assigning each observation to each region with respect to a probability distribution.
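A minimal sketch of the soft-assignment idea, under the assumption (ours) that a sigmoid gate governs the region probabilities; the paper's exact probability model may differ.

```python
import numpy as np

def soft_stump_predict(x, threshold, scale, left_mean, right_mean):
    # Probability of the "right" region rises smoothly with x instead of
    # jumping at the threshold, giving a smooth prediction surface.
    p_right = 1.0 / (1.0 + np.exp(-(x - threshold) / scale))
    return (1.0 - p_right) * left_mean + p_right * right_mean

xs = np.linspace(-3, 3, 7)
print(soft_stump_predict(xs, threshold=0.0, scale=0.5,
                         left_mean=-1.0, right_mean=1.0))
```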
arXiv Detail & Related papers (2024-06-20T06:51:51Z)
- Grafting: Making Random Forests Consistent [0.0]
Little is known about the theory of Random Forests.
A major unanswered question is whether, or when, the Random Forest algorithm is consistent.
arXiv Detail & Related papers (2024-03-09T21:29:25Z)
- Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
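A minimal sketch of the smoother view of forests (our illustration, not the paper's code): with bootstrapping disabled, a scikit-learn forest's prediction can be rewritten exactly as a weighted average of training labels, with weights derived from leaf co-membership.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

# bootstrap=False so each tree sees the full sample and the weight
# reconstruction below matches forest.predict exactly.
forest = RandomForestRegressor(n_estimators=100, max_features=2,
                               bootstrap=False, random_state=0).fit(X, y)
x0 = rng.normal(size=(1, 5))

train_leaves = forest.apply(X)   # (n_train, n_trees) leaf ids
query_leaves = forest.apply(x0)  # (1, n_trees)
weights = np.zeros(len(X))
for t in range(train_leaves.shape[1]):
    in_leaf = train_leaves[:, t] == query_leaves[0, t]
    weights[in_leaf] += 1.0 / in_leaf.sum()  # equal weight within each leaf
weights /= train_leaves.shape[1]             # average over trees

print("smoother view:  ", weights @ y)
print("forest.predict: ", forest.predict(x0)[0])
```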
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
- Neuroevolution-based Classifiers for Deforestation Detection in Tropical Forests [62.997667081978825]
Millions of hectares of tropical forests are lost every year due to deforestation or degradation.
Monitoring and deforestation-detection programs are in use, alongside public policies aimed at preventing and punishing such crimes.
This paper proposes pattern classifiers based on the NeuroEvolution of Augmenting Topologies (NEAT) technique for tropical forest deforestation detection tasks.
arXiv Detail & Related papers (2022-08-23T16:04:12Z)
- Contextual Decision Trees [62.997667081978825]
We propose a multi-armed contextual bandit recommendation framework for feature-based selection of a single shallow tree of the learned ensemble.
The trained system, which works on top of the Random Forest, dynamically identifies a base predictor that is responsible for providing the final output.
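A rough stand-in for this idea (ours; it uses a supervised selector rather than the paper's contextual bandit): learn from held-out data which single shallow tree to route each input to.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = np.where(X[:, 0] > 0, X[:, 1], -X[:, 1]) + rng.normal(scale=0.1, size=600)
X_tr, X_val, X_te = X[:300], X[300:450], X[450:]
y_tr, y_val, y_te = y[:300], y[300:450], y[450:]

forest = RandomForestRegressor(n_estimators=50, max_depth=3,
                               random_state=0).fit(X_tr, y_tr)

# For each validation point, record which shallow tree was most accurate...
per_tree = np.stack([t.predict(X_val) for t in forest.estimators_])
best_tree = np.abs(per_tree - y_val).argmin(axis=0)

# ...then learn a feature-based selector that routes new inputs to one tree.
selector = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_val, best_tree)
chosen = selector.predict(X_te)
y_hat = np.array([forest.estimators_[k].predict(x[None, :])[0]
                  for k, x in zip(chosen, X_te)])
print("per-input single-tree MSE:", float(np.mean((y_hat - y_te) ** 2)))
```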
arXiv Detail & Related papers (2022-07-13T17:05:08Z)
- Geometry- and Accuracy-Preserving Random Forest Proximities [3.265773263570237]
We introduce a novel definition of random forest proximities called Random Forest-Geometry- and Accuracy-Preserving proximities (RF-GAP).
We prove that RF-GAP proximities exactly match the out-of-bag random forest predictions, thus capturing the data geometry learned by the random forest.
This improved geometric representation outperforms traditional random forest proximities in tasks such as data imputation and provides outlier detection and visualization results consistent with the learned data geometry.
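For context, a minimal sketch of the classical proximity definition that RF-GAP refines (fraction of trees in which two points share a leaf); this naive version does not reproduce out-of-bag predictions the way RF-GAP does.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

leaves = forest.apply(X)  # (n_samples, n_trees) leaf indices
# Proximity[i, j] = share of trees in which i and j land in the same leaf.
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
print(proximity.shape, proximity[0, :5])
```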
arXiv Detail & Related papers (2022-01-29T23:13:53Z)
- Minimax Rates for High-Dimensional Random Tessellation Forests [0.0]
Mondrian forests are the first class of random forests for which minimax rates were obtained in arbitrary dimension.
We show that a large class of random forests with general split directions also achieves minimax optimal convergence rates in arbitrary dimension.
arXiv Detail & Related papers (2021-09-22T06:47:38Z)
- Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a Random Forest [8.513154770491898]
We argue that tree depth should be seen as a natural form of regularization across the entire procedure.
In particular, our work suggests that random forests with shallow trees are advantageous when the signal-to-noise ratio in the data is low.
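A minimal sketch (ours, not the paper's experiments) of treating depth as the regularizer: on a simulated low-SNR problem, cross-validation compares shallow and fully grown forests.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = X[:, 0] + 3.0 * rng.normal(size=300)  # weak signal, heavy noise (low SNR)

for depth in (2, 4, 8, None):  # None grows each tree to full depth
    forest = RandomForestRegressor(n_estimators=200, max_depth=depth,
                                   random_state=0)
    score = cross_val_score(forest, X, y,
                            scoring="neg_mean_squared_error").mean()
    print(f"max_depth={depth}: CV MSE={-score:.3f}")
```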
arXiv Detail & Related papers (2021-03-30T21:57:55Z)
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
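A minimal numpy sketch of the consistency signal (an assumed simplification, not the authors' implementation): agreement between the two views' predictive distributions, measured by Jensen-Shannon divergence, serves as a cleanliness score.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Symmetric, bounded divergence between two categorical distributions.
    def kl(a, b):
        return np.sum(a * np.log((a + eps) / (b + eps)), axis=-1)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Softmax outputs for the same two samples under two augmented views.
view1 = np.array([[0.90, 0.05, 0.05],   # consistent, confident sample
                  [0.40, 0.35, 0.25]])  # inconsistent, uncertain sample
view2 = np.array([[0.85, 0.10, 0.05],
                  [0.20, 0.50, 0.30]])

# JSD is at most log 2, so this maps agreement onto a [0, 1] "clean" score.
clean_score = 1.0 - js_divergence(view1, view2) / np.log(2)
print(clean_score)
```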
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
- Getting Better from Worse: Augmented Bagging and a Cautionary Tale of Variable Importance [6.327756363397825]
Black-box learning algorithms can provide accurate predictions with minimal a priori model specifications.
Augmented bagging (AugBagg) is a procedure that operates in an identical fashion to classical bagging and random forests, except that it is fit on data augmented with additional, entirely uninformative noise variables.
We demonstrate that this simple act of including extra noise variables in the model can lead to dramatic improvements in out-of-sample predictive accuracy.
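A minimal sketch of the augmented-bagging setup described above (ours, not the paper's code): append pure-noise columns before fitting and compare held-out error.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = X[:, 0] + rng.normal(size=400)
X_aug = np.hstack([X, rng.normal(size=(400, 20))])  # 20 pure-noise columns

for name, design in (("original ", X), ("augmented", X_aug)):
    X_tr, X_te, y_tr, y_te = train_test_split(design, y, random_state=0)
    forest = RandomForestRegressor(n_estimators=300,
                                   random_state=0).fit(X_tr, y_tr)
    print(name, "test MSE:",
          round(mean_squared_error(y_te, forest.predict(X_te)), 3))
```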
arXiv Detail & Related papers (2020-03-07T18:35:10Z)
- Censored Quantile Regression Forest [81.9098291337097]
We develop a new estimating equation that adapts to censoring and reduces to the quantile score whenever the data do not exhibit censoring.
The proposed procedure, named censored quantile regression forest, allows us to estimate quantiles of time-to-event outcomes without any parametric modeling assumption.
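For intuition, a minimal sketch of a plain (uncensored) quantile regression forest built on scikit-learn; the paper's contribution, handling censored time-to-event data, is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = 2.0 * X[:, 0] + rng.exponential(scale=0.5, size=500)  # skewed responses

forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=20,
                               random_state=0).fit(X, y)
x0 = np.array([[0.5, 0.5, 0.5]])

train_leaves, query_leaves = forest.apply(X), forest.apply(x0)
# Pool the training responses that share a leaf with x0 in each tree.
neighbors = np.concatenate([y[train_leaves[:, t] == query_leaves[0, t]]
                            for t in range(train_leaves.shape[1])])
print("median and 90th percentile at x0:", np.quantile(neighbors, [0.5, 0.9]))
```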
arXiv Detail & Related papers (2020-01-08T23:20:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.