Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests
- URL: http://arxiv.org/abs/2402.12668v2
- Date: Sun, 4 Aug 2024 18:07:51 GMT
- Title: Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests
- Authors: Brian Liu, Rahul Mazumder
- Abstract summary: We study the often overlooked phenomenon, first noted in Breiman (2001), that random forests appear to reduce bias compared to bagging.
- Score: 16.55139316146852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the often overlooked phenomenon, first noted in Breiman (2001), that random forests appear to reduce bias compared to bagging. Motivated by an interesting paper by Mentch and Zhou (2020), in which the authors argue that random forests reduce effective degrees of freedom and only outperform bagging ensembles in low signal-to-noise ratio (SNR) settings, we explore how random forests can uncover patterns in the data missed by bagging. We empirically demonstrate that in the presence of such patterns, random forests reduce bias along with variance and increasingly outperform bagging ensembles when SNR is high. Our observations offer insights into the real-world success of random forests across a range of SNRs and enhance our understanding of the difference between random forests and bagging ensembles with respect to the randomization injected into each split. Our investigations also yield practical insights into the importance of tuning mtry in random forests.
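The mtry parameter referenced in the abstract controls how many candidate features each split considers. A minimal scikit-learn sketch of the bagging-versus-forest comparison (ours, not the authors' code): in scikit-learn, mtry corresponds to max_features, and setting it to the full feature count recovers bagging.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
y = X[:, 0] + 2.0 * (X[:, 1] > 0) + 0.5 * rng.normal(size=n)  # fairly high SNR

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for mtry in (p, p // 3, int(np.sqrt(p))):  # mtry = p recovers bagging
    forest = RandomForestRegressor(n_estimators=300, max_features=mtry,
                                   random_state=0).fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, forest.predict(X_te))
    print(f"mtry={mtry:2d}  test MSE={mse:.3f}")
```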
Related papers
- Ensembles of Probabilistic Regression Trees [46.53457774230618]
Tree-based ensemble methods have been successfully used for regression problems in many applications and research studies.
We study ensemble versions of probabilistic regression trees that provide smooth approximations of the objective function by assigning each observation to each region with respect to a probability distribution.
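A minimal sketch of the soft-assignment idea, under the assumption (ours) that a sigmoid gate governs the region probabilities; the paper's exact probability model may differ.

```python
import numpy as np

def soft_stump_predict(x, threshold, scale, left_mean, right_mean):
    # Probability of the "right" region rises smoothly with x instead of
    # jumping at the threshold, giving a smooth prediction surface.
    p_right = 1.0 / (1.0 + np.exp(-(x - threshold) / scale))
    return (1.0 - p_right) * left_mean + p_right * right_mean

xs = np.linspace(-3, 3, 7)
print(soft_stump_predict(xs, threshold=0.0, scale=0.5,
                         left_mean=-1.0, right_mean=1.0))
```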
arXiv Detail & Related papers (2024-06-20T06:51:51Z)
- Grafting: Making Random Forests Consistent [0.0]
Little is known about the theory of Random Forests.
A major unanswered question is whether, or when, the Random Forest algorithm is consistent.
arXiv Detail & Related papers (2024-03-09T21:29:25Z)
- Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
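A minimal sketch of the smoother view of forests (our illustration, not the paper's code): with bootstrapping disabled, a scikit-learn forest's prediction can be rewritten exactly as a weighted average of training labels, with weights derived from leaf co-membership.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

# bootstrap=False so each tree sees the full sample and the weight
# reconstruction below matches forest.predict exactly.
forest = RandomForestRegressor(n_estimators=100, max_features=2,
                               bootstrap=False, random_state=0).fit(X, y)
x0 = rng.normal(size=(1, 5))

train_leaves = forest.apply(X)   # (n_train, n_trees) leaf ids
query_leaves = forest.apply(x0)  # (1, n_trees)
weights = np.zeros(len(X))
for t in range(train_leaves.shape[1]):
    in_leaf = train_leaves[:, t] == query_leaves[0, t]
    weights[in_leaf] += 1.0 / in_leaf.sum()  # equal weight within each leaf
weights /= train_leaves.shape[1]             # average over trees

print("smoother view:  ", weights @ y)
print("forest.predict: ", forest.predict(x0)[0])
```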
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
- Neuroevolution-based Classifiers for Deforestation Detection in Tropical Forests [62.997667081978825]
Millions of hectares of tropical forests are lost every year due to deforestation or degradation.
Monitoring and deforestation-detection programs are in use, alongside public policies aimed at preventing and punishing such crimes.
This paper proposes pattern classifiers based on the NeuroEvolution of Augmenting Topologies (NEAT) technique for tropical forest deforestation detection tasks.
arXiv Detail & Related papers (2022-08-23T16:04:12Z)
- Contextual Decision Trees [62.997667081978825]
We propose a multi-armed contextual bandit recommendation framework for feature-based selection of a single shallow tree of the learned ensemble.
The trained system, which works on top of the Random Forest, dynamically identifies a base predictor that is responsible for providing the final output.
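A rough stand-in for this idea (ours; it uses a supervised selector rather than the paper's contextual bandit): learn from held-out data which single shallow tree to route each input to.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = np.where(X[:, 0] > 0, X[:, 1], -X[:, 1]) + rng.normal(scale=0.1, size=600)
X_tr, X_val, X_te = X[:300], X[300:450], X[450:]
y_tr, y_val, y_te = y[:300], y[300:450], y[450:]

forest = RandomForestRegressor(n_estimators=50, max_depth=3,
                               random_state=0).fit(X_tr, y_tr)

# For each validation point, record which shallow tree was most accurate...
per_tree = np.stack([t.predict(X_val) for t in forest.estimators_])
best_tree = np.abs(per_tree - y_val).argmin(axis=0)

# ...then learn a feature-based selector that routes new inputs to one tree.
selector = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_val, best_tree)
chosen = selector.predict(X_te)
y_hat = np.array([forest.estimators_[k].predict(x[None, :])[0]
                  for k, x in zip(chosen, X_te)])
print("per-input single-tree MSE:", float(np.mean((y_hat - y_te) ** 2)))
```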
arXiv Detail & Related papers (2022-07-13T17:05:08Z)
- Geometry- and Accuracy-Preserving Random Forest Proximities [3.265773263570237]
We introduce a novel definition of random forest proximities called Random Forest-Geometry- and Accuracy-Preserving proximities (RF-GAP).
We prove that RF-GAP proximities exactly match the out-of-bag random forest predictions, thus capturing the data geometry learned by the random forest.
This improved geometric representation outperforms traditional random forest proximities in tasks such as data imputation and provides outlier detection and visualization results consistent with the learned data geometry.
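For context, a minimal sketch of the classical proximity definition that RF-GAP refines (fraction of trees in which two points share a leaf); this naive version does not reproduce out-of-bag predictions the way RF-GAP does.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

leaves = forest.apply(X)  # (n_samples, n_trees) leaf indices
# Proximity[i, j] = share of trees in which i and j land in the same leaf.
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
print(proximity.shape, proximity[0, :5])
```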
arXiv Detail & Related papers (2022-01-29T23:13:53Z)
- Minimax Rates for High-Dimensional Random Tessellation Forests [0.0]
Mondrian forests are the first class of random forests for which minimax rates were obtained in arbitrary dimension.
We show that a large class of random forests with general split directions also achieves minimax optimal convergence rates in arbitrary dimension.
arXiv Detail & Related papers (2021-09-22T06:47:38Z)
- Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a Random Forest [8.513154770491898]
We argue that tree depth should be seen as a natural form of regularization across the entire procedure.
In particular, our work suggests that random forests with shallow trees are advantageous when the signal-to-noise ratio in the data is low.
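A minimal sketch (ours, not the paper's experiments) of treating depth as the regularizer: on a simulated low-SNR problem, cross-validation compares shallow and fully grown forests.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = X[:, 0] + 3.0 * rng.normal(size=300)  # weak signal, heavy noise (low SNR)

for depth in (2, 4, 8, None):  # None grows each tree to full depth
    forest = RandomForestRegressor(n_estimators=200, max_depth=depth,
                                   random_state=0)
    score = cross_val_score(forest, X, y,
                            scoring="neg_mean_squared_error").mean()
    print(f"max_depth={depth}: CV MSE={-score:.3f}")
```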
arXiv Detail & Related papers (2021-03-30T21:57:55Z)
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
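A minimal numpy sketch of the consistency signal (an assumed simplification, not the authors' implementation): agreement between the two views' predictive distributions, measured by Jensen-Shannon divergence, serves as a cleanliness score.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Symmetric, bounded divergence between two categorical distributions.
    def kl(a, b):
        return np.sum(a * np.log((a + eps) / (b + eps)), axis=-1)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Softmax outputs for the same two samples under two augmented views.
view1 = np.array([[0.90, 0.05, 0.05],   # consistent, confident sample
                  [0.40, 0.35, 0.25]])  # inconsistent, uncertain sample
view2 = np.array([[0.85, 0.10, 0.05],
                  [0.20, 0.50, 0.30]])

# JSD is at most log 2, so this maps agreement onto a [0, 1] "clean" score.
clean_score = 1.0 - js_divergence(view1, view2) / np.log(2)
print(clean_score)
```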
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
- Getting Better from Worse: Augmented Bagging and a Cautionary Tale of Variable Importance [6.327756363397825]
Black-box learning algorithms can provide accurate predictions with minimal a priori model specifications.
Augmented bagging (AugBagg) is a procedure that operates in an identical fashion to classical bagging and random forests, except that it is fit on data augmented with additional, entirely uninformative noise variables.
We demonstrate that this simple act of including extra noise variables in the model can lead to dramatic improvements in out-of-sample predictive accuracy.
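A minimal sketch of the augmented-bagging setup described above (ours, not the paper's code): append pure-noise columns before fitting and compare held-out error.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = X[:, 0] + rng.normal(size=400)
X_aug = np.hstack([X, rng.normal(size=(400, 20))])  # 20 pure-noise columns

for name, design in (("original ", X), ("augmented", X_aug)):
    X_tr, X_te, y_tr, y_te = train_test_split(design, y, random_state=0)
    forest = RandomForestRegressor(n_estimators=300,
                                   random_state=0).fit(X_tr, y_tr)
    print(name, "test MSE:",
          round(mean_squared_error(y_te, forest.predict(X_te)), 3))
```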
arXiv Detail & Related papers (2020-03-07T18:35:10Z)
- Censored Quantile Regression Forest [81.9098291337097]
We develop a new estimating equation that adapts to censoring and reduces to the quantile score whenever the data do not exhibit censoring.
The proposed procedure, named censored quantile regression forest, allows us to estimate quantiles of time-to-event outcomes without any parametric modeling assumption.
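For intuition, a minimal sketch of a plain (uncensored) quantile regression forest built on scikit-learn; the paper's contribution, handling censored time-to-event data, is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = 2.0 * X[:, 0] + rng.exponential(scale=0.5, size=500)  # skewed responses

forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=20,
                               random_state=0).fit(X, y)
x0 = np.array([[0.5, 0.5, 0.5]])

train_leaves, query_leaves = forest.apply(X), forest.apply(x0)
# Pool the training responses that share a leaf with x0 in each tree.
neighbors = np.concatenate([y[train_leaves[:, t] == query_leaves[0, t]]
                            for t in range(train_leaves.shape[1])])
print("median and 90th percentile at x0:", np.quantile(neighbors, [0.5, 0.9]))
```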
arXiv Detail & Related papers (2020-01-08T23:20:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.