Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a
Random Forest
- URL: http://arxiv.org/abs/2103.16700v1
- Date: Tue, 30 Mar 2021 21:57:55 GMT
- Title: Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a
Random Forest
- Authors: Siyu Zhou and Lucas Mentch
- Abstract summary: We argue that tree depth should be seen as a natural form of regularization across the entire procedure.
In particular, our work suggests that random forests with shallow trees are advantageous when the signal-to-noise ratio in the data is low.
- Score: 8.513154770491898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to their long-standing reputation as excellent off-the-shelf predictors,
random forests continue to remain a go-to model of choice for applied
statisticians and data scientists. Despite their widespread use, however, until
recently, little was known about their inner workings and about which aspects
of the procedure were driving their success. Very recently, two competing
hypotheses have emerged -- one based on interpolation and the other based on
regularization. This work argues in favor of the latter by utilizing the
regularization framework to reexamine the decades-old question of whether
individual trees in an ensemble ought to be pruned. Despite the fact that
default constructions of random forests use near full depth trees in most
popular software packages, here we provide strong evidence that tree depth
should be seen as a natural form of regularization across the entire procedure.
In particular, our work suggests that random forests with shallow trees are
advantageous when the signal-to-noise ratio in the data is low. In building up
this argument, we also critique the newly popular notion of "double descent" in
random forests by drawing parallels to U-statistics and arguing that the
noticeable jumps in random forest accuracy are the result of simple averaging
rather than interpolation.
Related papers
- Ensembles of Probabilistic Regression Trees [46.53457774230618]
Tree-based ensemble methods have been successfully used for regression problems in many applications and research studies.
We study ensemble versions of probabilistic regression trees that provide smooth approximations of the objective function by assigning each observation to each region with respect to a probability distribution.
arXiv Detail & Related papers (2024-06-20T06:51:51Z) - Hidden Variables unseen by Random Forests [0.3749861135832073]
We argue that simple alternative partitioning schemes used in the tree growing procedure can enhance identification of these interactions.
Our results validate that the modifications considered enhance the model's fitting ability in scenarios where pure interactions play a crucial role.
arXiv Detail & Related papers (2024-06-19T12:07:22Z) - Randomization Can Reduce Both Bias and Variance: A Case Study in Random
Forests [19.553278430819308]
We study the often overlooked phenomenon, first noted in Breiman (2001), that random forests appear to reduce bias compared to bagging.
arXiv Detail & Related papers (2024-02-20T02:36:26Z) - Why do Random Forests Work? Understanding Tree Ensembles as
Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
arXiv Detail & Related papers (2024-02-02T15:36:43Z) - Improving the Accuracy and Interpretability of Random Forests via Forest
Pruning [0.0]
We propose a post-hoc approach that aims to have the best of both worlds: the accuracy of random forests and the interpretability of decision trees.
We present two forest-pruning methods to find an optimal sub-forest within a given random forest, and then, when applicable, combine the selected trees into one.
arXiv Detail & Related papers (2024-01-10T20:02:47Z) - A U-turn on Double Descent: Rethinking Parameter Counting in Statistical
Learning [68.76846801719095]
We show that double descent appears exactly when and where it occurs, and that its location is not inherently tied to the threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z) - Contextual Decision Trees [62.997667081978825]
We propose a multi-armed contextual bandit recommendation framework for feature-based selection of a single shallow tree of the learned ensemble.
The trained system, which works on top of the Random Forest, dynamically identifies a base predictor that is responsible for providing the final output.
arXiv Detail & Related papers (2022-07-13T17:05:08Z) - Growing Deep Forests Efficiently with Soft Routing and Learned
Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve better or comparable performance than [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z) - An Efficient Adversarial Attack for Tree Ensembles [91.05779257472675]
We study adversarial attacks on tree-based ensembles such as gradient boosting decision trees (DTs) and random forests (RFs).
We show that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach.
Our code is available at https://chong-z/tree-ensemble-attack.
arXiv Detail & Related papers (2020-10-22T10:59:49Z) - Fréchet random forests for metric space valued regression with non
euclidean predictors [0.0]
We introduce Fréchet trees and Fréchet random forests, which can handle data for which input and output variables take values in general metric spaces.
A consistency theorem for the Fréchet regressogram predictor using data-driven partitions is given and applied to Fréchet purely uniformly random trees.
arXiv Detail & Related papers (2019-06-04T22:07:24Z)