Alpha-Trimming: Locally Adaptive Tree Pruning for Random Forests
- URL: http://arxiv.org/abs/2408.07151v1
- Date: Tue, 13 Aug 2024 18:41:09 GMT
- Title: Alpha-Trimming: Locally Adaptive Tree Pruning for Random Forests
- Authors: Nikola Surjanovic, Andrew Henrey, Thomas M. Loughin,
- Abstract summary: A fast pruning algorithm, alpha-trimming, is proposed as an effective approach to pruning trees within a random forest.
A remarkable feature of alpha-trimming is that its tuning parameter can be adjusted without refitting the trees in the random forest once the trees have been fully grown once.
- Score: 0.8192907805418583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We demonstrate that adaptively controlling the size of individual regression trees in a random forest can improve predictive performance, contrary to the conventional wisdom that trees should be fully grown. A fast pruning algorithm, alpha-trimming, is proposed as an effective approach to pruning trees within a random forest, where more aggressive pruning is performed in regions with a low signal-to-noise ratio. The amount of overall pruning is controlled by adjusting the weight on an information criterion penalty as a tuning parameter, with the standard random forest being a special case of our alpha-trimmed random forest. A remarkable feature of alpha-trimming is that its tuning parameter can be adjusted without refitting the trees in the random forest once the trees have been fully grown once. In a benchmark suite of 46 example data sets, mean squared prediction error is often substantially lowered by using our pruning algorithm and is never substantially increased compared to a random forest with fully-grown trees at default parameter settings.
Related papers
- Can a Single Tree Outperform an Entire Forest? [5.448070998907116]
The prevailing mindset is that a single decision tree underperforms classic random forests in testing accuracy.
This study challenges such a mindset by significantly improving the testing accuracy of an oblique regression tree.
Our approach reformulates tree training as a differentiable unconstrained optimization task.
arXiv Detail & Related papers (2024-11-26T00:18:18Z) - Exogenous Randomness Empowering Random Forests [4.396860522241306]
We develop non-asymptotic expansions for the mean squared error (MSE) for both individual trees and forests.
Our findings unveil that feature subsampling reduces both the bias and variance of random forests compared to individual trees.
Our results reveal an intriguing phenomenon: the presence of noise features can act as a "blessing" in enhancing the performance of random forests.
arXiv Detail & Related papers (2024-11-12T05:06:10Z) - Forecasting with Hyper-Trees [50.72190208487953]
Hyper-Trees are designed to learn the parameters of time series models.
By relating the parameters of a target time series model to features, Hyper-Trees also address the issue of parameter non-stationarity.
In this novel approach, the trees first generate informative representations from the input features, which a shallow network then maps to the target model parameters.
arXiv Detail & Related papers (2024-05-13T15:22:15Z) - Adaptive Split Balancing for Optimal Random Forest [8.916614661563893]
We propose a new random forest algorithm that constructs the trees using a novel adaptive split-balancing method.
Our method achieves optimality in simple, smooth scenarios while adaptively learning the tree structure from the data.
arXiv Detail & Related papers (2024-02-17T09:10:40Z) - Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z) - Why do Random Forests Work? Understanding Tree Ensembles as
Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
arXiv Detail & Related papers (2024-02-02T15:36:43Z) - Accelerating Generalized Random Forests with Fixed-Point Trees [2.810283834703862]
Estimators are constructed by leveraging random forests as an adaptive kernel weighting algorithm.
We propose a new tree-growing rule for generalized random forests induced from a fixed-point iteration type of approximation.
arXiv Detail & Related papers (2023-06-20T21:45:35Z) - Contextual Decision Trees [62.997667081978825]
We propose a multi-armed contextual bandit recommendation framework for feature-based selection of a single shallow tree of the learned ensemble.
The trained system, which works on top of the Random Forest, dynamically identifies a base predictor that is responsible for providing the final output.
arXiv Detail & Related papers (2022-07-13T17:05:08Z) - Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a
Random Forest [8.513154770491898]
We argue that tree depth should be seen as a natural form of regularization across the entire procedure.
In particular, our work suggests that random forests with shallow trees are advantageous when the signal-to-noise ratio in the data is low.
arXiv Detail & Related papers (2021-03-30T21:57:55Z) - Growing Deep Forests Efficiently with Soft Routing and Learned
Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve better or comparable performance than [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z) - Censored Quantile Regression Forest [81.9098291337097]
We develop a new estimating equation that adapts to censoring and leads to quantile score whenever the data do not exhibit censoring.
The proposed procedure named it censored quantile regression forest, allows us to estimate quantiles of time-to-event without any parametric modeling assumption.
arXiv Detail & Related papers (2020-01-08T23:20:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.