Bayesian post-hoc regularization of random forests
- URL: http://arxiv.org/abs/2306.03702v1
- Date: Tue, 6 Jun 2023 14:15:29 GMT
- Title: Bayesian post-hoc regularization of random forests
- Authors: Bastian Pfeifer
- Abstract summary: Random Forests are powerful ensemble learning algorithms widely used in various machine learning tasks.
We propose Bayesian post-hoc regularization to leverage the reliable patterns captured by leaf nodes closer to the root.
We have evaluated the performance of our method on various machine learning data sets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Random Forests are powerful ensemble learning algorithms widely used in
various machine learning tasks. However, they have a tendency to overfit noisy
or irrelevant features, which can result in decreased generalization
performance. Post-hoc regularization techniques aim to mitigate this issue by
modifying the structure of the learned ensemble after its training. Here, we
propose Bayesian post-hoc regularization to leverage the reliable patterns
captured by leaf nodes closer to the root, while reducing the impact of more
specific, potentially noisy leaf nodes deeper in the tree.
This approach allows for a form of pruning that does not alter the general
structure of the trees but rather adjusts the influence of leaf nodes based on
their proximity to the root node. We have evaluated the performance of our
method on various machine learning data sets. Our approach demonstrates
competitive performance with the state-of-the-art methods and, in certain
cases, surpasses them in terms of predictive accuracy and generalization.
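The abstract does not spell out the exact update rule, but the idea of trusting shallow nodes more than deep leaves can be sketched as post-hoc shrinkage of each node's class proportions toward its parent. The sketch below is an illustrative approximation, not the authors' implementation: the prior-strength parameter `lam`, the depth-scaled weighting, and the helper functions `shrunken_node_proba` and `regularized_predict_proba` are assumptions made for demonstration.

```python
# Illustrative sketch only -- NOT the paper's exact algorithm.
# After fitting a random forest, re-estimate each node's class probabilities
# by shrinking them toward the parent's estimate, with a prior strength that
# grows with depth (hypothetical parameter `lam`).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def shrunken_node_proba(tree, lam=10.0):
    """Per-node class probabilities, shrunk toward ancestor estimates."""
    t = tree.tree_
    proba = np.zeros((t.node_count, t.value.shape[2]))

    def recurse(node, parent_proba, depth):
        row = t.value[node, 0, :]
        mle = row / row.sum()                      # empirical class proportions
        n = float(t.n_node_samples[node])          # training samples in this node
        if parent_proba is None:
            post = mle                             # the root keeps its own estimate
        else:
            w = lam * depth                        # prior strength grows with depth
            post = (n * mle + w * parent_proba) / (n + w)
        proba[node] = post
        left, right = t.children_left[node], t.children_right[node]
        if left != -1:                             # internal node: recurse into children
            recurse(left, post, depth + 1)
            recurse(right, post, depth + 1)

    recurse(0, None, 0)
    return proba


def regularized_predict_proba(forest, X, lam=10.0):
    """Average the shrunken leaf probabilities over all trees in the forest."""
    out = np.zeros((X.shape[0], forest.n_classes_))
    for est in forest.estimators_:
        node_proba = shrunken_node_proba(est, lam=lam)
        out += node_proba[est.apply(X)]            # look up each sample's leaf
    return out / len(forest.estimators_)


X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = (regularized_predict_proba(rf, X_te, lam=10.0).argmax(axis=1) == y_te).mean()
print(f"test accuracy with post-hoc shrinkage: {acc:.3f}")
```

With `lam = 0` the forest's usual empirical leaf estimates are recovered; larger values pull deep, small leaves toward their ancestors, which matches the behaviour described above: the tree structure is left intact while the influence of deep leaf nodes is reduced.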
Related papers
- Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z)
- Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
- Conceptual Views on Tree Ensemble Classifiers [0.0]
Random Forests and related tree-based methods are popular for supervised learning from table based data.
Apart from their ease of parallelization, their classification performance is also superior.
Statistical methods are often used to compensate for this disadvantage, yet their ability to provide local explanations, and in particular global explanations, is limited.
arXiv Detail & Related papers (2023-02-10T14:33:21Z)
- A Robust Hypothesis Test for Tree Ensemble Pruning [2.4923006485141284]
We develop and present a novel theoretically justified hypothesis test of split quality for gradient boosted tree ensembles.
We show that using this method instead of the common penalty terms leads to a significant reduction in out-of-sample loss.
We also present several innovative extensions to the method, opening the door for a wide variety of novel tree pruning algorithms.
arXiv Detail & Related papers (2023-01-24T16:31:49Z)
- ForestPrune: Compact Depth-Controlled Tree Ensembles [7.538482310185135]
We present ForestPrune, a novel framework to post-process tree ensembles by pruning depth layers from individual trees.
We develop a specialized optimization algorithm to efficiently obtain high-quality solutions to the pruning problems posed by ForestPrune.
Our experiments demonstrate that ForestPrune produces parsimonious models that outperform models extracted by existing post-processing algorithms.
arXiv Detail & Related papers (2022-05-31T22:04:18Z)
- Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that regularizes a tree not by altering its structure, but by shrinking each node's prediction toward the sample means of its ancestors.
HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques.
All code and models are released in a full-fledged package available on Github.
arXiv Detail & Related papers (2022-02-02T02:43:23Z)
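For reference, the hierarchical shrinkage update can be written (up to notation) as a telescoping sum over the root-to-leaf path $t_0 \supset t_1 \supset \dots \supset t_L$ containing $x$:

$$\hat f_\lambda(x) = \hat{\mathbb{E}}_{t_0}[y] + \sum_{l=1}^{L} \frac{\hat{\mathbb{E}}_{t_l}[y] - \hat{\mathbb{E}}_{t_{l-1}}[y]}{1 + \lambda / N(t_{l-1})},$$

where $\hat{\mathbb{E}}_{t}[y]$ is the mean response of the training samples in node $t$, $N(t)$ is their number, and $\lambda \ge 0$ sets the shrinkage strength; contributions from deep, small nodes are damped, which is the same intuition the Bayesian post-hoc regularization above builds on.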
- Optimal randomized classification trees [0.0]
Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning.
CARTs are built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and the associated threshold.
This greedy approach trains trees very fast, but, by its nature, their classification accuracy may not be competitive with other state-of-the-art procedures.
arXiv Detail & Related papers (2021-10-19T11:41:12Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions (soft routing) rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1], [3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
- An Efficient Adversarial Attack for Tree Ensembles [91.05779257472675]
We study adversarial attacks on tree-based ensembles such as gradient boosted decision trees (GBDTs) and random forests (RFs).
We show that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach.
Our code is available at https://github.com/chong-z/tree-ensemble-attack.
arXiv Detail & Related papers (2020-10-22T10:59:49Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)