Related papers: Improving the Accuracy and Interpretability of Random Forests via Forest Pruning

Improving the Accuracy and Interpretability of Random Forests via Forest Pruning

URL: http://arxiv.org/abs/2401.05535v2
Date: Wed, 24 Jan 2024 02:58:54 GMT
Title: Improving the Accuracy and Interpretability of Random Forests via Forest Pruning
Authors: Albert Dorador
Abstract summary: We propose a post-hoc approach that aims to have the best of both worlds: the accuracy of random forests and the interpretability of decision trees. We present two forest-pruning methods to find an optimal sub-forest within a given random forest, and then, when applicable, combine the selected trees into one.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Decades after their inception, random forests continue to provide state-of-the-art accuracy in a variety of learning problems, outperforming in this respect alternative machine learning algorithms such as decision trees or even neural networks. However, being an ensemble method, the one aspect where random forests tend to severely underperform decision trees is interpretability. In the present work, we propose a post-hoc approach that aims to have the best of both worlds: the accuracy of random forests and the interpretability of decision trees. To this end, we present two forest-pruning methods to find an optimal sub-forest within a given random forest, and then, when applicable, combine the selected trees into one. Our first method relies on constrained exhaustive search, while our second method is based on an adaptation of the LASSO methodology. Extensive experiments over synthetic and real world datasets show that, in the majority of scenarios, at least one of the two methods proposed is more accurate than the original random forest, while just using a small fraction of the trees, aiding result interpretability. Compared to current state-of-the-art forest pruning methods, namely sequential forward selection and (a variation of) sequential backward selection, our methods tend to outperform both of them, whether in terms of accuracy, number of trees employed, or both.

Related papers

Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles. We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
A Mathematical Programming Approach to Optimal Classification Forests [1.0705399532413618]
We propose a novel mathematical optimization-based methodology in which a given number of trees are simultaneously constructed. The classification rule is derived by assigning to each observation its most frequently predicted class among the trees in the forest. We show that our proposed method has equal or superior performance compared with state-of-the-art tree-based classification methods.
arXiv Detail & Related papers (2022-11-18T20:33:08Z)
Contextual Decision Trees [62.997667081978825]
We propose a multi-armed contextual bandit recommendation framework for feature-based selection of a single shallow tree of the learned ensemble. The trained system, which works on top of the Random Forest, dynamically identifies a base predictor that is responsible for providing the final output.
arXiv Detail & Related papers (2022-07-13T17:05:08Z)
Explaining random forest prediction through diverse rulesets [0.0]
Local Tree eXtractor (LTreeX) is able to explain the forest prediction for a given test instance with a few diverse rules. We show that our proposed approach substantially outperforms other explainable methods in terms of predictive performance.
arXiv Detail & Related papers (2022-03-29T12:54:57Z)
Ensembles of Double Random Forest [1.7205106391379026]
We propose two approaches for generating ensembles of double random forest. In the first approach, we propose a rotation based ensemble of double random forest. In the second approach, we propose oblique ensembles of double random forest.
arXiv Detail & Related papers (2021-11-03T04:19:41Z)
Minimax Rates for High-Dimensional Random Tessellation Forests [0.0]
Mondrian forests is the first class of random forests for which minimax rates were obtained in arbitrary dimension. We show that a large class of random forests with general split directions also achieve minimax optimal convergence rates in arbitrary dimension.
arXiv Detail & Related papers (2021-09-22T06:47:38Z)
Making CNNs Interpretable by Building Dynamic Sequential Decision Forests with Top-down Hierarchy Learning [62.82046926149371]
We propose a generic model transfer scheme to make Convlutional Neural Networks (CNNs) interpretable. We achieve this by building a differentiable decision forest on top of CNNs. We name the transferred model deep Dynamic Sequential Decision Forest (dDSDF)
arXiv Detail & Related papers (2021-06-05T07:41:18Z)
Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a Random Forest [8.513154770491898]
We argue that tree depth should be seen as a natural form of regularization across the entire procedure. In particular, our work suggests that random forests with shallow trees are advantageous when the signal-to-noise ratio in the data is low.
arXiv Detail & Related papers (2021-03-30T21:57:55Z)
Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects. We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions. Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve better or comparable performance than [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm. We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search. Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.