Diversity Conscious Refined Random Forest
- URL: http://arxiv.org/abs/2507.00467v2
- Date: Sat, 05 Jul 2025 18:30:34 GMT
- Title: Diversity Conscious Refined Random Forest
- Authors: Sijan Bhattarai, Saurav Bhandari, Girija Bhusal, Saroj Shakya, Tapendra Pandey
- Abstract summary: Random Forest (RF) is a widely used ensemble learning technique. RF often relies on hundreds of trees and all input features, leading to high cost and model redundancy. We propose a Refined Random Forest that grows trees only on informative features and enforces maximal diversity by clustering and retaining uncorrelated trees.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random Forest (RF) is a widely used ensemble learning technique known for its robust classification performance across diverse domains. However, it often relies on hundreds of trees and all input features, leading to high inference cost and model redundancy. In this work, our goal is to grow trees dynamically only on informative features and then enforce maximal diversity by clustering and retaining uncorrelated trees. We therefore propose a Refined Random Forest Classifier that iteratively refines itself by first removing the least informative features, then analytically determining how many new trees to grow, and finally applying correlation-based clustering to remove redundant trees. The classification accuracy of our model was compared against a standard RF with the same number of trees. Experiments on 8 benchmark datasets, both binary and multiclass, demonstrate that the proposed model achieves improved accuracy over standard RF.
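As a rough illustration of the refinement loop described in the abstract, the sketch below prunes low-importance features over a few rounds, regrows the forest, and then keeps only weakly correlated trees. This is a minimal sketch, not the authors' implementation: the round count, pruning fraction, and correlation cutoff are assumed hyperparameters, the paper's analytical rule for how many new trees to grow is not reproduced, and truncating `estimators_` relies on scikit-learn internals.

```python
# Minimal sketch of the refinement loop: (1) drop the least informative
# features, (2) regrow the forest, (3) cluster trees by prediction
# correlation and keep one representative per cluster.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def refined_rf(X, y, n_trees=100, n_rounds=3, drop_frac=0.1, corr_cut=0.9):
    features = np.arange(X.shape[1])
    for _ in range(n_rounds):
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
        rf.fit(X[:, features], y)
        # (1) drop the lowest-importance fraction of the surviving features
        order = np.argsort(rf.feature_importances_)      # ascending importance
        n_drop = max(1, int(drop_frac * len(features)))
        if len(features) - n_drop < 1:
            break
        features = np.sort(features[order[n_drop:]])
    # (2) regrow the forest on the informative features only
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    rf.fit(X[:, features], y)
    # (3) keep a tree only if it is weakly correlated with every kept tree
    # (assumes numeric class labels; ideally computed on held-out data)
    preds = np.array([t.predict(X[:, features]) for t in rf.estimators_])
    corr = np.corrcoef(preds)
    kept = []
    for i in range(len(rf.estimators_)):
        if all(corr[i, j] < corr_cut for j in kept):
            kept.append(i)
    rf.estimators_ = [rf.estimators_[i] for i in kept]   # sklearn internals
    return rf, features
```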
Related papers
- Experiments with Optimal Model Trees [2.8391355909797644]
We show that globally optimal model trees can achieve competitive accuracy with very small trees.
We also compare to classic optimal and greedily grown decision trees, random forests, and support vector machines.
arXiv Detail & Related papers (2025-03-17T08:03:47Z)
- Binary Classification: Is Boosting stronger than Bagging? [5.877778007271621]
We introduce Enhanced Random Forests, an extension of vanilla Random Forests with extra functionalities and adaptive sample and model weighting.
We develop an iterative algorithm that adapts the training sample weights by favoring the hardest examples (sketched below), and an approach for finding personalized tree weighting schemes for each new sample.
Our method significantly improves upon regular Random Forests across 15 different binary classification datasets and considerably outperforms other tree methods, including XGBoost.
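A boosting-flavoured sketch of the sample-reweighting idea from this entry; the multiplicative update and step size are illustrative assumptions, not the paper's exact scheme, and the personalized tree weighting is omitted.

```python
# Iteratively refit a forest, upweighting the currently hardest
# (misclassified) training examples; the update rule is illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def reweighted_rf(X, y, n_iter=5, step=0.5):
    w = np.full(len(y), 1.0 / len(y))
    rf = None
    for _ in range(n_iter):
        rf = RandomForestClassifier(n_estimators=100, random_state=0)
        rf.fit(X, y, sample_weight=w)
        hard = rf.predict(X) != y          # hardest examples this round
        w *= np.exp(step * hard)           # exponential upweighting
        w /= w.sum()
    return rf
```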
arXiv Detail & Related papers (2024-10-24T23:22:33Z)
- Heterogeneous Random Forest [2.0646127669654835]
Heterogeneous Random Forest (HRF) is designed to enhance tree diversity in a meaningful way.
HRF consistently outperformed other ensemble methods in terms of accuracy across the majority of datasets.
arXiv Detail & Related papers (2024-10-24T09:18:55Z)
- Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure.
HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques.
All code and models are released in a full-fledged package available on Github.
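Hierarchical Shrinkage has a closed form: each node's estimate is its parent's estimate plus the node's deviation from the parent, damped by 1 + λ/N(parent). Below is a compact post-hoc sketch on a fitted scikit-learn tree; writing into `tree_.value` depends on private internals and is for illustration only. The package referenced above is, to my knowledge, the authors' `imodels` library, which exposes HS as `HSTreeClassifier`.

```python
# Post-hoc Hierarchical Shrinkage on a fitted sklearn decision tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def hierarchical_shrinkage(tree: DecisionTreeClassifier, lam: float = 10.0):
    t = tree.tree_
    # Per-node class distributions (works for counts or fractions).
    probs = t.value[:, 0, :] / t.value[:, 0, :].sum(axis=1, keepdims=True)
    shrunk = np.empty_like(probs)

    def walk(node, parent_est, parent_n):
        if parent_est is None:
            est = probs[node]              # the root keeps its own estimate
        else:
            # Damp the node's deviation from its parent by 1 + lam/N(parent).
            est = parent_est + (probs[node] - parent_est) / (1.0 + lam / parent_n)
        shrunk[node] = est
        if t.children_left[node] != -1:    # internal node: recurse
            walk(t.children_left[node], est, t.n_node_samples[node])
            walk(t.children_right[node], est, t.n_node_samples[node])

    walk(0, None, None)
    t.value[:, 0, :] = shrunk              # in-place; private sklearn API
    return tree
```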
arXiv Detail & Related papers (2022-02-02T02:43:23Z)
- To Boost or not to Boost: On the Limits of Boosted Neural Networks [67.67776094785363]
Boosting is a method for learning an ensemble of classifiers.
While boosting has been shown to be very effective for decision trees, its impact on neural networks has not been extensively studied.
We find that a single neural network usually generalizes better than a boosted ensemble of smaller neural networks with the same total number of parameters.
arXiv Detail & Related papers (2021-07-28T19:10:03Z)
- Making CNNs Interpretable by Building Dynamic Sequential Decision Forests with Top-down Hierarchy Learning [62.82046926149371]
We propose a generic model transfer scheme to make Convolutional Neural Networks (CNNs) interpretable.
We achieve this by building a differentiable decision forest on top of CNNs.
We name the transferred model deep Dynamic Sequential Decision Forest (dDSDF).
arXiv Detail & Related papers (2021-06-05T07:41:18Z)
- Spectral Top-Down Recovery of Latent Tree Models [13.681975313065477]
Spectral Top-Down Recovery (STDR) is a divide-and-conquer approach for inference of large latent tree models.
STDR's partitioning step is non-random; instead, it is based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes (a minimal version of this split is sketched below).
We prove that STDR is statistically consistent, and bound the number of samples required to accurately recover the tree with high probability.
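The non-random split is standard spectral partitioning. A minimal version, assuming a generic symmetric similarity matrix W over the observed nodes (the paper defines the specific Laplacian STDR actually uses):

```python
# Fiedler-vector partition: split a graph's nodes by the sign of the
# eigenvector for the second-smallest Laplacian eigenvalue.
import numpy as np

def fiedler_partition(W):
    """W: symmetric nonnegative similarity matrix over observed nodes."""
    L = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]             # Fiedler vector
    return fiedler >= 0                 # boolean mask: side of the cut
```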
arXiv Detail & Related papers (2021-02-26T02:47:42Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions (soft routing, sketched below) rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests achieve performance better than or comparable to [1], [3].
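A toy forward pass for a single soft-routed tree, to make the idea concrete: every internal node routes a sample left with probability sigmoid(w·x + b), and the output mixes the leaf class distributions by path probability. The paper's layered forest and learned connectivity are omitted; all shapes here are illustrative assumptions.

```python
# Soft routing in a full binary tree of depth d (heap-ordered internal
# nodes 0..2^d-2): each sample reaches every leaf with some probability,
# and the prediction mixes the leaf class distributions accordingly.
import numpy as np

def soft_tree_predict(x, weights, biases, leaf_dists):
    """weights: (2^d - 1, n_features), biases: (2^d - 1,),
    leaf_dists: (2^d, n_classes) rows of class distributions."""
    n_leaves = len(leaf_dists)
    depth = int(np.log2(n_leaves))
    path_prob = np.ones(n_leaves)
    for leaf in range(n_leaves):
        node = 0
        for bit in format(leaf, f"0{depth}b"):   # "0" = go left
            p_left = 1.0 / (1.0 + np.exp(-(weights[node] @ x + biases[node])))
            path_prob[leaf] *= p_left if bit == "0" else 1.0 - p_left
            node = 2 * node + (1 if bit == "0" else 2)
    return path_prob @ leaf_dists                # (n_classes,) mixture
```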
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- An Efficient Adversarial Attack for Tree Ensembles [91.05779257472675]
We study adversarial attacks on tree-based ensembles such as gradient-boosted decision trees (GBDTs) and random forests (RFs).
We show that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach.
Our code is available at https://github.com/chong-z/tree-ensemble-attack.
arXiv Detail & Related papers (2020-10-22T10:59:49Z)
- Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation [75.93960390191262]
We exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes.
We propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution.
Our method, termed Forest R-CNN, can serve as a plug-and-play module applicable to most object recognition models.
arXiv Detail & Related papers (2020-08-13T03:52:37Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.