Heterogeneous Random Forest
- URL: http://arxiv.org/abs/2410.19022v1
- Date: Thu, 24 Oct 2024 09:18:55 GMT
- Title: Heterogeneous Random Forest
- Authors: Ye-eun Kim, Seoung Yun Kim, Hyunjoong Kim
- Abstract summary: Heterogeneous Random Forest (HRF) is designed to enhance tree diversity in a meaningful way.
HRF consistently outperformed other ensemble methods in terms of accuracy across the majority of datasets.
- Abstract: Random forest (RF) stands out as a highly favored machine learning approach for classification problems. The effectiveness of RF hinges on two key factors: the accuracy of individual trees and the diversity among them. In this study, we introduce a novel approach called heterogeneous RF (HRF), designed to enhance tree diversity in a meaningful way. This diversification is achieved by deliberately introducing heterogeneity during the tree construction. Specifically, features used for splitting near the root node of previous trees are assigned lower weights when constructing the feature sub-space of the subsequent trees. As a result, dominant features in the prior trees are less likely to be employed in the next iteration, leading to a more diverse set of splitting features at the nodes. Through simulation studies, it was confirmed that the HRF method effectively mitigates the selection bias of trees within the ensemble, increases the diversity of the ensemble, and demonstrates superior performance on datasets with fewer noise features. To assess the comparative performance of HRF against other widely adopted ensemble methods, we conducted tests on 52 datasets, comprising both real-world and synthetic data. HRF consistently outperformed other ensemble methods in terms of accuracy across the majority of datasets.
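The weighted feature sub-space idea lends itself to a short sketch. Below is a minimal, hypothetical Python rendering of the mechanism described above; the abstract does not specify the exact weighting scheme or depth cutoff, so the `decay` factor and the simplification of penalizing only each tree's root-split feature (the paper down-weights all features used near the root) are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
n_features = X.shape[1]

weights = np.ones(n_features)   # sampling weights for the feature sub-space
decay = 0.5                     # assumed penalty for a feature split at the root
trees, feats = [], []

for _ in range(100):
    # Draw a weighted feature sub-space of size sqrt(p), as in standard RF.
    k = int(np.sqrt(n_features))
    idx = rng.choice(n_features, size=k, replace=False, p=weights / weights.sum())
    boot = rng.integers(0, len(X), len(X))   # bootstrap sample
    tree = DecisionTreeClassifier(random_state=0).fit(X[boot][:, idx], y[boot])
    trees.append(tree)
    feats.append(idx)

    # Down-weight the feature this tree split at its root, so dominant
    # features are less likely to be employed by subsequent trees.
    root_split = tree.tree_.feature[0]
    if root_split >= 0:                      # root is not a leaf
        weights[idx[root_split]] *= decay

def predict(X_new):
    """Majority vote over the heterogeneous ensemble."""
    votes = np.stack([t.predict(X_new[:, f]) for t, f in zip(trees, feats)])
    return (votes.mean(axis=0) > 0.5).astype(int)

print("train accuracy:", (predict(X) == y).mean())
```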
Related papers
- Binary Classification: Is Boosting stronger than Bagging? [5.877778007271621]
We introduce Enhanced Random Forests, an extension of vanilla Random Forests with extra functionalities and adaptive sample and model weighting.
We develop an iterative algorithm for adapting the training sample weights, by favoring the hardest examples, and an approach for finding personalized tree weighting schemes for each new sample.
Our method significantly improves upon regular Random Forests across 15 different binary classification datasets and considerably outperforms other tree methods, including XGBoost.
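As a rough illustration of the sample-reweighting loop, here is a hedged sketch; the paper's actual update rule and schedule are not given in this summary, so the multiplicative boost of 1.5 and the use of out-of-bag errors to define "hard" examples are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=20, flip_y=0.1, random_state=1)
w = np.full(len(X), 1.0 / len(X))            # start from uniform sample weights

for r in range(5):
    rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=r)
    rf.fit(X, y, sample_weight=w)
    # Out-of-bag prediction for each training point (NaN-safe for rare all-in-bag rows).
    oob = np.nan_to_num(rf.oob_decision_function_).argmax(axis=1)
    hard = oob != y                          # the examples the forest still misses
    w[hard] *= 1.5                           # assumed multiplicative boost for hard examples
    w /= w.sum()                             # renormalize

print("OOB accuracy after reweighting:", rf.oob_score_)
```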
arXiv Detail & Related papers (2024-10-24T23:22:33Z)
- Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation.
DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level.
To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
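A minimal sketch of the level-wise softmax formulation, assuming a binary tree with level-order node ids; the real DTR model, its tree construction, and the rectified loss are more involved than this.

```python
import torch
import torch.nn as nn

depth, emb_dim, n_users = 4, 16, 100
user_emb = nn.Embedding(n_users, emb_dim)
# One linear classifier per tree level; level d has 2**(d+1) nodes.
level_heads = nn.ModuleList(nn.Linear(emb_dim, 2 ** (d + 1)) for d in range(depth))

def loss(user_ids, leaf_ids):
    """Cross-entropy over the nodes at each level along the path to the target leaf."""
    h = user_emb(user_ids)
    total = 0.0
    for d, head in enumerate(level_heads):
        # Ancestor of the target leaf at level d (bit-shift in a level-order id scheme).
        target = leaf_ids >> (depth - 1 - d)
        total = total + nn.functional.cross_entropy(head(h), target)
    return total

users = torch.randint(0, n_users, (32,))
leaves = torch.randint(0, 2 ** depth, (32,))
print(loss(users, leaves))
```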
arXiv Detail & Related papers (2024-08-21T05:09:53Z)
- Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node [49.08777822540483]
Fast feedforward networks (FFFs) exploit the observation that different regions of the input space activate distinct subsets of neurons in wide networks.
We propose the incorporation of load balancing and Master Leaf techniques into the FFF architecture to improve performance and simplify the training process.
arXiv Detail & Related papers (2024-05-27T05:06:24Z)
- Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
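The smoother view can be made concrete: a fitted forest's prediction is exactly a data-adaptive weighted average of the training targets. A small sketch follows, with bootstrapping disabled so the identity is exact (with bootstrapping, each tree averages only its in-bag targets).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
rf = RandomForestRegressor(
    n_estimators=50, bootstrap=False, max_features=0.6, random_state=0
).fit(X, y)

def smoother_weights(x):
    """Weights w_i(x) such that the forest prediction is sum_i w_i(x) * y_i."""
    w = np.zeros(len(X))
    for tree in rf.estimators_:
        leaf_train = tree.apply(X)                  # leaf id of every training point
        leaf_x = tree.apply(x.reshape(1, -1))[0]    # leaf the query point falls into
        in_leaf = leaf_train == leaf_x
        w += in_leaf / in_leaf.sum()                # each tree averages its co-leaf targets
    return w / len(rf.estimators_)

x = X[7]
w = smoother_weights(x)
print(np.allclose(w @ y, rf.predict(x.reshape(1, -1))[0]))   # True: the forest is a smoother
```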
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
- PhyloGFN: Phylogenetic inference with generative flow networks [57.104166650526416]
We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference.
Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies.
We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
arXiv Detail & Related papers (2023-10-12T23:46:08Z)
- Heterogeneous Oblique Double Random Forest [1.2599533416395767]
The performance of oblique decision trees depends on how the oblique hyperplanes are generated and on the data used to generate them.
The proposed model employs several linear classifiers at each non-leaf node on the bootstrapped data and splits the original data based on the optimal linear classifier.
The experimental analysis indicates that the performance of the introduced heterogeneous double random forest is comparatively better than the baseline models.
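A hypothetical sketch of the per-node procedure described above, with an assumed pool of three linear classifiers and weighted Gini impurity as the selection criterion; the paper's actual classifier pool and its double-bootstrap scheme are not detailed in this summary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.svm import LinearSVC

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def best_oblique_split(X, y, rng):
    """Fit several linear classifiers on a bootstrap of the node data and keep
    the hyperplane whose split of the original node data minimizes weighted Gini."""
    boot = rng.integers(0, len(X), len(X))
    best, best_score = None, np.inf
    for clf in (LogisticRegression(max_iter=500), RidgeClassifier(), LinearSVC()):
        clf.fit(X[boot], y[boot])
        side = clf.decision_function(X) > 0          # oblique split of the node data
        if side.all() or (~side).all():
            continue                                 # degenerate split, skip
        score = side.mean() * gini(y[side]) + (~side).mean() * gini(y[~side])
        if score < best_score:
            best, best_score = clf, score
    return best

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
print(type(best_oblique_split(X, y, rng)).__name__, "chosen at the root")
```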
arXiv Detail & Related papers (2023-04-13T19:14:23Z)
- TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding [56.794981024301094]
We propose a compositional data augmentation approach for natural language understanding called TreeMix.
Specifically, TreeMix leverages constituency parse trees to decompose sentences into constituent sub-structures and the Mixup data augmentation technique to recombine them into new sentences.
Compared with previous approaches, TreeMix introduces greater diversity to the samples generated and encourages models to learn compositionality of NLP data.
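A simplified sketch of the subtree-swap idea on toy pre-parsed sentences using nltk's `Tree`; real TreeMix runs a constituency parser over the corpus and mixes the one-hot labels with a ratio derived from the swapped span, so the label-mixing rule below is an approximation.

```python
import random
from nltk import Tree   # pip install nltk

def treemix(t1, t2, label="NP", rng=random.Random(0)):
    """Swap one constituent with the given label from t2 into t1 (simplified)."""
    src = rng.choice([s for s in t2.subtrees() if s.label() == label])
    dst = rng.choice([s for s in t1.subtrees() if s.label() == label])
    mixed = t1.copy(deep=True)
    for pos in mixed.treepositions():
        if mixed[pos] == dst:                 # locate the constituent in the copy
            mixed[pos] = src.copy(deep=True)
            break
    # Approximate label-mixing ratio: share of tokens contributed by t2.
    lam = len(src.leaves()) / len(mixed.leaves())
    return " ".join(mixed.leaves()), lam

t1 = Tree.fromstring("(S (NP the cat) (VP chased (NP the mouse)))")
t2 = Tree.fromstring("(S (NP a dog) (VP ate (NP my homework)))")
sentence, lam = treemix(t1, t2)
print(sentence, lam)   # a mixed sentence plus the swap ratio for the soft label
```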
arXiv Detail & Related papers (2022-05-12T15:25:12Z)
- Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure.
HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques.
All code and models are released in a full-fledged package available on Github.
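HS admits a compact post-hoc form: each node's estimate becomes the root mean plus the parent-to-child increments, each damped by 1/(1 + λ/N(parent)). A minimal sketch on a scikit-learn regression tree follows; the λ value here is an arbitrary choice (the paper selects it by cross-validation).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=5, noise=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
reg = DecisionTreeRegressor(random_state=0).fit(Xtr, ytr)
t = reg.tree_

def hs_values(lam):
    """Shrunken node estimates: the root mean plus each parent-to-child
    increment damped by 1 / (1 + lam / n_samples(parent))."""
    out = np.empty(t.node_count)

    def walk(node, shrunk_parent, raw_parent, n_parent):
        raw = t.value[node, 0, 0]
        if node == 0:
            out[node] = raw
        else:
            out[node] = shrunk_parent + (raw - raw_parent) / (1 + lam / n_parent)
        if t.children_left[node] != -1:              # internal node
            for child in (t.children_left[node], t.children_right[node]):
                walk(child, out[node], raw, t.n_node_samples[node])

    walk(0, 0.0, 0.0, 1)
    return out

hs_pred = hs_values(lam=30.0)[reg.apply(Xte)]        # leaf lookup gives the HS prediction
print("plain tree test MSE:", ((reg.predict(Xte) - yte) ** 2).mean())
print("HS tree test MSE:", ((hs_pred - yte) ** 2).mean())
```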
arXiv Detail & Related papers (2022-02-02T02:43:23Z)
- Crossbreeding in Random Forest [5.8010446129208155]
Ensemble learning methods are designed to benefit from multiple learning algorithms for better predictive performance.
The tradeoff for this improved performance is the slower speed and larger size of ensemble learning systems compared to single learning systems.
We present a novel approach to this problem in Random Forest (RF), one of the most powerful ensemble methods.
arXiv Detail & Related papers (2021-01-21T12:58:54Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1], [3].
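A minimal PyTorch sketch of the soft-routing idea: each internal node routes left with a learned sigmoid gate, and the output is the routing-probability-weighted mixture of leaf values. This shows only the routing mechanism, not the paper's full deep-forest architecture with learned connectivity.

```python
import torch
import torch.nn as nn

class SoftTree(nn.Module):
    """A depth-d probabilistic tree: soft routing instead of hard binary decisions."""
    def __init__(self, in_dim, depth, out_dim):
        super().__init__()
        self.depth = depth
        self.gates = nn.Linear(in_dim, 2 ** depth - 1)   # one gate per internal node
        self.leaves = nn.Parameter(torch.zeros(2 ** depth, out_dim))

    def forward(self, x):
        p_left = torch.sigmoid(self.gates(x))            # (batch, n_internal)
        prob = torch.ones(x.shape[0], 1)                 # start with all mass at the root
        for d in range(self.depth):
            start = 2 ** d - 1                           # level-order ids of depth-d nodes
            g = p_left[:, start:start + 2 ** d]          # (batch, 2**d)
            # Split each node's mass between its left and right child.
            prob = torch.stack([prob * g, prob * (1 - g)], dim=2).flatten(1)
        return prob @ self.leaves                        # mixture over the leaves

model = SoftTree(in_dim=8, depth=3, out_dim=2)
print(model(torch.randn(4, 8)).shape)   # torch.Size([4, 2])
```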
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- Optimal survival trees ensemble [0.0]
Recent studies have adopted an approach of selecting accurate and diverse trees based on individual or collective performance within an ensemble for classification and regression problems.
This work follows in the wake of these investigations and considers the possibility of growing a forest of optimal survival trees.
In addition to improving predictive performance, the proposed method reduces the number of survival trees in the ensemble compared to other tree-based methods.
arXiv Detail & Related papers (2020-05-18T19:28:16Z)