Improved Weighted Random Forest for Classification Problems
- URL: http://arxiv.org/abs/2009.00534v1
- Date: Tue, 1 Sep 2020 16:08:45 GMT
- Title: Improved Weighted Random Forest for Classification Problems
- Authors: Mohsen Shahhosseini, Guiping Hu
- Abstract summary: The key to a well-performing ensemble model is the diversity of its base models.
We propose several algorithms that modify the weighting strategy of the regular random forest.
The proposed models introduce significant improvements over the regular random forest.
- Score: 3.42658286826597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several studies have shown that combining machine learning models in an
appropriate way improves on the individual predictions made by the base models.
The key to a well-performing ensemble model is the diversity of its base models.
Among the most common solutions for introducing diversity into decision trees
are bagging and random forest. Bagging enhances diversity by sampling with
replacement to generate many training data sets, while random forest
additionally selects a random subset of features. This has made the random
forest a winning candidate for many machine learning applications. However,
assuming equal weights for all base decision trees does not seem reasonable, as
the randomization of sampling and input feature selection may lead to different
levels of decision-making ability across the base decision trees. Therefore, we
propose several algorithms that modify the weighting strategy of the regular
random forest and consequently make better predictions. The designed weighting
frameworks include optimal weighted random forest based on accuracy, optimal
weighted random forest based on the area under the curve (AUC),
performance-based weighted random forest, and several stacking-based weighted
random forest models. The numerical results show that the proposed models
introduce significant improvements over the regular random forest.
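The performance-based weighting idea can be sketched with scikit-learn: fit a regular random forest, score each tree on a held-out validation split, and use those scores as vote weights. The dataset, split sizes, and the choice of validation accuracy as the weight are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of performance-based tree weighting: each tree's vote is
# weighted by its held-out accuracy instead of counting equally.
# Dataset and split sizes are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Weight = each tree's validation accuracy; normalize so weights sum to 1.
w = np.array([accuracy_score(y_val, t.predict(X_val)) for t in rf.estimators_])
w /= w.sum()

# Weighted soft vote: sum each tree's class probabilities times its weight.
proba = sum(wi * t.predict_proba(X_te) for wi, t in zip(w, rf.estimators_))
weighted_pred = proba.argmax(axis=1)

print("plain RF accuracy:     ", accuracy_score(y_te, rf.predict(X_te)))
print("weighted vote accuracy:", accuracy_score(y_te, weighted_pred))
```

The optimal accuracy- and AUC-based variants would instead search for the weight vector directly, and the stacking-based variants would train a meta-learner on the trees' predictions; the sketch above covers only the simplest performance-based scheme.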
Related papers
- Binary Classification: Is Boosting stronger than Bagging? [5.877778007271621]
We introduce Enhanced Random Forests, an extension of vanilla Random Forests with extra functionalities and adaptive sample and model weighting.
We develop an iterative algorithm for adapting the training sample weights, by favoring the hardest examples, and an approach for finding personalized tree weighting schemes for each new sample.
Our method significantly improves upon regular Random Forests across 15 different binary classification datasets and considerably outperforms other tree methods, including XGBoost.
arXiv Detail & Related papers (2024-10-24T23:22:33Z)
- Scalable Ensemble Diversification for OOD Generalization and Detection [68.8982448081223]
SED identifies hard training samples on the fly and encourages the ensemble members to disagree on these.
We show how to avoid the expensive computations in existing methods of exhaustive pairwise disagreements across models.
For OOD generalization, we observe large benefits from the diversification in multiple settings, including output-space (classical) ensembles and weight-space ensembles (model soups).
arXiv Detail & Related papers (2024-09-25T10:30:24Z)
- ForensicsForest Family: A Series of Multi-scale Hierarchical Cascade Forests for Detecting GAN-generated Faces [53.739014757621376]
We describe a simple and effective set of forest-based methods called ForensicsForest Family to detect GAN-generated faces.
ForensicsForest is a newly proposed Multi-scale Hierarchical Cascade Forest.
Hybrid ForensicsForest integrates CNN layers into the model.
Divide-and-Conquer ForensicsForest can construct a forest model using only a portion of the training samples.
arXiv Detail & Related papers (2023-08-02T06:41:19Z)
- Contextual Decision Trees [62.997667081978825]
We propose a multi-armed contextual bandit recommendation framework for feature-based selection of a single shallow tree of the learned ensemble.
The trained system, which works on top of the Random Forest, dynamically identifies a base predictor that is responsible for providing the final output.
arXiv Detail & Related papers (2022-07-13T17:05:08Z)
- An Approximation Method for Fitted Random Forests [0.0]
We study methods that approximate each fitted tree in the Random Forests model using the multinomial allocation of the data points to the leaves.
Specifically, we begin by studying whether fitting a multinomial logistic regression helps reduce the size while preserving the prediction quality.
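One way to read this approximation (an illustrative interpretation of the abstract, not necessarily the paper's exact method): summarize a fitted tree by the class stored at each leaf plus a multinomial logistic regression that predicts leaf membership in place of the tree's split structure.

```python
# Illustrative sketch: replace a fitted tree's splits with a
# multinomial logistic regression over its leaf allocation.
# This reading of the abstract is an assumption, not the paper's method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=1)
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)

leaf_of = tree.apply(X)  # leaf id each sample lands in
# Majority class per leaf, read from the tree's own value counts.
leaf_class = {lid: tree.tree_.value[lid].argmax() for lid in np.unique(leaf_of)}

# Multinomial logistic regression approximating the leaf allocation.
lr = LogisticRegression(max_iter=1000).fit(X, leaf_of)
approx_pred = np.array([leaf_class[l] for l in lr.predict(X)])

agreement = (approx_pred == tree.predict(X)).mean()
print(f"agreement with the original tree: {agreement:.2%}")
```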
arXiv Detail & Related papers (2022-07-05T17:28:52Z)
- On Uncertainty Estimation by Tree-based Surrogate Models in Sequential Model-based Optimization [13.52611859628841]
We revisit various ensembles of randomized trees to investigate their behavior from the perspective of prediction uncertainty estimation.
We propose a new way of constructing an ensemble of randomized trees, referred to as BwO forest, where bagging with oversampling is employed to construct bootstrapped samples.
Experimental results demonstrate the validity and good performance of BwO forest over existing tree-based models in various circumstances.
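The bagging-with-oversampling construction can be roughly sketched as fitting each tree on a bootstrap sample drawn larger than the original training set. The oversampling factor of 2 and the regression setup here are illustrative assumptions, not the paper's exact construction.

```python
# Rough sketch of bagging with oversampled bootstrap samples:
# each tree sees 2n points drawn with replacement (factor assumed).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=300, n_features=5, noise=1.0, random_state=0)

oversample_factor = 2  # assumed: draw 2n points with replacement per tree
trees = []
for _ in range(50):
    idx = rng.integers(0, len(X), size=oversample_factor * len(X))
    trees.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))

# Ensemble mean and per-tree spread give a simple uncertainty estimate.
preds = np.stack([t.predict(X[:5]) for t in trees])
print("mean prediction:", preds.mean(axis=0))
print("per-point std:  ", preds.std(axis=0))
```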
arXiv Detail & Related papers (2022-02-22T04:50:37Z)
- Ensembles of Double Random Forest [1.7205106391379026]
We propose two approaches for generating ensembles of double random forest.
In the first approach, we propose a rotation based ensemble of double random forest.
In the second approach, we propose oblique ensembles of double random forest.
arXiv Detail & Related papers (2021-11-03T04:19:41Z)
- Minimax Rates for High-Dimensional Random Tessellation Forests [0.0]
Mondrian forests were the first class of random forests for which minimax rates were obtained in arbitrary dimension.
We show that a large class of random forests with general split directions also achieve minimax optimal convergence rates in arbitrary dimension.
arXiv Detail & Related papers (2021-09-22T06:47:38Z)
- Making CNNs Interpretable by Building Dynamic Sequential Decision Forests with Top-down Hierarchy Learning [62.82046926149371]
We propose a generic model transfer scheme to make Convolutional Neural Networks (CNNs) interpretable.
We achieve this by building a differentiable decision forest on top of CNNs.
We name the transferred model deep Dynamic Sequential Decision Forest (dDSDF).
arXiv Detail & Related papers (2021-06-05T07:41:18Z)
- Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.