Heterogeneous Oblique Double Random Forest
- URL: http://arxiv.org/abs/2304.06788v1
- Date: Thu, 13 Apr 2023 19:14:23 GMT
- Title: Heterogeneous Oblique Double Random Forest
- Authors: M.A. Ganaie and M. Tanveer and I. Beheshti and N. Ahmad and P.N.
Suganthan
- Abstract summary: The performance of oblique decision trees depends on the way oblique hyperplanes are generated and on the data used to generate them.
The proposed model employs several linear classifiers at each non-leaf node on the bootstrapped data and splits the original data based on the optimal linear classifier.
The experimental analysis indicates that the performance of the introduced heterogeneous double random forest is comparatively better than the baseline models.
- Score: 1.2599533416395767
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The decision tree ensembles use a single data feature at each node for
splitting the data. However, splitting in this manner may fail to capture the
geometric properties of the data. Thus, oblique decision trees generate the
oblique hyperplane for splitting the data at each non-leaf node. Oblique
decision trees capture the geometric properties of the data and hence, show
better generalization. The performance of oblique decision trees depends on
how the oblique hyperplanes are generated and on the data used to generate
them. Recently, multiple classifiers have been used in a heterogeneous random
forest (RaF) classifier; however, it fails to generate trees of sufficient
depth. Moreover, recent double RaF studies have shown that larger trees can be
generated by bootstrapping the data at each non-leaf node and splitting the
original data instead of the bootstrapped data. The heterogeneous RaF lacks
the generation of larger trees, while the double RaF based model fails to
capture the geometric characteristics of the
data. To address these shortcomings, we propose heterogeneous oblique double
RaF. The proposed model employs several linear classifiers at each non-leaf
node on the bootstrapped data and splits the original data based on the optimal
linear classifier. The optimal hyperplane is the one whose classifier
optimizes the impurity criterion. The experimental analysis indicates that the
performance of the introduced heterogeneous double random forest is
comparatively better than the baseline models. To demonstrate the effectiveness
of the proposed heterogeneous double random forest, we used it for the
diagnosis of schizophrenia. The proposed model predicted the disease more
accurately than the baseline models.
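The split procedure described in the abstract (fit several linear classifiers on a bootstrap sample at a non-leaf node, then split the original node data with the hyperplane that optimizes an impurity criterion) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the candidate classifier pool (logistic regression and ridge classifier) and the Gini impurity criterion are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, RidgeClassifier

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def heterogeneous_oblique_split(X, y, rng):
    """One non-leaf node split: fit several linear classifiers on a
    bootstrap sample of the node's data, then split the ORIGINAL node
    data with the hyperplane that minimizes weighted Gini impurity.
    Binary-label sketch; the paper's pool of linear models may differ."""
    n = len(y)
    boot = rng.choice(n, size=n, replace=True)  # bootstrap at this node
    candidates = [LogisticRegression(max_iter=200), RidgeClassifier()]
    best = None
    for clf in candidates:
        clf.fit(X[boot], y[boot])
        # Split the original data, not the bootstrapped data
        left = clf.predict(X) == clf.classes_[0]
        if left.all() or (~left).all():
            continue  # degenerate split, skip this candidate
        score = (left.sum() * gini(y[left])
                 + (~left).sum() * gini(y[~left])) / n
        if best is None or score < best[0]:
            best = (score, clf, left)
    return best  # (impurity, chosen classifier, left mask) or None
```

In a full forest, a split like this would be applied recursively to grow each oblique tree, with the whole procedure repeated over bootstrapped trees.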
Related papers
- Identifying General Mechanism Shifts in Linear Causal Representations [58.6238439611389]
We consider the linear causal representation learning setting where we observe a linear mixing of $d$ unknown latent factors.
Recent work has shown that it is possible to recover the latent factors as well as the underlying structural causal model over them.
We provide a surprising identifiability result that it is indeed possible, under some very mild standard assumptions, to identify the set of shifted nodes.
arXiv Detail & Related papers (2024-10-31T15:56:50Z) - Heterogeneous Random Forest [2.0646127669654835]
Heterogeneous Random Forest (HRF) is designed to enhance tree diversity in a meaningful way.
HRF consistently outperformed other ensemble methods in terms of accuracy across the majority of datasets.
arXiv Detail & Related papers (2024-10-24T09:18:55Z) - Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation.
DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level.
To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
arXiv Detail & Related papers (2024-08-21T05:09:53Z) - Statistical Advantages of Oblique Randomized Decision Trees and Forests [0.0]
Generalization error and convergence rates are obtained for the flexible dimension reduction model class of ridge functions.
A lower bound on the risk of axis-aligned Mondrian trees is obtained proving that these estimators are suboptimal for these linear dimension reduction models.
arXiv Detail & Related papers (2024-07-02T17:35:22Z) - Forecasting with Hyper-Trees [50.72190208487953]
Hyper-Trees are designed to learn the parameters of time series models.
By relating the parameters of a target time series model to features, Hyper-Trees also address the issue of parameter non-stationarity.
In this novel approach, the trees first generate informative representations from the input features, which a shallow network then maps to the target model parameters.
arXiv Detail & Related papers (2024-05-13T15:22:15Z) - Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection [51.11833609431406]
Homophily distribution differences between different classes are significantly greater than those in homophilic and heterophilic graphs.
We introduce a new metric called Class Homophily Variance, which quantitatively describes this phenomenon.
To mitigate its impact, we propose a novel GNN model named Homophily Edge Generation Graph Neural Network (HedGe)
arXiv Detail & Related papers (2024-03-15T14:26:53Z) - Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
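The merge rule recommended above (maximum average dot product rather than minimum distance or within-cluster variance) can be sketched as a naive agglomerative loop. This is an illustrative reconstruction, not the paper's implementation; a practical version would cache pairwise scores instead of recomputing them.

```python
import numpy as np

def merge_by_avg_dot(X, k):
    """Agglomerative clustering variant: repeatedly merge the pair of
    clusters with the MAXIMUM average pairwise dot product between
    their points, until k clusters remain."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                A, B = X[clusters[a]], X[clusters[b]]
                avg = (A @ B.T).mean()  # average cross-cluster dot product
                if avg > best:
                    best, pair = avg, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)  # merge the best-scoring pair
    return clusters
```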
arXiv Detail & Related papers (2023-05-24T11:05:12Z) - A cautionary tale on fitting decision trees to data from additive
models: generalization lower bounds [9.546094657606178]
We study the generalization performance of decision trees with respect to different generative regression models.
This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data.
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
arXiv Detail & Related papers (2021-10-18T21:22:40Z) - Spectral Top-Down Recovery of Latent Tree Models [13.681975313065477]
Spectral Top-Down Recovery (STDR) is a divide-and-conquer approach for inference of large latent tree models.
STDR's partitioning step is non-random. Instead, it is based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes.
We prove that STDR is statistically consistent, and bound the number of samples required to accurately recover the tree with high probability.
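The deterministic partitioning step can be sketched generically as a sign split on the Fiedler vector of a graph Laplacian. The similarity matrix W below is a stand-in assumption; STDR derives a specific Laplacian from the observed nodes.

```python
import numpy as np

def fiedler_partition(W):
    """Non-random bipartition: split nodes by the sign of the Fiedler
    vector (eigenvector of the second-smallest eigenvalue) of the
    unnormalized graph Laplacian L = D - W, where W is a symmetric
    similarity matrix over the observed nodes."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]               # second-smallest eigenvector
    return fiedler >= 0                   # boolean group membership
```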
arXiv Detail & Related papers (2021-02-26T02:47:42Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Optimal Decision Trees for Nonlinear Metrics [42.18286681448184]
We show a novel algorithm for producing optimal trees for nonlinear metrics.
To the best of our knowledge, this is the first method to compute provably optimal decision trees for nonlinear metrics.
Our approach leads to a trade-off when compared to optimising linear metrics.
arXiv Detail & Related papers (2020-09-15T08:30:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.