Heterogeneous Oblique Double Random Forest
- URL: http://arxiv.org/abs/2304.06788v1
- Date: Thu, 13 Apr 2023 19:14:23 GMT
- Title: Heterogeneous Oblique Double Random Forest
- Authors: M.A. Ganaie and M. Tanveer and I. Beheshti and N. Ahmad and P.N.
Suganthan
- Abstract summary: The performance of oblique decision trees depends on the way oblique hyperplanes are generated and on the data used to generate them.
The proposed model employs several linear classifiers at each non-leaf node on the bootstrapped data and splits the original data based on the optimal linear classifier.
The experimental analysis indicates that the performance of the introduced heterogeneous double random forest is comparatively better than the baseline models.
- Score: 1.2599533416395767
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The decision tree ensembles use a single data feature at each node for
splitting the data. However, splitting in this manner may fail to capture the
geometric properties of the data. Thus, oblique decision trees generate the
oblique hyperplane for splitting the data at each non-leaf node. Oblique
decision trees capture the geometric properties of the data and hence, show
better generalization. The performance of oblique decision trees depends on
how the oblique hyperplanes are generated and on the data used to generate
them. Recently, multiple classifiers have been used in a heterogeneous random
forest (RaF) classifier; however, it fails to generate trees of sufficient
depth. Moreover, recent double RaF studies have shown that larger trees can be
generated by bootstrapping the data at each non-leaf node and splitting the
original data instead of the bootstrapped data. The heterogeneous RaF lacks
the generation of larger trees, while the double RaF based model fails to
capture the geometric characteristics of the
data. To address these shortcomings, we propose heterogeneous oblique double
RaF. The proposed model employs several linear classifiers at each non-leaf
node on the bootstrapped data and splits the original data based on the optimal
linear classifier. The optimal hyperplane is the one whose classifier
optimizes the impurity criterion. The experimental analysis indicates that the
performance of the introduced heterogeneous double random forest is
comparatively better than the baseline models. To demonstrate the effectiveness
of the proposed heterogeneous double random forest, we used it for the
diagnosis of schizophrenia. The proposed model predicted the disease more
accurately than the baseline models.
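The split procedure described in the abstract (fit several linear classifiers on a bootstrap sample at a non-leaf node, then split the original node data with the hyperplane that optimizes an impurity criterion) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the candidate classifier pool (logistic regression and ridge classifier) and the Gini impurity criterion are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, RidgeClassifier

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def heterogeneous_oblique_split(X, y, rng):
    """One non-leaf node split: fit several linear classifiers on a
    bootstrap sample of the node's data, then split the ORIGINAL node
    data with the hyperplane that minimizes weighted Gini impurity.
    Binary-label sketch; the paper's pool of linear models may differ."""
    n = len(y)
    boot = rng.choice(n, size=n, replace=True)  # bootstrap at this node
    candidates = [LogisticRegression(max_iter=200), RidgeClassifier()]
    best = None
    for clf in candidates:
        clf.fit(X[boot], y[boot])
        # Split the original data, not the bootstrapped data
        left = clf.predict(X) == clf.classes_[0]
        if left.all() or (~left).all():
            continue  # degenerate split, skip this candidate
        score = (left.sum() * gini(y[left])
                 + (~left).sum() * gini(y[~left])) / n
        if best is None or score < best[0]:
            best = (score, clf, left)
    return best  # (impurity, chosen classifier, left mask) or None
```

In a full forest, a split like this would be applied recursively to grow each oblique tree, with the whole procedure repeated over bootstrapped trees.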
Related papers
- Identifying General Mechanism Shifts in Linear Causal Representations [58.6238439611389]
We consider the linear causal representation learning setting where we observe a linear mixing of $d$ unknown latent factors.
Recent work has shown that it is possible to recover the latent factors as well as the underlying structural causal model over them.
We provide a surprising identifiability result that it is indeed possible, under some very mild standard assumptions, to identify the set of shifted nodes.
arXiv Detail & Related papers (2024-10-31T15:56:50Z) - Heterogeneous Random Forest [2.0646127669654835]
Heterogeneous Random Forest (HRF) is designed to enhance tree diversity in a meaningful way.
HRF consistently outperformed other ensemble methods in terms of accuracy across the majority of datasets.
arXiv Detail & Related papers (2024-10-24T09:18:55Z) - Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation.
DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level.
To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
arXiv Detail & Related papers (2024-08-21T05:09:53Z) - Statistical Advantages of Oblique Randomized Decision Trees and Forests [0.0]
Generalization error and convergence rates are obtained for the flexible dimension reduction model class of ridge functions.
A lower bound on the risk of axis-aligned Mondrian trees is obtained proving that these estimators are suboptimal for these linear dimension reduction models.
arXiv Detail & Related papers (2024-07-02T17:35:22Z) - Forecasting with Hyper-Trees [50.72190208487953]
Hyper-Trees are designed to learn the parameters of time series models.
By relating the parameters of a target time series model to features, Hyper-Trees also address the issue of parameter non-stationarity.
In this novel approach, the trees first generate informative representations from the input features, which a shallow network then maps to the target model parameters.
arXiv Detail & Related papers (2024-05-13T15:22:15Z) - Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection [51.11833609431406]
Homophily distribution differences between different classes are significantly greater than those in homophilic and heterophilic graphs.
We introduce a new metric called Class Homophily Variance, which quantitatively describes this phenomenon.
To mitigate its impact, we propose a novel GNN model named Homophily Edge Generation Graph Neural Network (HedGe)
arXiv Detail & Related papers (2024-03-15T14:26:53Z) - Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
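The merge rule recommended above (maximum average dot product rather than minimum distance or within-cluster variance) can be sketched as a naive agglomerative loop. This is an illustrative reconstruction, not the paper's implementation; a practical version would cache pairwise scores instead of recomputing them.

```python
import numpy as np

def merge_by_avg_dot(X, k):
    """Agglomerative clustering variant: repeatedly merge the pair of
    clusters with the MAXIMUM average pairwise dot product between
    their points, until k clusters remain."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                A, B = X[clusters[a]], X[clusters[b]]
                avg = (A @ B.T).mean()  # average cross-cluster dot product
                if avg > best:
                    best, pair = avg, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)  # merge the best-scoring pair
    return clusters
```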
arXiv Detail & Related papers (2023-05-24T11:05:12Z) - A cautionary tale on fitting decision trees to data from additive
models: generalization lower bounds [9.546094657606178]
We study the generalization performance of decision trees with respect to different generative regression models.
This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data.
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
arXiv Detail & Related papers (2021-10-18T21:22:40Z) - Spectral Top-Down Recovery of Latent Tree Models [13.681975313065477]
Spectral Top-Down Recovery (STDR) is a divide-and-conquer approach for inference of large latent tree models.
STDR's partitioning step is non-random. Instead, it is based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes.
We prove that STDR is statistically consistent, and bound the number of samples required to accurately recover the tree with high probability.
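The deterministic partitioning step can be sketched generically as a sign split on the Fiedler vector of a graph Laplacian. The similarity matrix W below is a stand-in assumption; STDR derives a specific Laplacian from the observed nodes.

```python
import numpy as np

def fiedler_partition(W):
    """Non-random bipartition: split nodes by the sign of the Fiedler
    vector (eigenvector of the second-smallest eigenvalue) of the
    unnormalized graph Laplacian L = D - W, where W is a symmetric
    similarity matrix over the observed nodes."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]               # second-smallest eigenvector
    return fiedler >= 0                   # boolean group membership
```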
arXiv Detail & Related papers (2021-02-26T02:47:42Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Optimal Decision Trees for Nonlinear Metrics [42.18286681448184]
We show a novel algorithm for producing optimal trees for nonlinear metrics.
To the best of our knowledge, this is the first method to compute provably optimal decision trees for nonlinear metrics.
Our approach leads to a trade-off when compared to optimising linear metrics.
arXiv Detail & Related papers (2020-09-15T08:30:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.