Optimal trees selection for classification via out-of-bag assessment and
sub-bagging
- URL: http://arxiv.org/abs/2012.15301v1
- Date: Wed, 30 Dec 2020 19:44:11 GMT
- Title: Optimal trees selection for classification via out-of-bag assessment and
sub-bagging
- Authors: Zardad Khan, Naz Gul, Nosheen Faiz, Asma Gul, Werner Adler, Berthold
Lausen
- Abstract summary: The predictive performance of tree based machine learning methods, in general, improves with a decreasing rate as the size of training data increases.
We investigate this in optimal trees ensemble (OTE) where the method fails to learn from some of the training observations due to internal validation.
Modified tree selection methods are thus proposed for OTE to cater for the loss of training observations in internal validation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The effect of training data size on machine learning methods has been well
investigated over the past two decades. The predictive performance of tree
based machine learning methods, in general, improves with a decreasing rate as
the size of training data increases. We investigate this in optimal trees
ensemble (OTE) where the method fails to learn from some of the training
observations due to internal validation. Modified tree selection methods are
thus proposed for OTE to cater for the loss of training observations in
internal validation. In the first method, corresponding out-of-bag (OOB)
observations are used in both individual and collective performance assessment
for each tree. Trees are ranked based on their individual performance on the
OOB observations. A certain number of top-ranked trees is selected and, starting
from the most accurate tree, subsequent trees are added one by one and their
impact is recorded by using the OOB observations left out from the bootstrap
sample taken for the tree being added. A tree is selected if it improves
predictive accuracy of the ensemble. In the second approach, trees are grown on
random subsets of the training data taken without replacement (known as
sub-bagging) instead of bootstrap samples (taken with replacement). The
remaining observations from each sample are used in both individual and
collective assessments for each corresponding tree, as in the first method.
Analysis on 21 benchmark datasets and simulation studies shows improved
performance of the modified methods in comparison to OTE and other
state-of-the-art methods.
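The first proposed method can be sketched in two steps: rank trees by their individual out-of-bag (OOB) accuracy, then add top-ranked trees one at a time, keeping a tree only if it does not hurt the ensemble's accuracy on that tree's own OOB observations. The sketch below is illustrative, not the authors' implementation: it uses decision stumps in place of full CART trees, synthetic two-class data, and hypothetical names (`fit_stump`, `vote`).

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Toy data: two noisy Gaussian classes (illustrative only).
n = 200
X = np.vstack([rng.normal(0, 1, (n // 2, 2)), rng.normal(2, 1, (n // 2, 2))])
y = np.array([0] * (n // 2) + [1] * (n // 2))

def fit_stump(X, y):
    """Train a depth-1 'tree' (decision stump) as a stand-in for a CART tree."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            pl = Counter(left).most_common(1)[0][0]
            pr = Counter(right).most_common(1)[0][0]
            acc = (np.sum(left == pl) + np.sum(right == pr)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, j, t, pl, pr)
    _, j, t, pl, pr = best
    return lambda Z: np.where(Z[:, j] <= t, pl, pr)

# Step 1: grow trees on bootstrap samples and rank them by individual OOB accuracy.
B = 25
trees, oob_sets = [], []
for _ in range(B):
    idx = rng.integers(0, n, n)                # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(n), idx)      # out-of-bag observations
    trees.append(fit_stump(X[idx], y[idx]))
    oob_sets.append(oob)

indiv = [np.mean(t(X[o]) == y[o]) for t, o in zip(trees, oob_sets)]
order = np.argsort(indiv)[::-1][:10]           # keep the top-ranked trees

# Step 2: starting from the most accurate tree, add trees one by one; keep a
# tree only if majority voting on its own OOB observations does not get worse.
selected = [order[0]]
for k in order[1:]:
    oob = oob_sets[k]
    def vote(members):
        preds = np.array([trees[m](X[oob]) for m in members])
        return np.mean((preds.mean(axis=0) >= 0.5).astype(int) == y[oob])
    if vote(selected + [k]) >= vote(selected):
        selected.append(k)
```

For the second method, the bootstrap line would be replaced by a without-replacement subsample (e.g. `rng.choice(n, n // 2, replace=False)`), with the held-out remainder playing the role of the OOB set.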
Related papers
- Learning a Decision Tree Algorithm with Transformers [80.49817544396379]
We introduce MetaTree, which trains a transformer-based model on filtered outputs from classical algorithms to produce strong decision trees for classification.
We then train MetaTree to produce the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z)
- Distribution and volume based scoring for Isolation Forests [0.0]
We make two contributions to the Isolation Forest method for anomaly and outlier detection.
The first is an information-theoretically motivated generalisation of the score function that is used to aggregate the scores across random tree estimators.
The second is an alternative scoring function at the level of the individual tree estimator, in which we replace the depth-based scoring of the Isolation Forest with one based on hyper-volumes associated with an isolation tree's leaf nodes.
arXiv Detail & Related papers (2023-09-20T16:27:10Z)
- Prediction Algorithms Achieving Bayesian Decision Theoretical Optimality Based on Decision Trees as Data Observation Processes [1.2774526936067927]
This paper uses trees to represent the data observation processes underlying given data.
We derive the statistically optimal prediction, which is robust against overfitting.
We solve this by a Markov chain Monte Carlo method, whose step size is adaptively tuned according to a posterior distribution for the trees.
arXiv Detail & Related papers (2023-06-12T12:14:57Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
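The variant described above swaps the usual linkage criterion for a similarity one: at each step, merge the pair of clusters whose points have the highest average pairwise dot product. A minimal sketch under that reading (function name and return format are illustrative, not from the paper):

```python
import numpy as np

def dot_product_agglomerative(X):
    """Agglomerative clustering that merges the pair of clusters with the
    highest average pairwise dot product (instead of, e.g., minimum distance).
    Returns the merge history as a list of (cluster_a, cluster_b) index lists."""
    clusters = [[i] for i in range(len(X))]
    history = []
    G = X @ X.T  # Gram matrix: all pairwise dot products
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = G[np.ix_(clusters[a], clusters[b])].mean()
                if score > best:
                    best, pair = score, (a, b)
        a, b = pair
        history.append((clusters[a], clusters[b]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return history

# Usage: the two nearly-parallel points merge before the opposing one.
pts = np.array([[2.0, 0.0], [2.1, 0.0], [-1.0, 0.0]])
print(dot_product_agglomerative(pts)[0])  # ([0], [1])
```

The merge history defines the estimated dendrogram; the paper's claim is that under a generic probabilistic graphical model this tree recovers the generative hierarchy.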
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- SETAR-Tree: A Novel and Accurate Tree Algorithm for Global Time Series Forecasting [7.206754802573034]
In this paper, we explore the close connections between TAR models and regression trees.
We introduce a new forecasting-specific tree algorithm that trains global Pooled Regression (PR) models in the leaves.
In our evaluation, the proposed tree and forest models are able to achieve significantly higher accuracy than a set of state-of-the-art tree-based algorithms.
arXiv Detail & Related papers (2022-11-16T04:30:42Z)
- RLET: A Reinforcement Learning Based Approach for Explainable QA with Entailment Trees [47.745218107037786]
We propose RLET, a Reinforcement Learning based Entailment Tree generation framework.
RLET iteratively performs single step reasoning with sentence selection and deduction generation modules.
Experiments on three settings of the EntailmentBank dataset demonstrate the strength of using the RL framework.
arXiv Detail & Related papers (2022-10-31T06:45:05Z)
- Social Interpretable Tree for Pedestrian Trajectory Prediction [75.81745697967608]
We propose a tree-based method, termed as Social Interpretable Tree (SIT), to address this multi-modal prediction task.
A path in the tree from the root to a leaf represents an individual possible future trajectory.
Despite using a hand-crafted tree, the experimental results on the ETH-UCY and Stanford Drone datasets demonstrate that our method matches or exceeds the performance of state-of-the-art methods.
arXiv Detail & Related papers (2022-05-26T12:18:44Z)
- Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure.
HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques.
All code and models are released in a full-fledged package available on Github.
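Hierarchical shrinkage is a post-hoc regularizer: it keeps the tree's structure and shrinks each node's contribution toward its parent's mean, damped by the parent's sample count. The sketch below is a hedged illustration of that idea — the dictionary-based tree layout, field names, and `hs_predict` helper are all hypothetical, not the released package's API.

```python
def hs_predict(x, node, lam):
    """Walk the tree from the root; add each child's deviation from its parent,
    shrunk by a factor 1 / (1 + lam / n_parent). lam=0 recovers the plain tree."""
    pred = node["value"]
    while "feature" in node:
        parent = node
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
        pred += (node["value"] - parent["value"]) / (1.0 + lam / parent["n"])
    return pred

# A tiny hand-built regression tree: mean response and sample count at every node.
tree = {
    "value": 5.0, "n": 100, "feature": 0, "threshold": 0.0,
    "left":  {"value": 2.0, "n": 60},
    "right": {"value": 9.5, "n": 40, "feature": 1, "threshold": 1.0,
              "left":  {"value": 8.0, "n": 25},
              "right": {"value": 12.0, "n": 15}},
}

print(hs_predict([1.0, 2.0], tree, lam=0.0))   # plain CART prediction: 12.0
print(hs_predict([1.0, 2.0], tree, lam=50.0))  # shrunk toward ancestor means
```

Because only node predictions change, this composes freely with other regularization (pruning, depth limits), which is consistent with the summary's claim above.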
arXiv Detail & Related papers (2022-02-02T02:43:23Z)
- Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder [50.591267188664666]
We propose an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data.
We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space.
arXiv Detail & Related papers (2021-02-11T08:48:48Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
- Optimal survival trees ensemble [0.0]
Recent studies have adopted an approach of selecting accurate and diverse trees based on individual or collective performance within an ensemble for classification and regression problems.
This work follows in the wake of these investigations and considers the possibility of growing a forest of optimal survival trees.
In addition to improving predictive performance, the proposed method reduces the number of survival trees in the ensemble as compared to other tree-based methods.
arXiv Detail & Related papers (2020-05-18T19:28:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.