Optimal trees selection for classification via out-of-bag assessment and
sub-bagging
- URL: http://arxiv.org/abs/2012.15301v1
- Date: Wed, 30 Dec 2020 19:44:11 GMT
- Title: Optimal trees selection for classification via out-of-bag assessment and
sub-bagging
- Authors: Zardad Khan, Naz Gul, Nosheen Faiz, Asma Gul, Werner Adler, Berthold
Lausen
- Abstract summary: The predictive performance of tree-based machine learning methods, in general, improves at a decreasing rate as the size of training data increases.
We investigate this in optimal trees ensemble (OTE) where the method fails to learn from some of the training observations due to internal validation.
Modified tree selection methods are thus proposed for OTE to cater for the loss of training observations in internal validation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The effect of training data size on machine learning methods has been well
investigated over the past two decades. The predictive performance of
tree-based machine learning methods, in general, improves at a decreasing
rate as the size of training data increases. We investigate this in the optimal trees
ensemble (OTE) where the method fails to learn from some of the training
observations due to internal validation. Modified tree selection methods are
thus proposed for OTE to cater for the loss of training observations in
internal validation. In the first method, corresponding out-of-bag (OOB)
observations are used in both individual and collective performance assessment
for each tree. Trees are ranked based on their individual performance on the
OOB observations. A certain number of top-ranked trees is selected; starting
from the most accurate tree, subsequent trees are added one by one, and their
impact is recorded by using the OOB observations left out from the bootstrap
sample taken for the tree being added. A tree is selected if it improves
predictive accuracy of the ensemble. In the second approach, trees are grown on
random subsets taken without replacement (known as sub-bagging) from the
training data, instead of bootstrap samples (taken with replacement). The remaining
observations from each sample are used in both individual and collective
assessments for each corresponding tree, similar to the first method. Analyses
of 21 benchmark datasets and simulation studies show improved performance of
the modified methods in comparison to OTE and other state-of-the-art methods.
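
For concreteness, the two selection procedures can be sketched in Python with scikit-learn decision trees. This is a minimal illustration under our own naming (grow_and_select, sample_frac, the vote helper), not the authors' implementation; setting replace=False switches from bootstrap samples (method one) to sub-bagging (method two), and the 0.9 subsample fraction is an arbitrary illustrative choice.

```python
# Minimal sketch of the modified tree selection (not the authors' code).
# replace=True  -> method 1: bootstrap samples, OOB assessment.
# replace=False -> method 2: sub-bagging, left-out-sample assessment.
from collections import Counter

import numpy as np
from sklearn.tree import DecisionTreeClassifier


def _majority_vote(trees, X):
    """Majority vote of a sub-ensemble; assumes hashable class labels."""
    votes = np.stack([t.predict(X) for t in trees])  # (n_trees, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])


def grow_and_select(X, y, n_trees=500, n_top=100, replace=True,
                    sample_frac=0.9, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    m = n if replace else int(sample_frac * n)  # sample_frac used for sub-bagging
    trees, left_out, scores = [], [], []

    # Phase 1: grow trees; score each on its own left-out (OOB) observations.
    for _ in range(n_trees):
        idx = rng.choice(n, size=m, replace=replace)
        out = np.setdiff1d(np.arange(n), idx)
        tree = DecisionTreeClassifier().fit(X[idx], y[idx])
        trees.append(tree)
        left_out.append(out)
        scores.append(tree.score(X[out], y[out]) if len(out) else 0.0)

    # Phase 2: rank trees by individual OOB accuracy; keep the top n_top.
    order = np.argsort(scores)[::-1][:n_top]

    # Phase 3: greedy forward selection starting from the most accurate tree.
    # Each candidate is judged on the OOB observations of the tree being
    # added, and kept only if it improves the ensemble's accuracy there.
    selected = [order[0]]
    for k in order[1:]:
        out = left_out[k]
        if len(out) == 0:
            continue
        current = [trees[i] for i in selected]
        before = np.mean(_majority_vote(current, X[out]) == y[out])
        after = np.mean(_majority_vote(current + [trees[k]], X[out]) == y[out])
        if after > before:
            selected.append(k)
    return [trees[i] for i in selected]
```

With replace=False, the left-out observations of each subsample play exactly the role the OOB observations play under bootstrapping, so the same ranking and greedy-addition loop covers both modified methods.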
Related papers
- Can a Single Tree Outperform an Entire Forest? [5.448070998907116]
The prevailing mindset is that a single decision tree underperforms classic random forests in testing accuracy.
This study challenges that mindset by significantly improving the testing accuracy of an oblique regression tree.
Our approach reformulates tree training as a differentiable unconstrained optimization task.
arXiv Detail & Related papers (2024-11-26T00:18:18Z) - Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
- Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation.
DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level.
To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
arXiv Detail & Related papers (2024-08-21T05:09:53Z) - Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
- Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z) - Distribution and volume based scoring for Isolation Forests [0.0]
We make two contributions to the Isolation Forest method for anomaly and outlier detection.
The first is an information-theoretically motivated generalisation of the score function that is used to aggregate the scores across random tree estimators.
The second is an alternative scoring function at the level of the individual tree estimator, in which we replace the depth-based scoring of the Isolation Forest with one based on hyper-volumes associated to an isolation tree's leaf nodes.
arXiv Detail & Related papers (2023-09-20T16:27:10Z) - Prediction Algorithms Achieving Bayesian Decision Theoretical Optimality
- Prediction Algorithms Achieving Bayesian Decision Theoretical Optimality Based on Decision Trees as Data Observation Processes [1.2774526936067927]
This paper uses trees to represent the data observation processes underlying given data.
We derive the statistically optimal prediction, which is robust against overfitting, and compute it with a Markov chain Monte Carlo method whose step size is adaptively tuned according to a posterior distribution over the trees.
arXiv Detail & Related papers (2023-06-12T12:14:57Z) - RLET: A Reinforcement Learning Based Approach for Explainable QA with
Entailment Trees [47.745218107037786]
We propose RLET, a Reinforcement Learning based Entailment Tree generation framework.
RLET iteratively performs single-step reasoning with sentence selection and deduction generation modules.
Experiments on three settings of the EntailmentBank dataset demonstrate the strength of the RL framework.
arXiv Detail & Related papers (2022-10-31T06:45:05Z) - Social Interpretable Tree for Pedestrian Trajectory Prediction [75.81745697967608]
We propose a tree-based method, termed as Social Interpretable Tree (SIT), to address this multi-modal prediction task.
A path in the tree from the root to leaf represents an individual possible future trajectory.
Despite the hand-crafted tree, experimental results on the ETH-UCY and Stanford Drone datasets demonstrate that our method matches or exceeds the performance of state-of-the-art methods.
arXiv Detail & Related papers (2022-05-26T12:18:44Z) - Hierarchical Shrinkage: improving the accuracy and interpretability of
tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure.
HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques.
All code and models are released in a full-fledged package available on Github.
arXiv Detail & Related papers (2022-02-02T02:43:23Z) - Visualizing hierarchies in scRNA-seq data using a density tree-biased
- Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder [50.591267188664666]
We propose an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data.
We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low-dimensional space.
arXiv Detail & Related papers (2021-02-11T08:48:48Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - Optimal survival trees ensemble [0.0]
Recent studies have adopted an approach of selecting accurate and diverse trees based on individual or collective performance within an ensemble for classification and regression problems.
This work follows in the wake of these investigations and considers the possibility of growing a forest of optimal survival trees.
In addition to improving predictive performance, the proposed method reduces the number of survival trees in the ensemble compared to other tree-based methods.
arXiv Detail & Related papers (2020-05-18T19:28:16Z)