Related papers: When does Subagging Work?

When does Subagging Work?

URL: http://arxiv.org/abs/2404.01832v1
Date: Tue, 2 Apr 2024 10:44:55 GMT
Title: When does Subagging Work?
Authors: Christos Revelas, Otilia Boldea, Bas J. M. Werker,
Abstract summary: We study the effectiveness of subagging, or subsample aggregating, on regression trees. We formalize that (i) the bias depends on the diameter of cells, hence trees with few splits tend to be biased. We compare the performance of subagging to that of trees across different numbers of splits.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the effectiveness of subagging, or subsample aggregating, on regression trees, a popular non-parametric method in machine learning. First, we give sufficient conditions for pointwise consistency of trees. We formalize that (i) the bias depends on the diameter of cells, hence trees with few splits tend to be biased, and (ii) the variance depends on the number of observations in cells, hence trees with many splits tend to have large variance. While these statements for bias and variance are known to hold globally in the covariate space, we show that, under some constraints, they are also true locally. Second, we compare the performance of subagging to that of trees across different numbers of splits. We find that (1) for any given number of splits, subagging improves upon a single tree, and (2) this improvement is larger for many splits than it is for few splits. However, (3) a single tree grown at optimal size can outperform subagging if the size of its individual trees is not optimally chosen. This last result goes against common practice of growing large randomized trees to eliminate bias and then averaging to reduce variance.

Related papers

Diversity Conscious Refined Random Forest [0.0]
Random Forest (RF) is a widely used ensemble learning technique.<n>RF often relies on hundreds of trees and all input features, leading to high cost and model redundancy.<n>We propose a Refined Random Forest that grows trees only on informative features and enforces maximal diversity by clustering and retaining uncorrelated trees.
arXiv Detail & Related papers (2025-07-01T06:28:15Z)
Experiments with Optimal Model Trees [2.8391355909797644]
We show that globally optimal model trees can achieve competitive accuracy with very small trees. We also compare to classic optimal and greedily grown decision trees, random forests, and support vector machines.
arXiv Detail & Related papers (2025-03-17T08:03:47Z)
Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles. We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search [68.66904039405871]
We introduce SoftTreeMax, a generalization of softmax that takes planning into account. We show for the first time the role of a tree expansion policy in mitigating this variance. Our differentiable tree-based policy leverages all gradients at the tree leaves in each environment step instead of the traditional single-sample-based gradient.
arXiv Detail & Related papers (2023-01-30T19:03:14Z)
Distributional Adaptive Soft Regression Trees [0.0]
This article proposes a new type of a distributional regression tree using a multivariate soft split rule. One great advantage of the soft split is that smooth high-dimensional functions can be estimated with only one tree. We show by means of extensive simulation studies that the algorithm has excellent properties and outperforms various benchmark methods.
arXiv Detail & Related papers (2022-10-19T08:59:02Z)
Unbiased and Efficient Sampling of Dependency Trees [0.0]
Most treebanks require that every valid dependency tree has a single edge coming out of the ROOT node. Zmigrod et al. have recently proposed algorithms for sampling with and without replacement from the single-root dependency tree distribution. We show that their fastest algorithm for sampling with replacement, Wilson-RC, is in fact biased.
arXiv Detail & Related papers (2022-05-25T09:57:28Z)
Ensembles of Double Random Forest [1.7205106391379026]
We propose two approaches for generating ensembles of double random forest. In the first approach, we propose a rotation based ensemble of double random forest. In the second approach, we propose oblique ensembles of double random forest.
arXiv Detail & Related papers (2021-11-03T04:19:41Z)
Active-LATHE: An Active Learning Algorithm for Boosting the Error Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $rho$ is at least $0.8$. Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z)
Controlling the False Split Rate in Tree-Based Aggregation [11.226095593522691]
We propose a hypothesis testing algorithm for tree-based aggregation. We focus on two main examples of tree-based aggregation, one which involves aggregating means and the other which involves aggregating regression coefficients.
arXiv Detail & Related papers (2021-08-11T17:59:22Z)
SGA: A Robust Algorithm for Partial Recovery of Tree-Structured Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise. Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure. We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z)
Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects. We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions. Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve better or comparable performance than [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
A better method to enforce monotonic constraints in regression and classification trees [0.0]
We present two new ways of enforcing monotone constraints in regression and classification trees. One yields better results than the current LightGBM, and has a similar computation time. The other one yields even better results, but is much slower than the current LightGBM.
arXiv Detail & Related papers (2020-11-02T14:04:21Z)
Convex Polytope Trees [57.56078843831244]
convex polytope trees (CPT) are proposed to expand the family of decision trees by an interpretable generalization of their decision boundary. We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
arXiv Detail & Related papers (2020-10-21T19:38:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.