Adaptive Split Balancing for Optimal Random Forest
- URL: http://arxiv.org/abs/2402.11228v1
- Date: Sat, 17 Feb 2024 09:10:40 GMT
- Title: Adaptive Split Balancing for Optimal Random Forest
- Authors: Yuqian Zhang, Weijie Ji, Jelena Bradic
- Abstract summary: We introduce the adaptive split balancing forest (ASBF), capable of learning tree representations from data.
We also propose a localized version that attains the minimax rate under the H\"older class $\mathcal{H}^{q,\beta}$ for any $q\in\mathbb{N}$ and $\beta\in(0,1]$.
- Score: 10.021381302215062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While random forests are commonly used for regression problems, existing
methods often lack adaptability in complex situations or lose optimality under
simple, smooth scenarios. In this study, we introduce the adaptive split
balancing forest (ASBF), capable of learning tree representations from data
while simultaneously achieving minimax optimality under the Lipschitz class. To
exploit higher-order smoothness levels, we further propose a localized version
that attains the minimax rate under the H\"older class $\mathcal{H}^{q,\beta}$
for any $q\in\mathbb{N}$ and $\beta\in(0,1]$. Rather than relying on the
widely-used random feature selection, we consider a balanced modification to
existing approaches. Our results indicate that an over-reliance on auxiliary
randomness may compromise the approximation power of tree models, leading to
suboptimal results. Conversely, a less random, more balanced approach
demonstrates optimality. Additionally, we establish uniform upper bounds and
explore the application of random forests in average treatment effect
estimation problems. Through simulation studies and real-data applications, we
demonstrate the superior empirical performance of the proposed methods over
existing random forests.
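To make the "less random, more balanced" splitting idea concrete, the toy sketch below grows regression trees that always split on the coordinate used least often along the current root-to-leaf path, cutting at the sample median, and averages bootstrapped trees into a forest. This is a minimal illustration of split balancing, not the authors' exact ASBF procedure; all names are ours, and the localized higher-order variant is omitted. (As background, for smoothness $s = q+\beta$ the classical minimax rate in squared error is of order $n^{-2s/(2s+d)}$.)

```python
import numpy as np

def build_balanced_tree(X, y, depth_budget, split_counts=None):
    """Grow a regression tree that balances splits across features.

    Instead of picking a random coordinate (as in many random forests),
    each node splits on the coordinate split on the FEWEST times along
    the current root-to-leaf path -- a toy version of split balancing.
    """
    n, d = X.shape
    if split_counts is None:
        split_counts = np.zeros(d, dtype=int)
    if depth_budget == 0 or n <= 5:
        return {"leaf": True, "value": float(y.mean())}

    j = int(np.argmin(split_counts))        # least-used coordinate so far
    threshold = float(np.median(X[:, j]))   # balanced cut at the median
    left = X[:, j] <= threshold
    if left.all() or (~left).all():         # degenerate split: stop here
        return {"leaf": True, "value": float(y.mean())}

    counts = split_counts.copy()
    counts[j] += 1
    return {
        "leaf": False, "feature": j, "threshold": threshold,
        "left": build_balanced_tree(X[left], y[left], depth_budget - 1, counts),
        "right": build_balanced_tree(X[~left], y[~left], depth_budget - 1, counts),
    }

def predict(tree, x):
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]

# Forest = average of trees grown on bootstrap resamples.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(500)
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), len(X))
    trees.append(build_balanced_tree(X[idx], y[idx], depth_budget=6))
x0 = np.array([0.3, 0.5, 0.7])
print(np.mean([predict(t, x0) for t in trees]))  # forest prediction at x0
```

Balancing the per-path split counts keeps every cell shrinking at comparable rates in all directions, which mirrors the paper's point that approximation power, rather than auxiliary randomness, is what drives optimality.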
Related papers
- Smoothed Normalization for Efficient Distributed Private Optimization [54.197255548244705]
Federated learning enables training machine learning models while preserving the privacy of participants.
However, there is no differentially private distributed method for training on smooth, non-convex problems.
We introduce a new distributed algorithm $\alpha$-NormEC with provable convergence guarantees.
arXiv Detail & Related papers (2025-02-19T07:10:32Z)
- A Powerful Random Forest Featuring Linear Extensions (RaFFLE) [1.2233362977312945]
RaFFLE is a novel framework that integrates PILOT trees as base learners within a random forest ensemble.
PILOT trees combine the computational efficiency of traditional decision trees with the flexibility of linear model trees.
RaFFLE proves to be a versatile tool for tackling a wide variety of regression problems.
arXiv Detail & Related papers (2025-02-14T14:22:51Z)
- Can a Single Tree Outperform an Entire Forest? [5.448070998907116]
The prevailing mindset is that a single decision tree underperforms classic random forests in testing accuracy.
This study challenges such a mindset by significantly improving the testing accuracy of an oblique regression tree.
Our approach reformulates tree training as a differentiable unconstrained optimization task.
arXiv Detail & Related papers (2024-11-26T00:18:18Z)
- Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation.
DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level.
To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
arXiv Detail & Related papers (2024-08-21T05:09:53Z)
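As a concrete reading of the DTR entry above, the sketch below trains node embeddings with a softmax over all nodes at the same tree level; the target at each level is the ancestor of the user's positive leaf. It is an illustrative numpy sketch under our own simplifying assumptions (binary tree, one user, plain SGD) and omits the paper's loss-rectification step.

```python
import numpy as np

# Illustrative sketch of level-wise softmax training for a tree-based
# retriever (not the DTR authors' architecture; names and sizes are ours).
# Nodes at a given level compete in one softmax; the target at each level
# is the ancestor of the user's positive leaf.

rng = np.random.default_rng(0)
dim, num_levels = 16, 3
nodes_per_level = [2 ** (l + 1) for l in range(num_levels)]   # binary tree
node_emb = [rng.normal(size=(m, dim)) * 0.1 for m in nodes_per_level]

def level_softmax_step(user_vec, positive_leaf, lr=0.1):
    """One SGD step of the level-wise softmax loss for a single user."""
    node, targets = positive_leaf, [positive_leaf]
    while len(targets) < num_levels:                 # walk up to the root
        node //= 2
        targets.append(node)
    targets.reverse()                                # top level first
    total = 0.0
    for level, target in enumerate(targets):
        logits = node_emb[level] @ user_vec          # score all level nodes
        p = np.exp(logits - logits.max())
        p /= p.sum()
        total += -np.log(p[target])                  # cross-entropy loss
        grad = p.copy()
        grad[target] -= 1.0                          # d(loss)/d(logits)
        node_emb[level] -= lr * np.outer(grad, user_vec)
    return total

user = rng.normal(size=dim)
for _ in range(50):
    loss = level_softmax_step(user, positive_leaf=5)
print(f"final level-wise softmax loss: {loss:.3f}")
```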
- Inference with Mondrian Random Forests [6.97762648094816]
We give precise bias and variance characterizations, along with a Berry-Esseen-type central limit theorem, for the Mondrian random forest regression estimator.
We present valid statistical inference methods for the unknown regression function.
Efficient and implementable algorithms are devised for both batch and online learning settings.
arXiv Detail & Related papers (2023-10-15T01:41:42Z)
- bsnsing: A decision tree induction method based on recursive optimal boolean rule composition [2.28438857884398]
This paper proposes a new mixed-integer programming (MIP) formulation to optimize split rule selection in the decision tree induction process.
It develops an efficient search solver that is able to solve practical instances faster than commercial solvers.
arXiv Detail & Related papers (2022-05-30T17:13:57Z)
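To give the MIP formulation in the bsnsing entry above a concrete shape, here is a deliberately simplified toy model (ours, much simpler than the paper's formulation): choose at most k binary conditions whose disjunction defines the left branch, maximizing positives minus negatives routed left. It uses the open-source PuLP modeler with its bundled CBC solver.

```python
import numpy as np
import pulp  # pip install pulp; ships with the CBC MIP solver

# Toy MIP for choosing an OR-of-conditions split rule, in the spirit of
# bsnsing's recursive optimal boolean rule composition (a simplified
# sketch, not the paper's exact formulation). A row goes to the left
# child if ANY selected binary condition fires; we maximize
# (positives sent left) - (negatives sent left) with at most k conditions.

rng = np.random.default_rng(0)
n, m, k = 60, 8, 2
B = rng.integers(0, 2, size=(n, m))        # binarized candidate conditions
label = rng.integers(0, 2, size=n)         # class labels

prob = pulp.LpProblem("split_rule", pulp.LpMaximize)
z = [pulp.LpVariable(f"z{j}", cat="Binary") for j in range(m)]   # pick cond j
c = [pulp.LpVariable(f"c{i}", cat="Binary") for i in range(n)]   # row i fires

for i in range(n):
    fired = [z[j] for j in range(m) if B[i, j] == 1]
    prob += c[i] <= pulp.lpSum(fired)      # can't fire without a selected cond
    for zj in fired:
        prob += c[i] >= zj                 # must fire if a selected cond hits
prob += pulp.lpSum(z) <= k                 # rule size budget

prob += pulp.lpSum(c[i] if label[i] == 1 else -c[i] for i in range(n))
prob.solve(pulp.PULP_CBC_CMD(msg=False))

chosen = [j for j in range(m) if z[j].value() > 0.5]
print("split rule: go left if any of conditions", chosen, "holds")
```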
- On multivariate randomized classification trees: $l_0$-based sparsity, VC dimension and decomposition methods [0.9346127431927981]
We investigate the nonlinear continuous optimization formulation proposed in Blanquero et al.
We first consider alternative methods to sparsify such trees based on concave approximations of the $l_0$ norm.
We propose a general decomposition scheme and an efficient version of it. Experiments on larger datasets show that the proposed decomposition method is able to significantly reduce the training times without compromising the accuracy.
arXiv Detail & Related papers (2021-12-09T22:49:08Z)
- Minimax Rates for High-Dimensional Random Tessellation Forests [0.0]
Mondrian forests are the first class of random forests for which minimax rates were obtained in arbitrary dimension.
We show that a large class of random forests with general split directions also achieve minimax optimal convergence rates in arbitrary dimension.
arXiv Detail & Related papers (2021-09-22T06:47:38Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, known as soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1], [3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
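The soft-routing idea in the entry above fits in a few lines: each internal node routes left with a sigmoid probability, and the tree output is the leaf values weighted by root-to-leaf path probabilities. The sketch below is our illustration (heap-indexed tree, numpy only), not the paper's deep-forest layer, which additionally learns the connectivity between layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_predict(x, weights, biases, leaf_values, depth):
    """Soft-routing prediction for a heap-indexed binary tree of given depth.

    Each internal node i routes left with probability sigmoid(w_i . x + b_i);
    the output is the leaf values weighted by the product of routing
    probabilities along each root-to-leaf path. Illustrative sketch only.
    """
    num_leaves = 2 ** depth
    path_prob = np.ones(num_leaves)
    for leaf in range(num_leaves):
        node = 0                                    # start at the root
        for bit in format(leaf, f"0{depth}b"):      # leaf bits encode the path
            p_left = sigmoid(weights[node] @ x + biases[node])
            path_prob[leaf] *= p_left if bit == "0" else 1.0 - p_left
            node = 2 * node + (1 if bit == "0" else 2)   # heap children
    return float(path_prob @ leaf_values)

rng = np.random.default_rng(1)
depth, dim = 3, 4
num_internal = 2 ** depth - 1
W = rng.normal(size=(num_internal, dim))
b = rng.normal(size=num_internal)
leaves = rng.normal(size=2 ** depth)
print(soft_tree_predict(rng.normal(size=dim), W, b, leaves, depth))
```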
- An Efficient Adversarial Attack for Tree Ensembles [91.05779257472675]
We study adversarial attacks on tree-based ensembles such as gradient boosting decision trees (GBDTs) and random forests (RFs).
We show that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach.
Our code is available at https://github.com/chong-z/tree-ensemble-attack.
arXiv Detail & Related papers (2020-10-22T10:59:49Z)
- Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
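The entry above turns on replacing the variance-reduction split criterion with downstream decision quality. Below is a hedged toy version for a newsvendor decision (our example problem, not the paper's approximate criteria): each candidate split is scored by the cost of the best order quantity within each child.

```python
import numpy as np

# Sketch of a decision-aware split criterion in the spirit of stochastic
# optimization forests: score a candidate split by the downstream decision
# cost it induces, not by prediction accuracy. The toy decision problem is
# a newsvendor: order quantity q, demand y, cost = h*max(q-y,0)+b*max(y-q,0);
# the cost-optimal q for a sample is its b/(b+h) demand quantile.
# (Illustrative only; not the paper's approximate splitting criteria.)

H_COST, B_COST = 1.0, 3.0        # holding and backorder unit costs

def decision_cost(y):
    """Cost of the best single order quantity for demand sample y."""
    q = np.quantile(y, B_COST / (B_COST + H_COST))
    return np.sum(H_COST * np.maximum(q - y, 0) + B_COST * np.maximum(y - q, 0))

def best_split(X, y):
    """Pick (feature, threshold) minimizing downstream decision cost."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):  # few candidates
            left = X[:, j] <= t
            if left.sum() < 5 or (~left).sum() < 5:
                continue
            cost = decision_cost(y[left]) + decision_cost(y[~left])
            if cost < best[2]:
                best = (j, t, cost)
    return best

rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 3))
y = 5 + 4 * (X[:, 0] > 0.5) + rng.standard_normal(200)  # demand shifts with x0
print("decision-aware split:", best_split(X, y)[:2])
```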
- Generalized and Scalable Optimal Sparse Decision Trees [56.35541305670828]
We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
arXiv Detail & Related papers (2020-06-15T19:00:11Z)
- ENTMOOT: A Framework for Optimization over Ensemble Tree Models [57.98561336670884]
ENTMOOT is a framework for integrating tree models into larger optimization problems.
We show how ENTMOOT allows a simple integration of tree models into decision-making and black-box optimization.
arXiv Detail & Related papers (2020-03-10T14:34:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.