Adaptive Split Balancing for Optimal Random Forest
- URL: http://arxiv.org/abs/2402.11228v1
- Date: Sat, 17 Feb 2024 09:10:40 GMT
- Title: Adaptive Split Balancing for Optimal Random Forest
- Authors: Yuqian Zhang, Weijie Ji, Jelena Bradic
- Abstract summary: We introduce the adaptive split balancing forest (ASBF), capable of learning tree representations from data.
We also propose a localized version that attains the minimax rate under the H\"older class $\mathcal{H}^{q,\beta}$ for any $q\in\mathbb{N}$ and $\beta\in(0,1]$.
- Score: 10.021381302215062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While random forests are commonly used for regression problems, existing
methods often lack adaptability in complex situations or lose optimality under
simple, smooth scenarios. In this study, we introduce the adaptive split
balancing forest (ASBF), capable of learning tree representations from data
while simultaneously achieving minimax optimality under the Lipschitz class. To
exploit higher-order smoothness levels, we further propose a localized version
that attains the minimax rate under the H\"older class $\mathcal{H}^{q,\beta}$
for any $q\in\mathbb{N}$ and $\beta\in(0,1]$. Rather than relying on the
widely-used random feature selection, we consider a balanced modification to
existing approaches. Our results indicate that an over-reliance on auxiliary
randomness may compromise the approximation power of tree models, leading to
suboptimal results. Conversely, a less random, more balanced approach
demonstrates optimality. Additionally, we establish uniform upper bounds and
explore the application of random forests in average treatment effect
estimation problems. Through simulation studies and real-data applications, we
demonstrate the superior empirical performance of the proposed methods over
existing random forests.
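The balancing idea can be made concrete with a toy tree. Below is a minimal sketch, assuming a simple depth-balanced rule (each node splits the coordinate used least often on its root-to-node path, at the median); the class and method names are hypothetical, not the authors' implementation.

```python
import numpy as np

class BalancedSplitTree:
    """Toy regression tree illustrating *split balancing*: each node
    splits the coordinate used least often on its root-to-node path
    (median cut), instead of a randomly selected feature. All design
    details here are illustrative assumptions, not the paper's code."""

    def __init__(self, min_leaf=10):
        self.min_leaf = min_leaf

    def fit(self, X, y):
        self.root = self._grow(X, y, np.zeros(X.shape[1], dtype=int))
        return self

    def _grow(self, X, y, use_counts):
        if len(y) < 2 * self.min_leaf:
            return {"leaf": True, "value": float(y.mean())}
        j = int(np.argmin(use_counts))      # least-used coordinate so far
        t = float(np.median(X[:, j]))       # balanced (median) cut point
        left = X[:, j] <= t
        if left.all() or not left.any():
            return {"leaf": True, "value": float(y.mean())}
        c = use_counts.copy()
        c[j] += 1
        return {"leaf": False, "dim": j, "cut": t,
                "l": self._grow(X[left], y[left], c),
                "r": self._grow(X[~left], y[~left], c)}

    def predict_one(self, x):
        node = self.root
        while not node["leaf"]:
            node = node["l"] if x[node["dim"]] <= node["cut"] else node["r"]
        return node["value"]
```

Averaging such trees over bootstrap or subsampled draws yields the forest; the paper's message is that replacing random feature selection with this kind of balance is what recovers minimax rates.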
Related papers
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
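As a sketch of the objective only (not the paper's sample-optimal algorithm), worst-case risk across k empirical distributions can be reduced by subgradient descent on the maximum of the k losses; all names and hyperparameters below are assumptions.

```python
import numpy as np

def minimax_fit(datasets, dim, steps=2000, lr=0.05):
    """Toy multi-distribution learning: subgradient descent on the maximum
    of k empirical squared losses. Sketches the objective only; it is not
    the paper's algorithm."""
    w = np.zeros(dim)
    for _ in range(steps):
        losses, grads = [], []
        for X, y in datasets:         # datasets: list of (X, y) pairs
            r = X @ w - y
            losses.append(float(np.mean(r ** 2)))
            grads.append(2.0 * X.T @ r / len(y))
        k = int(np.argmax(losses))    # subgradient of max: gradient of the
        w -= lr * grads[k]            # currently worst distribution's loss
    return w
```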
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Inference with Mondrian Random Forests [7.842152902652216]
We give a central limit theorem for the estimates made by a Mondrian random forest in the regression setting.
We also provide a debiasing procedure for Mondrian random forests which allows them to achieve minimax-optimal estimation rates.
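For context, a Mondrian tree is built on a partition drawn from the Mondrian process. The sketch below samples such a partition under the standard construction (cut times exponential in the box perimeter, cut dimension proportional to side length, cut location uniform); it is not the paper's CLT or debiasing procedure.

```python
import numpy as np

def sample_mondrian(lower, upper, budget, t=0.0, rng=None):
    """Sample a Mondrian partition of the axis-aligned box [lower, upper].
    A sketch of the underlying partition process, not the paper's method."""
    if rng is None:
        rng = np.random.default_rng()
    sides = upper - lower
    t_cut = t + rng.exponential(1.0 / sides.sum())   # next cut time
    if t_cut > budget:                               # budget exhausted: leaf
        return {"leaf": True, "lower": lower, "upper": upper}
    d = int(rng.choice(len(sides), p=sides / sides.sum()))  # dim ~ side length
    x = float(rng.uniform(lower[d], upper[d]))       # uniform cut location
    up_l, lo_r = upper.copy(), lower.copy()
    up_l[d], lo_r[d] = x, x
    return {"leaf": False, "dim": d, "cut": x,
            "l": sample_mondrian(lower, up_l, budget, t_cut, rng),
            "r": sample_mondrian(lo_r, upper, budget, t_cut, rng)}
```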
arXiv Detail & Related papers (2023-10-15T01:41:42Z)
- Distributional Adaptive Soft Regression Trees [0.0]
This article proposes a new type of distributional regression tree using a multivariate soft split rule.
One great advantage of the soft split is that smooth high-dimensional functions can be estimated with only one tree.
We show by means of extensive simulation studies that the algorithm has excellent properties and outperforms various benchmark methods.
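A soft split blends the two child predictions with a sigmoid gate on a linear projection of x, so a single tree represents a smooth function. The sketch below illustrates the idea; the node layout and temperature parameter are hypothetical choices, not the article's exact rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_predict(x, node, temp=0.1):
    """Soft multivariate split: each internal node holds a direction w and
    offset b, and the two subtree predictions are blended by a sigmoid gate
    rather than a hard left/right decision. Illustrative sketch only."""
    if node["leaf"]:
        return node["value"]
    g = sigmoid((x @ node["w"] - node["b"]) / temp)   # soft routing weight
    return (g * soft_tree_predict(x, node["l"], temp)
            + (1 - g) * soft_tree_predict(x, node["r"], temp))

# Toy tree: one soft split along w = (1, 1) at offset b = 0.
tree = {"leaf": False, "w": np.array([1.0, 1.0]), "b": 0.0,
        "l": {"leaf": True, "value": 1.0},
        "r": {"leaf": True, "value": -1.0}}
print(soft_tree_predict(np.array([0.05, 0.0]), tree))  # near the cut: blended
```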
arXiv Detail & Related papers (2022-10-19T08:59:02Z)
- On Uncertainty Estimation by Tree-based Surrogate Models in Sequential Model-based Optimization [13.52611859628841]
We revisit various ensembles of randomized trees to investigate their behavior from the perspective of prediction uncertainty estimation.
We propose a new way of constructing an ensemble of randomized trees, referred to as BwO forest, where bagging with oversampling is employed to construct bootstrapped samples.
Experimental results demonstrate the validity and strong performance of the BwO forest over existing tree-based models in various circumstances.
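A minimal sketch of the bagging-with-oversampling idea, assuming bootstrap samples drawn larger than the data and cross-tree spread as the uncertainty estimate; the paper's exact construction may differ.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bwo_forest(X, y, n_trees=100, oversample=2.0, rng=None):
    """Sketch of bagging-with-oversampling: each tree is fit on a bootstrap
    sample *larger* than the data (size = oversample * n), and the spread
    across trees serves as an uncertainty estimate. Illustrative only."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(y)
    m = int(oversample * n)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=m)   # bootstrap with oversampling
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    def predict(Xq):
        preds = np.stack([t.predict(Xq) for t in trees])
        return preds.mean(axis=0), preds.std(axis=0)  # mean + uncertainty
    return predict
```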
arXiv Detail & Related papers (2022-02-22T04:50:37Z)
- Minimax Rates for High-Dimensional Random Tessellation Forests [0.0]
Mondrian forests are the first class of random forests for which minimax rates were obtained in arbitrary dimension.
We show that a large class of random forests with general split directions also achieve minimax optimal convergence rates in arbitrary dimension.
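To illustrate the split family covered by the result, the sketch below performs one oblique split along a uniformly random direction; this is a simplified, hypothetical member of the class, while the theorem covers general split-direction distributions.

```python
import numpy as np

def random_direction_split(X, rng=None):
    """One oblique split: project the data onto a uniformly random unit
    vector and cut at the median projection. A toy illustration of splits
    with general directions."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.normal(size=X.shape[1])
    u /= np.linalg.norm(u)             # uniform direction on the sphere
    proj = X @ u
    t = float(np.median(proj))
    return u, t, proj <= t             # direction, threshold, left mask
```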
arXiv Detail & Related papers (2021-09-22T06:47:38Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
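A toy Monte Carlo illustrating the distortion the paper corrects (not its debiasing estimator): under a greedy, adaptive sampling rule, the plain sample mean of an arm, i.e., least squares on arm indicators, is systematically biased.

```python
import numpy as np

rng = np.random.default_rng(0)
T, reps = 100, 2000
est = []
for _ in range(reps):
    rewards = [[], []]                  # two arms, both with true mean 0
    for t in range(T):
        if t < 2:
            a = t                       # pull each arm once to start
        else:
            # greedy adaptive rule: pull the arm with the higher running mean
            a = int(np.argmax([np.mean(r) for r in rewards]))
        rewards[a].append(rng.normal())
    est.append(np.mean(rewards[0]))     # least-squares estimate of arm 0
print("average estimate of a true mean of 0:", np.mean(est))  # < 0: biased
```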
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
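One way to use a probabilistic model of the objective and its gradient is to fit a GP to noisy evaluations of perturbed policies and differentiate the posterior mean. The sketch below does this for an RBF kernel; it illustrates the idea only, not the paper's algorithm, and all hyperparameters are assumptions.

```python
import numpy as np

def gp_gradient_estimate(thetas, returns, theta0, ell=0.5, noise=1e-2):
    """Gradient of a GP posterior mean (RBF kernel) at theta0, fit to noisy
    returns of perturbed policies. A sketch of model-based gradients."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ell ** 2)
    K = rbf(thetas, thetas)
    alpha = np.linalg.solve(K + noise * np.eye(len(thetas)), returns)
    k0 = rbf(theta0[None, :], thetas)[0]          # k(theta0, theta_i)
    # d/dx k(x, x_i) at x = theta0 equals -(theta0 - theta_i)/ell^2 * k
    return (alpha * k0) @ (thetas - theta0) / ell ** 2
```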
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
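For intuition, exact CNML can be computed naively by refitting the model once per candidate label of each query point, which is precisely the cost ACNML amortizes. A sketch with regularized logistic regression standing in for the maximum-likelihood estimator (assumes both classes appear in the training data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cnml_predict(X_train, y_train, x_query):
    """Naive CNML for binary labels: refit once per candidate label on the
    data augmented with (x_query, y), score that label's probability under
    the refit model, and normalize. A costly baseline, not ACNML itself."""
    scores = []
    for y in (0, 1):
        Xa = np.vstack([X_train, x_query])
        ya = np.append(y_train, y)
        m = LogisticRegression(max_iter=1000).fit(Xa, ya)
        scores.append(m.predict_proba(x_query[None, :])[0, y])
    return np.array(scores) / sum(scores)    # CNML distribution over labels
```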
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimate the parameter using the best subset selection method.
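A sketch of the two ingredients under toy assumptions: exhaustive best-subset least squares (tractable only for small dimension) and an epoch schedule whose lengths double, so the expensive refit happens O(log T) times.

```python
import itertools
import numpy as np

def best_subset_ols(X, y, s):
    """Exhaustive best-subset regression: among all supports of size s,
    keep the one with the smallest residual sum of squares. A toy stand-in
    for the estimator refit at the start of each epoch."""
    best, best_rss = None, np.inf
    for S in itertools.combinations(range(X.shape[1]), s):
        cols = list(S)
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        r = y - X[:, cols] @ beta
        if r @ r < best_rss:
            best, best_rss = (cols, beta), float(r @ r)
    return best

def epoch_lengths(horizon, first=10):
    """Doubly growing epochs: lengths double, so the subset search above
    runs only O(log T) times over a horizon of T rounds."""
    out, t = [], 0
    while t < horizon:
        out.append(min(first, horizon - t))
        t += out[-1]
        first *= 2
    return out
```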
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
- Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
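The criterion can be illustrated with a newsvendor-style toy problem: each candidate split is scored by the downstream cost of the decisions it induces, not by prediction error. The cost function and quantile decision below are hypothetical choices, not the paper's exact criterion.

```python
import numpy as np

def decision_cost(y_leaf, quantile=0.8):
    """Downstream decision for a leaf: a newsvendor-style order z is the
    cost-minimizing quantile of the leaf's outcomes; return the total cost
    cost(z, y) = 0.8*(y - z)+ + 0.2*(z - y)+ (underage vs. overage)."""
    z = np.quantile(y_leaf, quantile)
    return float(np.sum(0.8 * np.maximum(y_leaf - z, 0)
                        + 0.2 * np.maximum(z - y_leaf, 0)))

def best_decision_split(x, y):
    """Score candidate splits by the *decision* cost they induce rather
    than by prediction MSE: a minimal sketch of a decision-aware split."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_t, best_c = None, np.inf
    for i in range(10, len(xs) - 10):      # keep both leaves non-trivial
        c = decision_cost(ys[:i]) + decision_cost(ys[i:])
        if c < best_c:
            best_t, best_c = 0.5 * (xs[i - 1] + xs[i]), c
    return best_t, best_c
```

Note that this naive version re-solves the leaf decision problem at every candidate cut; the paper's approximate splitting criteria are aimed at avoiding exactly that cost.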
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all content) and is not responsible for any consequences of its use.