Minimax Rates for High-Dimensional Random Tessellation Forests
- URL: http://arxiv.org/abs/2109.10541v5
- Date: Sun, 29 Oct 2023 16:31:14 GMT
- Title: Minimax Rates for High-Dimensional Random Tessellation Forests
- Authors: Eliza O'Reilly and Ngoc Mai Tran
- Abstract summary: Mondrian forests are the first class of random forests for which minimax rates were obtained in arbitrary dimension.
We show that a large class of random forests with general split directions also achieves minimax optimal convergence rates in arbitrary dimension.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random forests are a popular class of algorithms used for regression and
classification. The algorithm introduced by Breiman in 2001 and many of its
variants are ensembles of randomized decision trees built from axis-aligned
partitions of the feature space. One such variant, called Mondrian forests, was
proposed to handle the online setting and is the first class of random forests
for which minimax rates were obtained in arbitrary dimension. However, the
restriction to axis-aligned splits fails to capture dependencies between
features, and random forests that use oblique splits have shown improved
empirical performance for many tasks. In this work, we show that a large class
of random forests with general split directions also achieves minimax optimal
convergence rates in arbitrary dimension. This class includes STIT forests, a
generalization of Mondrian forests to arbitrary split directions, as well as
random forests derived from Poisson hyperplane tessellations. These are the
first results showing that random forest variants with oblique splits can
obtain minimax optimality in arbitrary dimension. Our proof technique relies on
the novel application of the theory of stationary random tessellations in
stochastic geometry to statistical learning theory.
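For context, the benchmark these results target is the classical minimax rate for nonparametric regression: for an $s$-Hölder regression function in dimension $d$, no estimator can improve on
$$\inf_{\hat f_n} \sup_{f \in \mathcal{H}^s} \mathbb{E}\,\|\hat f_n - f\|_2^2 \asymp n^{-2s/(2s+d)},$$
and the forests above are shown to attain this order. To make the tessellation construction concrete, the following is a minimal, hypothetical Python sketch (not the authors' code, and a data-dependent simplification of the time-indexed STIT process): each cut is a hyperplane with a random direction and a uniform offset, so axis-aligned directions give Mondrian-style trees while directions drawn uniformly from the unit sphere give isotropic oblique splits. All function names and parameters are illustrative assumptions.

```python
import numpy as np

def sample_direction(d, rng, axis_aligned=False):
    """Draw a split direction: a coordinate axis (Mondrian-style)
    or a uniform direction on the unit sphere (isotropic STIT-style)."""
    if axis_aligned:
        u = np.zeros(d)
        u[rng.integers(d)] = 1.0
        return u
    u = rng.normal(size=d)
    return u / np.linalg.norm(u)

def grow_tree(X, y, depth, rng, axis_aligned=False, min_leaf=5):
    """Recursively partition the data with random hyperplane cuts.
    Returns a nested dict; leaves store the mean response."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return {"leaf": True, "value": float(np.mean(y))}
    u = sample_direction(X.shape[1], rng, axis_aligned)
    proj = X @ u
    lo, hi = proj.min(), proj.max()
    if hi <= lo:
        return {"leaf": True, "value": float(np.mean(y))}
    t = rng.uniform(lo, hi)  # uniform cut offset along the direction
    left = proj <= t
    if left.sum() < min_leaf or (~left).sum() < min_leaf:
        return {"leaf": True, "value": float(np.mean(y))}
    return {"leaf": False, "u": u, "t": t,
            "L": grow_tree(X[left], y[left], depth - 1, rng, axis_aligned, min_leaf),
            "R": grow_tree(X[~left], y[~left], depth - 1, rng, axis_aligned, min_leaf)}

def predict(tree, x):
    """Route a point down the tree and return the leaf mean."""
    while not tree["leaf"]:
        tree = tree["L"] if x @ tree["u"] <= tree["t"] else tree["R"]
    return tree["value"]

def forest_predict(trees, x):
    """Average the randomized trees, as a random forest does."""
    return float(np.mean([predict(t, x) for t in trees]))

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = np.sin(2 * np.pi * (X @ np.array([0.6, 0.8, 0.0]))) + 0.1 * rng.normal(size=500)
trees = [grow_tree(X, y, depth=6, rng=rng) for _ in range(50)]  # oblique splits
print(forest_predict(trees, np.array([0.5, 0.5, 0.5])))
```

Averaging many such independently grown trees gives the forest estimate; the distribution of the split direction is exactly the "general split directions" the abstract refers to.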
Related papers
- Adaptive Split Balancing for Optimal Random Forest [8.916614661563893]
We propose a new random forest algorithm that constructs the trees using a novel adaptive split-balancing method.
Our method achieves optimality in simple, smooth scenarios while adaptively learning the tree structure from the data.
arXiv Detail & Related papers (2024-02-17T09:10:40Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Inference with Mondrian Random Forests [6.97762648094816]
We give precise bias and variance characterizations, along with a Berry-Esseen-type central limit theorem, for the Mondrian random forest regression estimator.
We present valid statistical inference methods for the unknown regression function.
Efficient and implementable algorithms are devised for both batch and online learning settings.
arXiv Detail & Related papers (2023-10-15T01:41:42Z)
- Accelerating Generalized Random Forests with Fixed-Point Trees [2.810283834703862]
Estimators are constructed by leveraging random forests as an adaptive kernel weighting algorithm.
We propose a new tree-growing rule for generalized random forests induced by a fixed-point-iteration type of approximation.
arXiv Detail & Related papers (2023-06-20T21:45:35Z)
- Contextual Decision Trees [62.997667081978825]
We propose a multi-armed contextual bandit recommendation framework for feature-based selection of a single shallow tree of the learned ensemble.
The trained system, which works on top of the Random Forest, dynamically identifies a base predictor that is responsible for providing the final output.
arXiv Detail & Related papers (2022-07-13T17:05:08Z)
- Geometry- and Accuracy-Preserving Random Forest Proximities [3.265773263570237]
We introduce a novel definition of random forest proximities called Random Forest-Geometry- and Accuracy-Preserving proximities (RF-GAP).
We prove that RF-GAP exactly matches the out-of-bag random forest prediction, thus capturing the data geometry learned by the random forest.
This improved geometric representation outperforms traditional random forest proximities in tasks such as data imputation and provides outlier detection and visualization results consistent with the learned data geometry.
arXiv Detail & Related papers (2022-01-29T23:13:53Z)
- Ensembles of Double Random Forest [1.7205106391379026]
We propose two approaches for generating ensembles of double random forest.
In the first approach, we propose a rotation-based ensemble of double random forest.
In the second approach, we propose oblique ensembles of double random forest.
arXiv Detail & Related papers (2021-11-03T04:19:41Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests achieve performance better than or comparable to [1] and [3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- An Efficient Adversarial Attack for Tree Ensembles [91.05779257472675]
We study adversarial attacks on tree-based ensembles such as gradient boosting decision trees (GBDTs) and random forests (RFs).
We show that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach.
Our code is available at https://github.com/chong-z/tree-ensemble-attack.
arXiv Detail & Related papers (2020-10-22T10:59:49Z)
- Convex Polytope Trees [57.56078843831244]
Convex polytope trees (CPT) are proposed to expand the family of decision trees by an interpretable generalization of their decision boundary.
We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
arXiv Detail & Related papers (2020-10-21T19:38:57Z)
- Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
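As a closing illustration, the "Stochastic Optimization Forests" entry above replaces the usual accuracy-driven splitting criterion with downstream decision quality. Below is a minimal, hypothetical Python sketch (not the paper's algorithm) of that idea for a toy newsvendor problem, where the in-leaf decision is a quantile of the observed demands and a candidate split is scored by the total decision cost it induces; the cost parameters and all names are illustrative assumptions.

```python
import numpy as np

# Toy decision-aware splitting: score a split by the downstream decision
# cost it induces, not by squared prediction error. Newsvendor setting:
# ordering q against demand y costs b*(y-q)+ (underage) + h*(q-y)+
# (overage); the optimal in-leaf decision is the b/(b+h) quantile of the
# leaf's demands.

B, H = 4.0, 1.0      # underage / overage unit costs (assumed values)
TAU = B / (B + H)    # optimal service level (quantile)

def leaf_cost(y):
    """Cost of the best single decision for all samples in a leaf."""
    if len(y) == 0:
        return 0.0
    q = np.quantile(y, TAU)  # optimal in-leaf order quantity
    return np.sum(B * np.maximum(y - q, 0) + H * np.maximum(q - y, 0))

def best_split(x, y):
    """Exhaustively pick the threshold on one feature that minimizes
    the summed downstream decision cost of the two children."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_t, best_c = None, leaf_cost(ys)  # no-split baseline
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue
        c = leaf_cost(ys[:i]) + leaf_cost(ys[i:])
        if c < best_c:
            best_t, best_c = (xs[i] + xs[i - 1]) / 2, c
    return best_t, best_c

rng = np.random.default_rng(1)
x = rng.uniform(size=400)
y = rng.exponential(scale=1 + 3 * (x > 0.5))  # demand regime shifts at 0.5
t, c = best_split(x, y)
print(f"decision-aware split at x = {t:.3f}, total cost {c:.1f}")
```

Because the demand regime shifts at $x = 0.5$, the decision-aware criterion should place the split near that point: it is where the optimal order quantity changes.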