Inference with Mondrian Random Forests
- URL: http://arxiv.org/abs/2310.09702v1
- Date: Sun, 15 Oct 2023 01:41:42 GMT
- Title: Inference with Mondrian Random Forests
- Authors: Matias D. Cattaneo, Jason M. Klusowski, William G. Underwood
- Abstract summary: We give a central limit theorem for the estimates made by a Mondrian random forest in the regression setting.
We also provide a debiasing procedure for Mondrian random forests which allows them to achieve minimax-optimal estimation rates.
- Score: 7.842152902652216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random forests are popular methods for classification and regression, and
many different variants have been proposed in recent years. One interesting
example is the Mondrian random forest, in which the underlying trees are
constructed according to a Mondrian process. In this paper we give a central
limit theorem for the estimates made by a Mondrian random forest in the
regression setting. When combined with a bias characterization and a consistent
variance estimator, this allows one to perform asymptotically valid statistical
inference, such as constructing confidence intervals, on the unknown regression
function. We also provide a debiasing procedure for Mondrian random forests
which allows them to achieve minimax-optimal estimation rates with
$\beta$-H\"older regression functions, for all $\beta$ and in arbitrary
dimension, assuming appropriate parameter tuning.
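The Mondrian construction described above (axis-aligned splits generated by a Mondrian process, truncated at a lifetime parameter) can be sketched in a few lines. This is an illustrative toy implementation, not the paper's: the `lifetime` and `n_trees` values are arbitrary, data are assumed to lie in $[0,1]^d$, and the naive across-tree standard error shown here is *not* the consistent variance estimator the paper uses for valid inference.

```python
import numpy as np

def sample_mondrian_partition(lower, upper, lifetime, rng, t=0.0):
    """Recursively sample a Mondrian partition of the box [lower, upper].

    Split times are exponential with rate equal to the cell's total side
    length; the split axis is chosen proportional to side length and the
    split location is uniform. Returns a list of (lower, upper) leaf cells.
    """
    sides = upper - lower
    total = sides.sum()
    t_split = t + rng.exponential(1.0 / total) if total > 0 else np.inf
    if t_split > lifetime:          # no split before the lifetime expires
        return [(lower.copy(), upper.copy())]
    dim = rng.choice(len(sides), p=sides / total)
    loc = rng.uniform(lower[dim], upper[dim])
    left_u, right_l = upper.copy(), lower.copy()
    left_u[dim], right_l[dim] = loc, loc
    return (sample_mondrian_partition(lower, left_u, lifetime, rng, t_split)
            + sample_mondrian_partition(right_l, upper, lifetime, rng, t_split))

def mondrian_forest_predict(X, y, x0, lifetime=3.0, n_trees=50, seed=0):
    """Average, over independently drawn trees, of the mean response in the
    leaf containing x0. Assumes X lies in the unit cube [0, 1]^d."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    preds = []
    for _ in range(n_trees):
        cells = sample_mondrian_partition(np.zeros(d), np.ones(d), lifetime, rng)
        for cell_lo, cell_hi in cells:
            if np.all((x0 >= cell_lo) & (x0 <= cell_hi)):
                in_cell = np.all((X >= cell_lo) & (X <= cell_hi), axis=1)
                if in_cell.any():
                    preds.append(y[in_cell].mean())
                break
    preds = np.array(preds)
    se = preds.std(ddof=1) / np.sqrt(len(preds)) if len(preds) > 1 else float("nan")
    return float(preds.mean()), float(se)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(300, 2))
    y = X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=300)
    est, se = mondrian_forest_predict(X, y, np.array([0.5, 0.5]))
    print(f"estimate at (0.5, 0.5): {est:.2f} (naive across-tree SE {se:.3f})")
```

The across-tree spread reported here understates the true sampling variability because the trees share the same training data; the paper's central limit theorem and variance estimator are what justify honest confidence intervals.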
Related papers
- Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining prediction intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile-regression-based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
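As background for the baseline that RQR relaxes, a standard quantile-regression prediction interval fits two models at fixed quantile levels via the pinball loss. The sketch below is generic (synthetic data, scikit-learn's `GradientBoostingRegressor` with `loss="quantile"`), and is not the RQR method itself.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data with homoscedastic noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 1))
y = np.sin(4 * X[:, 0]) + rng.normal(scale=0.2, size=500)

# Fit the 5th and 95th conditional quantiles to get a nominal 90% interval.
q_lo = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X, y)
q_hi = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y)

# Check empirical coverage on a fresh test sample.
X_test = rng.uniform(0, 1, size=(200, 1))
y_test = np.sin(4 * X_test[:, 0]) + rng.normal(scale=0.2, size=200)
covered = np.mean((q_lo.predict(X_test) <= y_test)
                  & (y_test <= q_hi.predict(X_test)))
print(f"empirical coverage of nominal 90% interval: {covered:.2f}")
```

Tying each endpoint to a fixed quantile level (here 0.05 and 0.95) is exactly the kind of constraint the RQR paper argues is unnecessary for coverage.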
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - Adaptive Split Balancing for Optimal Random Forest [10.021381302215062]
We introduce the adaptive split balancing forest (ASBF), capable of learning tree representations from data.
We also propose a localized version that attains the minimax rate under the Hölder class $\mathcal{H}^{q,\beta}$ for any $q \in \mathbb{N}$ and $\beta \in (0,1]$.
arXiv Detail & Related papers (2024-02-17T09:10:40Z) - Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides on a manifold, and the covariate, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z) - Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum
Minimization [52.25843977506935]
We propose an adaptive variance-reduction method, called AdaSpider, for $L$-smooth, non-convex functions with a finite-sum structure.
In doing so, we are able to compute an $\epsilon$-stationary point with $\tilde{O}(n + \sqrt{n}/\epsilon^2)$ oracle calls.
arXiv Detail & Related papers (2022-11-03T14:41:46Z) - On Variance Estimation of Random Forests [0.0]
This paper develops an unbiased variance estimator based on incomplete U-statistics.
We show that our estimators enjoy lower bias and more accurate confidence interval coverage without additional computational costs.
arXiv Detail & Related papers (2022-02-18T03:35:47Z) - Minimax Rates for High-Dimensional Random Tessellation Forests [0.0]
Mondrian forests are the first class of random forests for which minimax rates were obtained in arbitrary dimension.
We show that a large class of random forests with general split directions also achieve minimax optimal convergence rates in arbitrary dimension.
arXiv Detail & Related papers (2021-09-22T06:47:38Z) - Iterative Feature Matching: Toward Provable Domain Generalization with
Logarithmic Environments [55.24895403089543]
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
We present a new algorithm based on performing iterative feature matching that is guaranteed with high probability to yield a predictor that generalizes after seeing only $O(\log d_s)$ environments.
arXiv Detail & Related papers (2021-06-18T04:39:19Z) - RFpredInterval: An R Package for Prediction Intervals with Random
Forests and Boosted Forests [0.0]
We have developed a comprehensive R package, RFpredInterval, that integrates 16 methods to build prediction intervals with random forests and boosted forests.
The methods implemented in the package are a new method to build prediction intervals with boosted forests (PIBF) and 15 different variants to produce prediction intervals with random forests proposed by Roy and Larocque (2020).
The results show that the proposed method is very competitive and, globally, it outperforms the competing methods.
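RFpredInterval is an R package; as a language-agnostic illustration of one simple way to build such intervals, the sketch below shows a split-conformal-style interval around a random forest in Python. The data, the 90% level, and the forest settings are arbitrary choices, and this is not one of the 16 methods the package implements.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(600, 2))
y = X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=600)

# Split into a training half and a calibration half.
X_tr, X_cal, y_tr, y_cal = X[:300], X[300:], y[:300], y[300:]
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Calibrate the interval half-width as the 90th percentile of the
# absolute residuals on the held-out calibration set.
q = float(np.quantile(np.abs(y_cal - rf.predict(X_cal)), 0.9))
x_new = np.array([[0.5, 0.5]])
pred = float(rf.predict(x_new)[0])
print(f"approx. 90% interval at (0.5, 0.5): [{pred - q:.2f}, {pred + q:.2f}]")
```

Constant-width calibrated intervals are the simplest option; the methods compared in the package (e.g. quantile-based variants) adapt the width to the covariates.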
arXiv Detail & Related papers (2021-06-15T15:27:50Z) - Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware
Regression [91.3373131262391]
Uncertainty is the only certainty there is.
Traditionally, the direct regression formulation is considered and the uncertainty is modeled by modifying the output space to a certain family of probabilistic distributions.
How to model the uncertainty within the present-day technologies for regression remains an open issue.
arXiv Detail & Related papers (2021-03-25T06:56:09Z) - Generalised Boosted Forests [0.9899017174990579]
We start with an MLE-type estimate in the link space and then define generalised residuals from it.
We use these residuals and some corresponding weights to fit a base random forest, and then repeat the same procedure to obtain a boosted random forest.
We show with simulated and real data that both random forest steps reduce test-set log-likelihood, which we treat as our primary metric.
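The two-step recipe above can be sketched for the simplest case: a Gaussian response with identity link, where the generalised residuals reduce to ordinary residuals and the weights are constant. The data, forest settings, and use of training MSE as the progress metric are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic Gaussian-response data (identity link, illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))
y = 2 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(scale=0.1, size=500)

# Step 0: MLE-type baseline in the link space (here, just the mean).
mu0 = np.full_like(y, y.mean())
# Step 1: fit a base random forest to the residuals of the baseline.
base = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y - mu0)
mu1 = mu0 + base.predict(X)
# Step 2: repeat on the new residuals to obtain the boost forest.
boost = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y - mu1)
mu2 = mu1 + boost.predict(X)

for name, fit in [("baseline", mu0), ("+ base RF", mu1), ("+ boost RF", mu2)]:
    print(f"{name}: train MSE = {np.mean((y - fit) ** 2):.4f}")
```

Each stage fits a forest to what the previous stage left unexplained, so the fitted values improve step by step; for a non-Gaussian likelihood the residuals and weights would come from the chosen link and variance function instead.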
arXiv Detail & Related papers (2021-02-24T21:17:31Z) - Censored Quantile Regression Forest [81.9098291337097]
We develop a new estimating equation that adapts to censoring and reduces to the quantile score whenever the data do not exhibit censoring.
The proposed procedure, named censored quantile regression forest, allows us to estimate quantiles of time-to-event outcomes without any parametric modeling assumption.
arXiv Detail & Related papers (2020-01-08T23:20:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.