Generalised Boosted Forests
- URL: http://arxiv.org/abs/2102.12561v1
- Date: Wed, 24 Feb 2021 21:17:31 GMT
- Title: Generalised Boosted Forests
- Authors: Indrayudh Ghosal, Giles Hooker
- Abstract summary: We start with an MLE-type estimate in the link space and then define generalised residuals from it.
We use these residuals and some corresponding weights to fit a base random forest and then repeat the same to obtain a boost random forest.
We show with simulated and real data that both random forest steps improve test-set log-likelihood, which we treat as our primary metric.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper extends recent work on boosting random forests to model
non-Gaussian responses. Given an exponential family $\mathbb{E}[Y|X] =
g^{-1}(f(X))$, our goal is to obtain an estimate for $f$. We start with an
MLE-type estimate in the link space and then define generalised residuals from
it. We use these residuals and some corresponding weights to fit a base random
forest and then repeat the same to obtain a boost random forest. We call the
sum of these three estimators a \textit{generalised boosted forest}. We show
with simulated and real data that both random forest steps improve test-set
log-likelihood, which we treat as our primary metric. We also provide a
variance estimator, which we can obtain with the same computational cost as the
original estimate itself. Empirical experiments on real-world data and
simulations demonstrate that the methods can effectively reduce bias, and that
confidence interval coverage is conservative in the bulk of the covariate
distribution.
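The three-step construction above maps directly to code. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a Poisson response with log link, substitutes the standard GLM working residuals and weights for the paper's generalised residuals, and fits both forests with scikit-learn's RandomForestRegressor. The function name fit_generalised_boosted_forest is hypothetical.

```python
# Illustrative sketch of a generalised boosted forest (assumed: Poisson
# response, log link). Residuals and weights follow the standard GLM
# working-response construction; the paper's exact definitions may differ.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_generalised_boosted_forest(X, y, n_trees=500, seed=0):
    # Step 1: MLE-type estimate in the link space. With no covariates, the
    # Poisson MLE of the intercept is log(mean(y)).
    f0 = np.log(np.mean(y))
    eta = np.full(len(y), f0)          # current link-space fit

    forests = []
    for step in range(2):              # base forest, then boost forest
        mu = np.exp(eta)               # inverse link: conditional mean
        residuals = (y - mu) / mu      # working residuals on the link scale
        weights = mu                   # GLM working weights for the log link
        rf = RandomForestRegressor(n_estimators=n_trees,
                                   random_state=seed + step)
        rf.fit(X, residuals, sample_weight=weights)
        eta = eta + rf.predict(X)      # update the link-space estimate
        forests.append(rf)

    def predict_link(X_new):
        # The generalised boosted forest is the sum of the three estimators:
        # intercept + base forest + boost forest, all in the link space.
        return f0 + sum(rf.predict(X_new) for rf in forests)

    return predict_link

# Usage: estimates of E[Y|X] are recovered by applying the inverse link.
# predict_link = fit_generalised_boosted_forest(X_train, y_train)
# mu_hat = np.exp(predict_link(X_test))
```

Fitting each forest to weighted residuals rather than raw responses is what lets ordinary regression forests act as boosting steps in the link space; test-set log-likelihood can then be evaluated on mu_hat to check that each step helps.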
Related papers
- Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise (2024-06-05)
Quantile regression is a leading approach for obtaining prediction intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile-regression-based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
- Inference with Mondrian Random Forests (2023-10-15)
We give a central limit theorem for the estimates made by a Mondrian random forest in the regression setting.
We also provide a debiasing procedure for Mondrian random forests which allows them to achieve minimax-optimal estimation rates.
- Accelerating Generalized Random Forests with Fixed-Point Trees (2023-06-20)
Estimators are constructed by leveraging random forests as an adaptive kernel weighting algorithm.
We propose a new tree-growing rule for generalized random forests induced from a fixed-point iteration type of approximation.
- Revealing Unobservables by Deep Learning: Generative Element Extraction Networks (GEEN) (2022-10-04)
This paper proposes a novel method for estimating realizations of a latent variable $X^*$ in a random sample.
To the best of our knowledge, this paper is the first to provide such identification from observational data.
- Distributional Gradient Boosting Machines (2022-04-02)
Our framework is based on XGBoost and LightGBM.
We show that our framework achieves state-of-the-art forecast accuracy.
- On Variance Estimation of Random Forests (2022-02-18)
This paper develops an unbiased variance estimator based on incomplete U-statistics.
We show that our estimators enjoy lower bias and more accurate confidence interval coverage without additional computational costs.
- Unrolling Particles: Unsupervised Learning of Sampling Distributions (2021-10-06)
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
- Heavy-tailed Streaming Statistical Estimation (2021-08-25)
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
- Random Planted Forest: a directly interpretable tree ensemble (2020-12-29)
We introduce a novel interpretable tree-based algorithm for prediction in a regression setting.
In a simulation study we find encouraging prediction and visualisation properties of our random planted forest method.
- Showing Your Work Doesn't Always Work (2020-04-28)
"Show Your Work: Improved Reporting of Experimental Results" advocates for reporting the expected validation effectiveness of the best-tuned model.
We analytically show that their estimator is biased and uses error-prone assumptions.
We derive an unbiased alternative and bolster our claims with empirical evidence from statistical simulation.
- Censored Quantile Regression Forest (2020-01-08)
We develop a new estimating equation that adapts to censoring and leads to the quantile score whenever the data do not exhibit censoring.
The proposed procedure, named censored quantile regression forest, allows us to estimate quantiles of time-to-event without any parametric modeling assumption.