Lassoed Forests: Random Forests with Adaptive Lasso Post-selection
- URL: http://arxiv.org/abs/2511.06698v2
- Date: Thu, 13 Nov 2025 01:07:22 GMT
- Title: Lassoed Forests: Random Forests with Adaptive Lasso Post-selection
- Authors: Jing Shang, James Bannon, Benjamin Haibe-Kains, Robert Tibshirani,
- Abstract summary: We show in theory that the relative performance of two methods, standard and Lasso-weighted random forests, depends on the signal-to-noise ratio. We propose a unified framework to combine random forests and Lasso selection by applying adaptive weighting.
- Score: 36.24615773895282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random forests are a statistical learning technique that uses bootstrap aggregation to average high-variance, low-bias trees. Improvements to random forests, such as applying Lasso regression to the tree predictions, have been proposed to reduce model bias. However, these changes can sometimes degrade performance (e.g., increase mean squared error). In this paper, we show in theory that the relative performance of these two methods, standard and Lasso-weighted random forests, depends on the signal-to-noise ratio. We further propose a unified framework that combines random forests and Lasso selection by applying adaptive weighting, and we show mathematically that it can strictly outperform the other two methods. We compare the three methods through simulation, including bias-variance decomposition, evaluation of error estimates, and variable importance analysis. We also show the versatility of our method through applications to a variety of real-world datasets.
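As a rough illustration of the post-selection step, the sketch below fits a forest, treats per-tree predictions as features, and applies an adaptive Lasso implemented by feature rescaling. It assumes scikit-learn; the ridge-based initial weights, penalty level, and synthetic data are illustrative choices, not the paper's exact construction.

```python
# Hedged sketch: adaptive Lasso over per-tree predictions (assumes
# scikit-learn; penalty level and initial weights are illustrative).
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=600, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
# Per-tree predictions form the design matrix for the post-selection step.
# (In practice this step should use held-out or out-of-bag predictions,
# since trees nearly interpolate their training data.)
P_tr = np.column_stack([t.predict(X_tr) for t in rf.estimators_])
P_te = np.column_stack([t.predict(X_te) for t in rf.estimators_])

# Adaptive Lasso via rescaling: penalizing tree j proportionally to
# 1/|beta_init_j| is equivalent to a plain Lasso on P_tr * scale.
beta_init = Ridge(alpha=1.0).fit(P_tr, y_tr).coef_
scale = np.abs(beta_init) + 1e-8
lasso = Lasso(alpha=0.01, max_iter=50_000).fit(P_tr * scale, y_tr)
tree_weights = lasso.coef_ * scale  # weights on individual trees

pred_rf = rf.predict(X_te)  # plain tree average
pred_lassoed = P_te @ tree_weights + lasso.intercept_
print("trees kept:", np.count_nonzero(tree_weights))
print("MSE rf:     ", np.mean((pred_rf - y_te) ** 2))
print("MSE lassoed:", np.mean((pred_lassoed - y_te) ** 2))
```

Per the abstract, whether the weighted version helps depends on the signal-to-noise ratio, so a comparison like the one printed here can go either way.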
Related papers
- Clustered random forests with correlated data for optimal estimation and inference under potential covariate shift [4.13592995550836]
We develop Clustered Random Forests, a random forests algorithm for clustered data arising from independent groups that exhibit within-cluster dependence. The leaf-wise prediction of each decision tree making up a clustered random forest takes the form of a weighted least squares estimator. For certain tree-splitting criteria, clustered random forests are shown to be minimax rate optimal for pointwise conditional mean estimation.
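As a toy illustration of a leaf-wise weighted least squares estimate, the sketch below computes a generalized-least-squares mean under an assumed exchangeable within-cluster correlation; the correlation value and data are made up, and this is not the paper's estimator.

```python
# Toy sketch: leaf mean as a weighted least squares (GLS) estimate under
# an assumed exchangeable within-cluster correlation rho.
import numpy as np

def gls_leaf_mean(y, clusters, rho=0.5):
    """Downweight correlated observations from the same cluster."""
    n = len(y)
    same = clusters[:, None] == clusters[None, :]
    V = np.where(same, rho, 0.0) + (1.0 - rho) * np.eye(n)  # working covariance
    W = np.linalg.inv(V)
    ones = np.ones(n)
    return (ones @ W @ y) / (ones @ W @ ones)

y = np.array([1.0, 1.2, 0.9, 3.0])
clusters = np.array([0, 0, 0, 1])
# The three correlated cluster-0 points count for less than three
# independent observations would.
print(gls_leaf_mean(y, clusters))
```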
arXiv Detail & Related papers (2025-03-16T20:07:23Z)
- Adaptive Split Balancing for Optimal Random Forest [8.916614661563893]
We propose a new random forest algorithm that constructs the trees using a novel adaptive split-balancing method.
Our method achieves optimality in simple, smooth scenarios while adaptively learning the tree structure from the data.
arXiv Detail & Related papers (2024-02-17T09:10:40Z)
- Theoretical and Empirical Advances in Forest Pruning [0.0]
We revisit forest pruning, an approach that aims to have the best of both worlds: the accuracy of regression forests and the interpretability of regression trees. We prove the advantage of a Lasso-pruned forest over its unpruned counterpart under weak assumptions. We test the accuracy of pruned regression forests against their unpruned counterparts on 19 different datasets.
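Conceptually this is the pruning flavor of the post-selection sketch given earlier for the main paper: a cross-validated Lasso over per-tree predictions zeroes out most trees. The non-negativity constraint below is an illustrative choice, not necessarily the paper's formulation.

```python
# Hedged sketch: Lasso pruning keeps a sparse subset of trees.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV

X, y = make_friedman1(n_samples=600, noise=1.0, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
P = np.column_stack([t.predict(X) for t in rf.estimators_])

# Non-negative, cross-validated Lasso; in practice the selection step
# should use held-out rather than in-sample tree predictions.
pruner = LassoCV(positive=True, cv=5, max_iter=50_000).fit(P, y)
print(f"kept {np.count_nonzero(pruner.coef_)} of {P.shape[1]} trees")
```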
arXiv Detail & Related papers (2024-01-10T20:02:47Z)
- Inference with Mondrian Random Forests [7.404568009919416]
We give precise bias and variance characterizations, along with a Berry-Esseen-type central limit theorem, for the Mondrian random forest regression estimator. We present valid statistical inference methods for the unknown regression function. Efficient and implementable algorithms are devised for both batch and online learning settings.
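For context, the randomized partition the paper analyzes comes from the Mondrian process. The sketch below samples a one-dimensional Mondrian partition using the standard construction (exponential split times with rate equal to the interval length, uniform split locations); the lifetime parameter is an arbitrary choice.

```python
# Sketch: sample the splits of a 1-D Mondrian partition up to a lifetime.
import numpy as np

rng = np.random.default_rng(0)

def mondrian_splits(lo, hi, lifetime, t=0.0):
    """Split times are exponential with rate (hi - lo); locations uniform."""
    t = t + rng.exponential(1.0 / (hi - lo))
    if t > lifetime:
        return []
    cut = rng.uniform(lo, hi)
    return (mondrian_splits(lo, cut, lifetime, t)
            + [cut]
            + mondrian_splits(cut, hi, lifetime, t))

print(sorted(mondrian_splits(0.0, 1.0, lifetime=5.0)))
```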
arXiv Detail & Related papers (2023-10-15T01:41:42Z)
- Variational Classification [51.2541371924591]
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
We induce a chosen latent distribution instead of the implicit assumption found in a standard softmax layer.
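A toy version of that objective: with a Gaussian posterior over the softmax inputs and a standard-normal prior (both assumptions here, chosen for concreteness), the ELBO is a Monte Carlo log-likelihood term minus a closed-form KL divergence.

```python
# Toy Monte Carlo ELBO for a softmax likelihood with Gaussian latents.
import numpy as np

rng = np.random.default_rng(0)

def elbo(mu, log_var, y, n_samples=64):
    """q(z|x) = N(mu, diag(exp(log_var))), prior p(z) = N(0, I)."""
    std = np.exp(0.5 * log_var)
    z = mu + std * rng.standard_normal((n_samples, mu.size))
    m = z.max(axis=1, keepdims=True)  # stable log-softmax
    log_probs = z - (m + np.log(np.exp(z - m).sum(axis=1, keepdims=True)))
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
    return log_probs[:, y].mean() - kl

mu = np.array([2.0, 0.1, -1.0])        # encoder output for one input
log_var = np.array([-1.0, -1.0, -1.0])
print(elbo(mu, log_var, y=0))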
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work? [1.1050303097572156]
We show that both methods can be understood in terms of the same parameters and confounding assumptions under L2 loss.
In the randomized setting, both approaches performed comparably to the new blended versions in a benchmark study.
arXiv Detail & Related papers (2022-06-21T12:45:07Z)
- Random Forest Weighted Local Fréchet Regression with Random Objects [18.128663071848923]
We propose a novel random forest weighted local Fréchet regression paradigm. Our first method uses the random forest weights as the local average to solve for the conditional Fréchet mean. The second method performs local linear Fréchet regression; both significantly improve on existing Fréchet regression methods.
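The sketch below illustrates the first method's ingredient: random forest weights (the per-tree frequency of sharing a leaf with the query point) used as a local average. It assumes scikit-learn and scipy, represents distributional responses by quantile functions (whose weighted average is their 2-Wasserstein barycenter), and grows the forest on a surrogate scalar target only to obtain a partition; none of these shortcuts are the paper's actual construction.

```python
# Hedged sketch: forest-derived weights for a local Fréchet mean.
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, grid = 300, np.linspace(0.01, 0.99, 50)
X = rng.uniform(-1, 1, size=(n, 3))
# Responses: Gaussian quantile functions whose mean depends on X[:, 0].
Q = norm.ppf(grid)[None, :] + 2.0 * X[:, [0]]        # shape (n, 50)

# Surrogate scalar target, used here only to grow a partition.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, Q.mean(axis=1))

def rf_weights(x_new):
    """w_i averages 1{x_i shares a leaf with x_new} / leaf size over trees."""
    leaves_tr = rf.apply(X)                          # (n, n_trees)
    leaves_new = rf.apply(x_new.reshape(1, -1))      # (1, n_trees)
    same = leaves_tr == leaves_new
    return (same / same.sum(axis=0, keepdims=True)).mean(axis=1)

w = rf_weights(np.array([0.5, 0.0, 0.0]))
frechet_mean_q = w @ Q  # weighted average of quantile functions
print(frechet_mean_q[[0, 24, 49]])  # lowest, middle, highest grid quantiles
```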
arXiv Detail & Related papers (2022-02-10T09:10:59Z)
- Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
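A toy version of a decision-quality criterion, using a newsvendor problem for concreteness (the costs and data below are assumptions): each candidate split is scored by the empirical cost of the plug-in optimal order quantity in each child, rather than by squared error.

```python
# Toy decision-quality split criterion for a newsvendor objective.
import numpy as np

CU, CO = 4.0, 1.0  # underage / overage costs (assumed)

def newsvendor_cost(demand):
    if demand.size == 0:
        return np.inf
    q = np.quantile(demand, CU / (CU + CO))  # plug-in optimal order quantity
    return np.sum(CU * np.maximum(demand - q, 0.0)
                  + CO * np.maximum(q - demand, 0.0))

def split_score(x, demand, threshold):
    return (newsvendor_cost(demand[x <= threshold])
            + newsvendor_cost(demand[x > threshold]))

rng = np.random.default_rng(0)
x = rng.uniform(size=500)
demand = np.where(x < 0.4, rng.normal(10, 2, 500), rng.normal(20, 6, 500))
best = min(np.linspace(0.1, 0.9, 17), key=lambda t: split_score(x, demand, t))
print("best threshold by decision cost:", round(best, 2))
```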
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
- A One-step Approach to Covariate Shift Adaptation [82.01909503235385]
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
We propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization.
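For contrast with the one-step idea, here is the classical two-step baseline it streamlines: estimate density-ratio weights with a domain classifier, then fit a weighted model. The data and models below are assumptions; the paper's contribution is doing both in a single optimization.

```python
# Sketch of the two-step covariate-shift baseline (weights, then fit).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(500, 1))
X_te = rng.normal(1.0, 1.0, size=(500, 1))  # shifted covariates
y_tr = np.sin(X_tr[:, 0]) + 0.1 * rng.standard_normal(500)

# Step 1: w(x) ~ p_test(x) / p_train(x) from a train-vs-test classifier.
Z = np.vstack([X_tr, X_te])
d = np.r_[np.zeros(len(X_tr)), np.ones(len(X_te))]
p = LogisticRegression().fit(Z, d).predict_proba(X_tr)[:, 1]
w = p / (1.0 - p)

# Step 2: importance-weighted fit of the predictive model.
model = LinearRegression().fit(X_tr, y_tr, sample_weight=w)
print(model.coef_, model.intercept_)
```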
arXiv Detail & Related papers (2020-07-08T11:35:47Z)
- A Numerical Transform of Random Forest Regressors corrects Systematically-Biased Predictions [0.0]
We find a systematic bias in predictions from random forest models.
This bias is recapitulated in simple synthetic datasets.
We use the training data to define a numerical transformation that fully corrects it.
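One plausible instantiation of such a transform (an assumption here, not necessarily the paper's): fit a monotone map from out-of-bag predictions to training targets and apply it to new predictions.

```python
# Hedged sketch: post-hoc monotone calibration of forest predictions,
# estimated on out-of-bag predictions so the transform sees (roughly)
# held-out data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=800, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=300, oob_score=True, random_state=0)
rf.fit(X_tr, y_tr)

transform = IsotonicRegression(out_of_bounds="clip")
transform.fit(rf.oob_prediction_, y_tr)

raw = rf.predict(X_te)
corrected = transform.predict(raw)
print("MSE raw:      ", np.mean((raw - y_te) ** 2))
print("MSE corrected:", np.mean((corrected - y_te) ** 2))
```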
arXiv Detail & Related papers (2020-03-16T21:18:06Z)
- Censored Quantile Regression Forest [81.9098291337097]
We develop a new estimating equation that adapts to censoring and leads to the quantile score whenever the data do not exhibit censoring.
The proposed procedure, named censored quantile regression forest, allows us to estimate quantiles of time-to-event outcomes without any parametric modeling assumption.
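For reference, the uncensored building block mentioned above is the quantile score (pinball loss); a minimal sketch, with the quantile level tau chosen arbitrarily:

```python
# The quantile score (pinball loss) at level tau.
import numpy as np

def quantile_score(y, q, tau=0.5):
    """tau * (y - q) if y >= q, else (1 - tau) * (q - y)."""
    u = y - q
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

y = np.array([1.0, 2.0, 4.0, 8.0])
print(quantile_score(y, q=2.0, tau=0.9))
```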
arXiv Detail & Related papers (2020-01-08T23:20:23Z)
- Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation [77.7420231319632]
For contextual generation of categorical sequences, we adapt a policy gradient estimator that evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
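A generic example of variance control with multiple rollouts (not the paper's exact correlated-rollout scheme): a REINFORCE gradient for a one-step categorical policy where each rollout is baselined by the average reward of the others.

```python
# Sketch: leave-one-out baseline across K Monte Carlo rollouts.
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([0.5, 0.0, -0.5])
reward = np.array([1.0, 3.0, 0.0])  # assumed per-action rewards

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_estimate(k=8):
    p = softmax(logits)
    a = rng.choice(3, size=k, p=p)        # K rollouts
    r = reward[a]
    baseline = (r.sum() - r) / (k - 1)    # leave-one-out baseline
    g = np.zeros(3)
    for ai, adv in zip(a, r - baseline):
        g += adv * (np.eye(3)[ai] - p)    # score function of a categorical
    return g / k

print(grad_estimate())
```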
arXiv Detail & Related papers (2019-12-31T03:01:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.