Error Reduction from Stacked Regressions
- URL: http://arxiv.org/abs/2309.09880v2
- Date: Wed, 27 Sep 2023 02:25:53 GMT
- Title: Error Reduction from Stacked Regressions
- Authors: Xin Chen and Jason M. Klusowski and Yan Shuo Tan
- Abstract summary: Stacking regressions is an ensemble technique that forms linear combinations of different regression estimators to enhance predictive accuracy.
In this paper, we learn these weights analogously by minimizing an estimate of the population risk subject to a nonnegativity constraint.
Thanks to a shrinkage effect, the resulting stacked estimator has strictly smaller population risk than the best single estimator among them.
- Score: 14.226205980875262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stacking regressions is an ensemble technique that forms linear combinations
of different regression estimators to enhance predictive accuracy. The
conventional approach uses cross-validation data to generate predictions from
the constituent estimators, and least-squares with nonnegativity constraints to
learn the combination weights. In this paper, we learn these weights
analogously by minimizing an estimate of the population risk subject to a
nonnegativity constraint. When the constituent estimators are linear
least-squares projections onto nested subspaces separated by at least three
dimensions, we show that thanks to a shrinkage effect, the resulting stacked
estimator has strictly smaller population risk than the best single estimator among
them. Here "best" refers to an estimator that minimizes a model selection
criterion such as AIC or BIC. In other words, in this setting, the best single
estimator is inadmissible. Because the optimization problem can be reformulated
as isotonic regression, the stacked estimator requires the same order of
computation as the best single estimator, making it an attractive alternative
in terms of both performance and implementation.
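The conventional approach described in the abstract (out-of-fold predictions from each constituent estimator, then nonnegative least squares for the combination weights) can be sketched as follows. This is a minimal illustration of that baseline only, not the paper's risk-estimate-based procedure or its isotonic-regression reformulation. The synthetic data, the subspace dimensions in `dims`, and the helper names `cv_predictions`, `stack_weights`, and `stacked_predict` are all assumptions made for the example; it relies on NumPy and `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls


def cv_predictions(X, y, dims, n_folds=5, seed=0):
    """Out-of-fold predictions from OLS on the first d columns, one column per d."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    Z = np.zeros((n, len(dims)))
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        for k, d in enumerate(dims):
            # Least-squares projection onto the span of the first d columns.
            beta, *_ = np.linalg.lstsq(X[train_mask, :d], y[train_mask], rcond=None)
            Z[test_idx, k] = X[test_idx, :d] @ beta
    return Z


def stack_weights(Z, y):
    """Combination weights from least squares with a nonnegativity constraint."""
    weights, _residual_norm = nnls(Z, y)
    return weights


def stacked_predict(X_train, y_train, X_new, dims, weights):
    """Weighted sum of the nested OLS fits, each refit on all training data."""
    preds = np.zeros(X_new.shape[0])
    for w, d in zip(weights, dims):
        beta, *_ = np.linalg.lstsq(X_train[:, :d], y_train, rcond=None)
        preds += w * (X_new[:, :d] @ beta)
    return preds


# Synthetic example (assumed setup): only the first three columns carry signal.
rng = np.random.default_rng(0)
n, p = 200, 12
X = rng.standard_normal((n, p))
beta_true = np.concatenate([np.array([2.0, -1.0, 0.5]), np.zeros(p - 3)])
y = X @ beta_true + rng.standard_normal(n)

dims = [3, 6, 9, 12]  # nested subspaces separated by three dimensions
Z = cv_predictions(X, y, dims)
weights = stack_weights(Z, y)
y_hat = stacked_predict(X, y, X, dims, weights)
```

The weights are not constrained to sum to one, so their total can fall below one and shrink the combined fit, which is the kind of effect the abstract attributes the risk reduction to.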
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - On the design-dependent suboptimality of the Lasso [27.970033039287884]
We show that the Lasso estimator is provably minimax rate-suboptimal when the minimum singular value is small.
Our lower bound is strong enough to preclude the sparse statistical optimality of all forms of the Lasso.
arXiv Detail & Related papers (2024-02-01T07:01:54Z) - Tuned Regularized Estimators for Linear Regression via Covariance Fitting [17.46329281993348]
We consider the problem of finding tuned regularized parameter estimators for linear models.
We show that three known optimal linear estimators belong to a wider class of estimators.
We show that the resulting class of estimators yields tuned versions of known regularized estimators.
arXiv Detail & Related papers (2022-01-21T16:08:08Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for bias-constrained estimation (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
arXiv Detail & Related papers (2021-01-19T18:55:29Z) - Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z) - Robust regression with covariate filtering: Heavy tails and adversarial contamination [6.939768185086755]
We show how to modify the Huber regression, least trimmed squares, and least absolute deviation estimators to obtain estimators that are simultaneously computationally and statistically efficient in the stronger contamination model.
We show that the Huber regression estimator achieves near-optimal error rates in this setting, whereas the least trimmed squares and least absolute deviation estimators can be made to achieve near-optimal error after applying a postprocessing step.
arXiv Detail & Related papers (2020-09-27T22:48:48Z) - Learning Minimax Estimators via Online Learning [55.92459567732491]
We consider the problem of designing minimax estimators for estimating parameters of a probability distribution.
We construct an algorithm for finding a mixed-strategy Nash equilibrium.
arXiv Detail & Related papers (2020-06-19T22:49:42Z) - Distributional robustness of K-class estimators and the PULSE [4.56877715768796]
We prove that the classical K-class estimator satisfies such optimality by establishing a connection between K-class estimators and anchor regression.
We show that the PULSE can be computed efficiently as a data-driven K-class estimator.
There are several settings, including weak instrument settings, where it outperforms other estimators.
arXiv Detail & Related papers (2020-05-07T09:39:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.