Ensemble linear interpolators: The role of ensembling
- URL: http://arxiv.org/abs/2309.03354v1
- Date: Wed, 6 Sep 2023 20:38:04 GMT
- Title: Ensemble linear interpolators: The role of ensembling
- Authors: Mingqi Wu, Qiang Sun
- Abstract summary: Interpolators are unstable; for example, the minimum $\ell_2$ norm least square interpolator exhibits unbounded test errors when dealing with noisy data.
We study how ensembling stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator.
- Score: 5.135730286836428
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpolators are unstable. For example, the minimum $\ell_2$ norm least
square interpolator exhibits unbounded test errors when dealing with noisy
data. In this paper, we study how ensembling stabilizes and thus improves the
generalization performance, measured by the out-of-sample prediction risk, of
an individual interpolator. We focus on bagged linear interpolators, as bagging
is a popular randomization-based ensemble method that can be implemented in
parallel. We introduce the multiplier-bootstrap-based bagged least square
estimator, which can then be formulated as an average of the sketched least
square estimators. The proposed multiplier bootstrap encompasses the classical
bootstrap with replacement as a special case, along with a more intriguing
variant which we call the Bernoulli bootstrap.
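To make this construction concrete, here is a minimal numerical sketch of the Bernoulli-bootstrap variant, written as an average of sketched minimum $\ell_2$ norm least square fits. The function name, the bag count, and the Bernoulli rate `q` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def bernoulli_bagged_least_squares(X, y, n_bags=50, q=0.7, seed=0):
    """Bagged least squares via Bernoulli multipliers (illustrative sketch).

    Each bag draws i.i.d. Bernoulli(q) multipliers; the retained rows form a
    sketched design, and the bagged estimator averages the minimum ell_2 norm
    least square solutions across bags.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    betas = []
    for _ in range(n_bags):
        keep = rng.random(n) < q  # Bernoulli bootstrap multipliers in {0, 1}
        betas.append(np.linalg.pinv(X[keep]) @ y[keep])  # sketched min-norm LS
    return np.mean(betas, axis=0)
```

Because the bags are independent given the data, the loop parallelizes trivially, which is the practical appeal of bagging noted above.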
Focusing on the proportional regime where the sample size scales
proportionally with the feature dimensionality, we investigate the
out-of-sample prediction risks of the sketched and bagged least square
estimators in both underparameterized and overparameterized regimes. Our results
reveal the statistical roles of sketching and bagging. In particular, sketching
modifies the aspect ratio and shifts the interpolation threshold of the minimum
$\ell_2$ norm estimator. However, the risk of the sketched estimator continues
to be unbounded around the interpolation threshold due to excessive variance.
In stark contrast, bagging effectively mitigates this variance, leading to a
bounded limiting out-of-sample prediction risk. To further understand this
stability improvement property, we establish that bagging acts as a form of
implicit regularization, substantiated by the equivalence of the bagged
estimator with its explicitly regularized counterpart. We also discuss several
extensions.
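Schematically, and reading "explicitly regularized counterpart" as a ridge-type estimator (a natural but assumed interpretation here; the paper gives the precise equivalence and the induced penalty), the implicit-regularization claim takes the form:

```latex
% Schematic only: bagging the minimum \ell_2 norm interpolator behaves, in the
% proportional limit, like explicit ridge regularization with a penalty
% \lambda > 0 determined by the bootstrap rate and the aspect ratio.
\hat{\beta}_{\mathrm{bag}} \;\approx\; \hat{\beta}_{\lambda}
  := \bigl( X^{\top} X + \lambda I_p \bigr)^{-1} X^{\top} y ,
\qquad \lambda = \lambda(\text{bootstrap rate},\, \gamma).
```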
Related papers
- Semiparametric conformal prediction [79.6147286161434]
Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables.
We treat the scores as random vectors and aim to construct the prediction set accounting for their joint correlation structure.
We report desired coverage and competitive efficiency on a range of real-world regression problems.
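A minimal split-conformal sketch in the spirit of this summary: scores are treated as vectors and whitened by their estimated covariance before taking a conformal quantile. The Mahalanobis-style aggregation and all names here are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def multivariate_conformal_radius(scores_cal, alpha=0.1):
    """Split-conformal calibration for vector-valued scores (sketch).

    Whitens calibration scores by their estimated covariance, so the joint
    correlation structure is accounted for, then returns the conformal
    quantile of the resulting Mahalanobis-type radii.
    """
    mu = scores_cal.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(scores_cal, rowvar=False))
    d = scores_cal - mu
    radii = np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))
    n = len(radii)
    # Finite-sample conformal quantile level
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(radii, level)
```

The prediction set for a new input then collects all candidate outputs whose whitened score radius falls below the returned threshold.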
arXiv Detail & Related papers (2024-11-04T14:29:02Z)
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters.
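For orientation, a sketch of the classical 1-nearest-neighbor matching estimator of the average treatment effect, the baseline that such root-n-consistent modifications refine; this is not the paper's estimator.

```python
import numpy as np

def one_nn_matching_ate(X, y, treated):
    """Classical 1-NN matching estimator of the average treatment effect
    (illustrative baseline only). Each unit's missing potential outcome is
    imputed from its nearest neighbor in the opposite treatment arm."""
    t = treated.astype(bool)

    def impute(X_from, y_from):
        # For every unit, take the outcome of its nearest neighbor in X_from
        d2 = ((X[:, None, :] - X_from[None, :, :]) ** 2).sum(-1)
        return y_from[d2.argmin(axis=1)]

    y1 = np.where(t, y, impute(X[t], y[t]))     # treated potential outcome
    y0 = np.where(~t, y, impute(X[~t], y[~t]))  # control potential outcome
    return float(np.mean(y1 - y0))
```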
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining prediction intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile-regression-based interval construction that removes the constraint tying interval endpoints to prespecified quantile levels.
We demonstrate that this added flexibility results in intervals with improved desirable qualities.
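For reference, a minimal sketch of the standard pinball-loss quantile fit that interval constructions of this kind build on (a linear model trained by plain subgradient descent; all settings are illustrative). RQR-style methods relax the requirement that the levels `tau` be fixed in advance.

```python
import numpy as np

def fit_linear_quantile(X, y, tau, lr=0.05, n_steps=2000):
    """Fit a linear conditional quantile at level tau by subgradient
    descent on the pinball (check) loss."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_steps):
        r = y - (X @ w + b)                   # residuals
        g = np.where(r > 0, -tau, 1.0 - tau)  # subgradient w.r.t. prediction
        w -= lr * (X.T @ g) / n
        b -= lr * g.mean()
    return w, b

# A two-sided 90% interval from two separate fits:
# w_lo, b_lo = fit_linear_quantile(X, y, 0.05)
# w_hi, b_hi = fit_linear_quantile(X, y, 0.95)
```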
arXiv Detail & Related papers (2024-06-05T13:36:38Z)
- Nearest Neighbor Sampling for Covariate Shift Adaptation [7.940293148084844]
We propose a new covariate shift adaptation method which avoids estimating the importance weights.
The basic idea is to work directly on unlabeled target data, labeled according to the $k$-nearest neighbors in the source dataset.
Our experiments show that it achieves a drastic reduction in running time with remarkable accuracy.
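A sketch of the basic idea as summarized: pseudo-label each unlabeled target point by majority vote over its $k$ nearest source neighbors, then train any downstream model on the pseudo-labeled target data. The function name, `k`, and the Euclidean metric are illustrative assumptions.

```python
import numpy as np

def knn_pseudo_label(X_src, y_src, X_tgt, k=5):
    """Pseudo-label unlabeled target points by majority vote among their
    k nearest neighbors in the labeled source data (integer class labels
    assumed). A downstream model trained on (X_tgt, labels) then adapts to
    the target covariate distribution without explicit weight estimation."""
    d2 = ((X_tgt[:, None, :] - X_src[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest sources
    return np.array([np.bincount(y_src[i]).argmax() for i in nn])
```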
arXiv Detail & Related papers (2023-12-15T17:28:09Z)
- Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression [12.443289202402761]
We show the benefits of batch partitioning through the lens of a minimum-norm overparameterized linear regression model.
We characterize the optimal batch size and show it is inversely proportional to the noise level.
We also show that shrinking the batch minimum-norm estimator by a factor equal to the Wiener coefficient further stabilizes it and results in lower quadratic risk in all settings.
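A numerical sketch of the batch idea: average minimum-norm fits over disjoint batches, then shrink. The `shrink` argument stands in for the paper's Wiener coefficient, whose exact expression is given there; everything else is an illustrative assumption.

```python
import numpy as np

def batched_min_norm(X, y, n_batches, shrink=1.0, seed=0):
    """Average minimum-norm least square fits over disjoint batches and
    shrink the result (sketch; `shrink` plays the role of the Wiener
    coefficient from the paper)."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(y)), n_batches)
    betas = [np.linalg.pinv(X[i]) @ y[i] for i in parts]
    return shrink * np.mean(betas, axis=0)
```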
arXiv Detail & Related papers (2023-06-14T11:02:08Z)
- Generalized equivalences between subsampling and ridge regularization [3.1346887720803505]
We prove structural and risk equivalences between subsampling and ridge regularization for ensemble ridge estimators.
An indirect implication of our equivalences is that optimally tuned ridge regression exhibits a monotonic prediction risk in the data aspect ratio.
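An empirical way to see the kind of equivalence at issue, as a sketch: compare a subsample ensemble of minimum-norm fits against ridge fits across a grid of penalties. The matching between subsample size and penalty is the paper's contribution; the scan here is purely illustrative.

```python
import numpy as np

def subsample_ensemble(X, y, m, n_bags=100, seed=0):
    """Average of minimum-norm least square fits on random size-m subsamples."""
    rng = np.random.default_rng(seed)
    betas = [np.linalg.pinv(X[idx]) @ y[idx]
             for idx in (rng.choice(len(y), size=m, replace=False)
                         for _ in range(n_bags))]
    return np.mean(betas, axis=0)

def ridge(X, y, lam):
    """Explicit ridge estimator, the comparison point for the equivalence."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```

Scanning `lam` and comparing coefficients (or test predictions) against the subsample ensemble illustrates the structural equivalence numerically.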
arXiv Detail & Related papers (2023-05-29T14:05:51Z)
- Bagging in overparameterized learning: Risk characterization and risk monotonization [2.6534407766508177]
We study the prediction risk of variants of bagged predictors under the proportional regime.
Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors.
arXiv Detail & Related papers (2022-10-20T17:45:58Z)
- Foolish Crowds Support Benign Overfitting [20.102619493827024]
We prove a lower bound on the excess risk of sparse interpolating procedures for linear regression with Gaussian data.
Our analysis exposes the benefit of the "wisdom of the crowd", except here the harm arising from fitting the noise is ameliorated by spreading it among many directions.
arXiv Detail & Related papers (2021-10-06T16:56:37Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_p^\beta(\mathcal{X})$, $p \geq 2$, $\beta > \frac{d}{p}$.
The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $\beta > \frac{d}{2}$ or $p = \infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z)
- Estimating Gradients for Discrete Random Variables by Sampling without Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
arXiv Detail & Related papers (2020-02-14T14:15:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.