Bagging in overparameterized learning: Risk characterization and risk monotonization
- URL: http://arxiv.org/abs/2210.11445v3
- Date: Tue, 24 Oct 2023 19:17:51 GMT
- Title: Bagging in overparameterized learning: Risk characterization and risk monotonization
- Authors: Pratik Patil, Jin-Hong Du, Arun Kumar Kuchibhotla
- Abstract summary: We study the prediction risk of variants of bagged predictors under the proportional asymptotics regime.
Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors.
- Score: 2.6534407766508177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bagging is a commonly used ensemble technique in statistics and machine
learning to improve the performance of prediction procedures. In this paper, we
study the prediction risk of variants of bagged predictors under the
proportional asymptotics regime, in which the ratio of the number of features
to the number of observations converges to a constant. Specifically, we propose
a general strategy to analyze the prediction risk under squared error loss of
bagged predictors using classical results on simple random sampling.
Specializing the strategy, we derive the exact asymptotic risk of the bagged
ridge and ridgeless predictors with an arbitrary number of bags under a
well-specified linear model with arbitrary feature covariance matrices and
signal vectors. Furthermore, we prescribe a generic cross-validation procedure
to select the optimal subsample size for bagging and discuss its utility to
eliminate the non-monotonic behavior of the limiting risk in the sample size
(i.e., double or multiple descents). In demonstrating the proposed procedure
for bagged ridge and ridgeless predictors, we thoroughly investigate the oracle
properties of the optimal subsample size and provide an in-depth comparison
between different bagging variants.
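As a concrete illustration of the procedure described above, here is a minimal sketch (illustrative only, not the authors' implementation; the subsample grid, bag count, and the near-zero ridge penalty standing in for the ridgeless predictor are all assumptions): bag ridge fits over simple random subsamples and select the subsample size by held-out squared error.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def bagged_ridge_predict(X, y, X_new, k, n_bags=50, lam=1e-8):
    """Average ridge fits over n_bags simple random subsamples of size k;
    a near-zero lam approximates the ridgeless (minimum-norm) predictor."""
    preds = np.zeros(X_new.shape[0])
    for _ in range(n_bags):
        idx = rng.choice(X.shape[0], size=k, replace=False)  # without replacement
        preds += Ridge(alpha=lam, fit_intercept=False).fit(X[idx], y[idx]).predict(X_new)
    return preds / n_bags

def cv_subsample_size(X, y, k_grid, val_frac=0.2):
    """Select the subsample size k minimizing held-out squared error,
    in the spirit of the generic cross-validation procedure above."""
    perm = rng.permutation(X.shape[0])
    n_val = int(val_frac * X.shape[0])
    val, tr = perm[:n_val], perm[n_val:]
    risks = [np.mean((y[val] - bagged_ridge_predict(X[tr], y[tr], X[val], k)) ** 2)
             for k in k_grid]
    return k_grid[int(np.argmin(risks))], risks

# Example: a well-specified linear model in the overparameterized regime (p > n).
n, p = 100, 300
X = rng.standard_normal((n, p))
y = X @ (rng.standard_normal(p) / np.sqrt(p)) + rng.standard_normal(n)
k_best, risks = cv_subsample_size(X, y, k_grid=[20, 40, 60])
```

Drawing the subsamples without replacement is what lets classical results on simple random sampling drive the risk analysis, as the abstract notes.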
Related papers
- High-dimensional prediction for count response via sparse exponential weights [0.0]
This paper introduces a novel probabilistic machine learning framework for high-dimensional count data prediction.
A key contribution is a novel risk measure tailored to count data prediction, with theoretical guarantees for prediction risk using PAC-Bayesian bounds.
Our results include non-asymptotic oracle inequalities, demonstrating rate-optimal prediction error without prior knowledge of sparsity.
arXiv Detail & Related papers (2024-10-20T12:45:42Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
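A minimal sketch of one such construction, for a Gaussian mean with known variance (the grid, plug-in rule, and initialization are assumptions, not the paper's algorithm): using a predictable running estimate in the numerator makes the ratio a nonnegative martingale under each candidate theta, so Ville's inequality yields anytime-valid coverage.

```python
import numpy as np

def lr_confidence_sequence(x, theta_grid, alpha=0.05, sigma=1.0):
    """Keep theta while the prequential likelihood ratio
    prod_i N(x_i; theta_hat_{i-1}, sigma^2) / N(x_i; theta, sigma^2)
    stays below 1/alpha; valid at every time by Ville's inequality."""
    log_lr = np.zeros_like(theta_grid, dtype=float)
    theta_hat = 0.0  # arbitrary predictable initialization
    for i, xi in enumerate(x):
        log_lr += ((xi - theta_grid) ** 2 - (xi - theta_hat) ** 2) / (2 * sigma ** 2)
        theta_hat = np.mean(x[: i + 1])  # updated only after xi is used
    return theta_grid[log_lr < np.log(1.0 / alpha)]
```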
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Asymptotically free sketched ridge ensembles: Risks, cross-validation, and tuning [5.293069542318491]
We employ random matrix theory to establish consistency of generalized cross validation (GCV) for estimating prediction risks of sketched ridge regression ensembles.
For squared prediction risk, we provide a decomposition into an unsketched equivalent implicit ridge bias and a sketching-based variance, and prove that, in infinite ensembles, the risk can be optimized by tuning the sketch size alone.
We also propose an "ensemble trick" whereby the risk for unsketched ridge regression can be efficiently estimated via GCV using small sketched ridge ensembles.
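For reference, here is a minimal sketch of the plain GCV score for a single ridge predictor, the quantity whose consistency the paper extends to sketched ensembles (the sketched and "ensemble trick" variants are more involved and not reproduced here):

```python
import numpy as np

def ridge_gcv(X, y, lam):
    """Textbook GCV risk estimate for ridge regression: mean squared
    residuals inflated by the effective degrees of freedom."""
    n, p = X.shape
    # Hat matrix S = X (X'X + n*lam*I)^{-1} X'
    S = X @ np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T)
    resid = y - S @ y
    return np.mean(resid ** 2) / (1.0 - np.trace(S) / n) ** 2
```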
arXiv Detail & Related papers (2023-10-06T16:27:43Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
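One schematic reading of that abstention rule (the interface, the tolerance sigma0_sq, and the normal critical value are all illustrative assumptions, not the paper's test): predict only where a confidence bound on the estimated conditional variance stays below the tolerated level.

```python
import numpy as np

def selective_predict(mean_hat, var_hat, var_se, sigma0_sq, z=1.645):
    """Abstain unless the upper confidence bound on the estimated
    conditional variance is below sigma0_sq, so the variance
    predictor's own uncertainty (var_se) enters the decision."""
    accept = var_hat + z * var_se <= sigma0_sq
    return np.where(accept, mean_hat, np.nan)  # NaN marks abstention
```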
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation [4.87717454493713]
We study subsampling-based ridge ensembles in the proportional asymptotics regime.
We prove that the risk of the optimal full ridgeless ensemble (fitted on all possible subsamples) matches that of the optimal ridge predictor.
arXiv Detail & Related papers (2023-04-25T17:43:27Z) - Extrapolated cross-validation for randomized ensembles [2.3609229325947885]
This paper introduces a cross-validation method, ECV, for tuning the ensemble and subsample sizes in randomized ensembles.
We show that ECV yields $\delta$-optimal ensembles for squared prediction risk.
Compared to sample-split cross-validation and $K$-fold cross-validation, ECV achieves higher accuracy while avoiding sample splitting.
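The extrapolation at the heart of ECV can be sketched as follows (a reading of the idea, not the paper's estimator: it assumes the squared risk of an M-ensemble decays as R_M = R_inf + (R_1 - R_inf)/M, so risk estimates at ensemble sizes 1 and 2 determine the whole curve):

```python
def extrapolate_ensemble_risk(risk_1, risk_2, M):
    """Extrapolate the squared risk of an M-ensemble from risk
    estimates at sizes 1 and 2 via R_M = R_inf + (R_1 - R_inf)/M."""
    risk_inf = 2.0 * risk_2 - risk_1  # solve the two-point system for R_inf
    return risk_inf + (risk_1 - risk_inf) / M
```

Tuning then requires fitting only a couple of small ensembles per candidate configuration, which is where the efficiency over sample splitting comes from.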
arXiv Detail & Related papers (2023-02-27T04:19:18Z) - Mitigating multiple descents: A model-agnostic framework for risk
monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation.
We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
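A schematic of the zero-step idea under one plausible reading (the interface and the nested-subsample grid are assumptions, not the paper's exact procedure): refit the base procedure on subsamples of increasing size and keep whichever fit the held-out risk favors, so the delivered risk cannot increase with the sample size.

```python
import numpy as np

def zero_step_monotonize(fit, X, y, X_val, y_val, sizes):
    """Return the fitted model with the smallest held-out squared risk
    among fits on nested subsamples; a sketch of risk monotonization
    via cross-validation."""
    best, best_risk = None, np.inf
    for m in sizes:
        model = fit(X[:m], y[:m])
        risk = np.mean((y_val - model.predict(X_val)) ** 2)
        if risk < best_risk:
            best, best_risk = model, risk
    return best, best_risk
```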
arXiv Detail & Related papers (2022-05-25T17:41:40Z) - Estimating Gradients for Discrete Random Variables by Sampling without
Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
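The paper's estimator is built on sampling without replacement; the underlying principle can be illustrated with a simpler Horvitz-Thompson sketch (not the paper's construction): sample support points without replacement and reweight each term by its inclusion probability, which keeps the estimate unbiased.

```python
import numpy as np

rng = np.random.default_rng(1)

def ht_expectation(p, f_vals, k):
    """Unbiased Horvitz-Thompson estimate of E_p[f(X)] = sum_i p_i * f(i)
    from k support points drawn uniformly without replacement; each
    sampled term is divided by its inclusion probability k/N."""
    N = len(p)
    sampled = rng.choice(N, size=k, replace=False)
    return np.sum(p[sampled] * f_vals[sampled]) / (k / N)
```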
arXiv Detail & Related papers (2020-02-14T14:15:18Z) - Orthogonal Statistical Learning [49.55515683387805]
We provide non-asymptotic excess risk guarantees for statistical learning in a setting where the population risk depends on an unknown nuisance parameter.
We show that if the population risk satisfies a condition called Neyman orthogonality, the impact of the nuisance estimation error on the excess risk bound achieved by the meta-algorithm is of second order.
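That second-order property is visible in the classic residual-on-residual estimator for a partially linear model, a standard instance of Neyman orthogonality (an illustration only, not the paper's general meta-algorithm; the random-forest nuisance learners are arbitrary choices):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def orthogonal_plm(X, d, y):
    """Estimate theta in y = theta*d + g(X) + noise by regressing the
    residual of y on the residual of d; errors in the nuisance fits
    m(X) = E[d|X] and g(X) = E[y|X] enter only at second order."""
    m_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), X, d, cv=5)
    g_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), X, y, cv=5)
    d_res, y_res = d - m_hat, y - g_hat
    return float(np.sum(d_res * y_res) / np.sum(d_res ** 2))
```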
arXiv Detail & Related papers (2019-01-25T02:21:24Z)