Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation
- URL: http://arxiv.org/abs/2304.13016v2
- Date: Sun, 16 Jul 2023 09:38:01 GMT
- Title: Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation
- Authors: Jin-Hong Du, Pratik Patil, Arun Kumar Kuchibhotla
- Abstract summary: We study subsampling-based ridge ensembles in the proportional asymptotics regime.
We prove that the risk of the optimal full ridgeless ensemble (fitted on all possible subsamples) matches that of the optimal ridge predictor.
- Score: 4.87717454493713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study subsampling-based ridge ensembles in the proportional asymptotics
regime, where the feature size grows proportionally with the sample size such
that their ratio converges to a constant. By analyzing the squared prediction
risk of ridge ensembles as a function of the explicit penalty $\lambda$ and the
limiting subsample aspect ratio $\phi_s$ (the ratio of the feature size to the
subsample size), we characterize contours in the $(\lambda, \phi_s)$-plane at
any achievable risk. As a consequence, we prove that the risk of the optimal
full ridgeless ensemble (fitted on all possible subsamples) matches that of the
optimal ridge predictor. In addition, we prove strong uniform consistency of
generalized cross-validation (GCV) over the subsample sizes for estimating the
prediction risk of ridge ensembles. This allows for GCV-based tuning of full
ridgeless ensembles without sample splitting and yields a predictor whose risk
matches optimal ridge risk.
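As a rough illustration of the headline equivalence (our own minimal simulation sketch, not the authors' code), the following compares a finite ensemble of ridgeless fits on random subsamples, tuned over the subsample size $k$ (i.e., over $\phi_s = p/k$), against ridge tuned over $\lambda$ on the full data; the ensemble size M = 50 and the grids are arbitrary choices:
```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 300, 450, 1.0            # overparameterized: phi = p/n = 1.5
beta = rng.normal(size=p) / np.sqrt(p)

X = rng.normal(size=(n, p))
y = X @ beta + sigma * rng.normal(size=n)
X_te = rng.normal(size=(2000, p))
y_te = X_te @ beta + sigma * rng.normal(size=2000)

def risk(coef):
    return np.mean((y_te - X_te @ coef) ** 2)

def ridgeless(Xs, ys):
    # min-norm least squares, i.e. the lambda -> 0+ ridge limit
    return np.linalg.pinv(Xs) @ ys

def subsample_ensemble(k, M=50):
    # average ridgeless fits over M random size-k subsamples
    # (a finite-M proxy for the full ensemble over all subsamples)
    coefs = [ridgeless(X[idx], y[idx])
             for idx in (rng.choice(n, size=k, replace=False) for _ in range(M))]
    return np.mean(coefs, axis=0)

# tune the subsample aspect ratio phi_s = p/k vs. tuning lambda on full data
best_ensemble = min(risk(subsample_ensemble(k)) for k in range(50, n + 1, 50))
best_ridge = min(risk(np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y))
                 for lam in np.logspace(-2, 3, 30))
print(f"ensemble: {best_ensemble:.3f}   ridge: {best_ridge:.3f}")
```
In simulations like this the two tuned risks come out close, consistent with the equivalence the abstract states; the paper's result concerns the exact asymptotic match in the proportional regime.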
Related papers
- Precise Asymptotics of Bagging Regularized M-estimators [5.165142221427928]
We characterize the squared prediction risk of ensemble estimators obtained through subagging (subsample bootstrap aggregating) regularized M-estimators.
Key to our analysis is a new result on the joint behavior of correlations between the estimator and residual errors on overlapping subsamples.
Joint optimization of subsample size, ensemble size, and regularization can significantly outperform regularizer optimization alone on the full data.
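A hedged sketch of that joint tuning, with scikit-learn's HuberRegressor standing in for the regularized M-estimator (the paper is more general); all grid values are illustrative:
```python
import numpy as np
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(1)
n, p = 400, 100
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.standard_t(df=3, size=n)         # heavy-tailed noise

X_te = rng.normal(size=(1000, p))
risk = lambda c: np.mean((X_te @ (c - beta)) ** 2)  # excess prediction risk

def subagged_coef(k, M, alpha):
    # average ridge-penalized Huber fits over M size-k subsamples
    coefs = [HuberRegressor(alpha=alpha, fit_intercept=False)
             .fit(X[idx], y[idx]).coef_
             for idx in (rng.choice(n, size=k, replace=False) for _ in range(M))]
    return np.mean(coefs, axis=0)

# joint grid over (subsample size k, ensemble size M, penalty alpha),
# versus tuning alpha alone at (k, M) = (n, 1) on the full data
grid = [(k, M, a) for k in (150, 250, 400) for M in (1, 10) for a in (1e-3, 1e-1)]
best = min(grid, key=lambda t: risk(subagged_coef(*t)))
print("best (k, M, alpha):", best)
```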
arXiv Detail & Related papers (2024-09-23T17:48:28Z)
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high-dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Asymptotically free sketched ridge ensembles: Risks, cross-validation, and tuning [5.293069542318491]
We employ random matrix theory to establish consistency of generalized cross-validation (GCV) for estimating the prediction risks of sketched ridge regression ensembles.
For the squared prediction risk, we provide a decomposition into an unsketched-equivalent implicit ridge bias and a sketching-based variance, and prove that the risk can be globally optimized by tuning only the sketch size in infinite ensembles.
We also propose an "ensemble trick" whereby the risk for unsketched ridge regression can be efficiently estimated via GCV using small sketched ridge ensembles.
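A minimal sketch of a sketched ridge ensemble (our own illustration; the feature-space Gaussian sketch below is an assumed form, not necessarily the paper's exact estimator):
```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q, lam = 500, 200, 100, 0.1
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

def sketched_ridge(S):
    # solve ridge in the q-dim sketched feature space, map back to R^p
    Z = X @ S
    w = np.linalg.solve(Z.T @ Z / n + lam * np.eye(q), Z.T @ y / n)
    return S @ w

# average over K independent Gaussian sketches
K = 20
coef_ens = np.mean([sketched_ridge(rng.normal(size=(p, q)) / np.sqrt(q))
                    for _ in range(K)], axis=0)

# per the abstract, the infinite ensemble behaves like unsketched ridge with a
# larger implicit penalty; compare against plain ridge at the same lambda here
coef_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)
X_te = rng.normal(size=(2000, p))
for name, c in [("sketched ensemble", coef_ens), ("ridge", coef_ridge)]:
    print(name, np.mean((X_te @ (c - beta)) ** 2))
```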
arXiv Detail & Related papers (2023-10-06T16:27:43Z)
- Generalized equivalences between subsampling and ridge regularization [3.1346887720803505]
We prove structural and risk equivalences between subsampling and ridge regularization for ensemble ridge estimators.
An indirect implication of our equivalences is that optimally tuned ridge regression exhibits a monotonic prediction risk in the data aspect ratio.
arXiv Detail & Related papers (2023-05-29T14:05:51Z)
- Extrapolated cross-validation for randomized ensembles [2.3609229325947885]
This paper introduces a cross-validation method, ECV, for tuning the ensemble and subsample sizes in randomized ensembles.
We show that ECV yields $\delta$-optimal ensembles for squared prediction risk.
In comparison to sample-split cross-validation and $K$-fold cross-validation, ECV achieves higher accuracy while avoiding sample splitting.
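A sketch of the extrapolation idea as we read it, resting on the $a + b/M$ risk law for randomized ensembles of size $M$; the out-of-bag estimation step that produces the two anchor risks is omitted:
```python
# For randomized ensembles, the squared risk decays in the ensemble size M as
# risk(M) = a + b / M, so risk estimates at M = 1 and M = 2 (e.g., out-of-bag
# estimates) pin down the whole curve without refitting larger ensembles.
def extrapolate_risk(r1, r2, M):
    """Predict risk at ensemble size M from estimates r1 (M=1), r2 (M=2)."""
    b = 2.0 * (r1 - r2)        # from r1 = a + b and r2 = a + b/2
    a = r1 - b
    return a + b / M

print(extrapolate_risk(1.00, 0.80, M=50))   # -> 0.608, near the M=inf risk 0.60
```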
arXiv Detail & Related papers (2023-02-27T04:19:18Z)
- Bagging in overparameterized learning: Risk characterization and risk monotonization [2.6534407766508177]
We study the prediction risk of variants of bagged predictors in the proportional asymptotics regime.
Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors.
arXiv Detail & Related papers (2022-10-20T17:45:58Z)
- BRIO: Bringing Order to Abstractive Summarization [107.97378285293507]
We propose a novel training paradigm that assumes a non-deterministic distribution, so that different candidate summaries are assigned probability mass according to their quality.
Our method achieves a new state-of-the-art result on the CNN/DailyMail (47.78 ROUGE-1) and XSum (49.07 ROUGE-1) datasets.
arXiv Detail & Related papers (2022-03-31T05:19:38Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
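A minimal sketch of clipped stochastic gradient descent for streaming mean estimation (scalar parameter and constant step size for brevity; the paper's algorithm and analysis are more general):
```python
import numpy as np

def clipped_sgd(samples, grad, theta0, lr=0.05, clip=1.0):
    # one pass over the stream; each stochastic gradient is norm-clipped
    # before the update, which tames heavy-tailed gradient noise
    theta = float(theta0)
    for x in samples:
        g = grad(theta, x)
        g *= min(1.0, clip / (abs(g) + 1e-12))
        theta -= lr * g
    return theta

# toy usage: mean estimation from t(2) samples (infinite variance, mean 0);
# the gradient of 0.5 * (theta - x)^2 is theta - x
rng = np.random.default_rng(3)
xs = rng.standard_t(df=2, size=5000)
print(clipped_sgd(xs, lambda th, x: th - x, theta0=5.0))
```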
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification [54.22421582955454]
We provide the first minimax-optimal guarantees for the excess risk of adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z)
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
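The auxiliary slice variables here adaptively set the truncation level of the random measure; the simplest instance of that auxiliary-variable idea is the univariate slice sampler sketched below (a generic illustration, not the paper's CRM-specific algorithm):
```python
import numpy as np

def slice_sample(logp, x0, n_samples, w=1.0, seed=0):
    # univariate slice sampler (Neal, 2003): draw an auxiliary level u under
    # the density, then sample uniformly from the slice {x : p(x) > u}
    rng = np.random.default_rng(seed)
    x, out = x0, []
    for _ in range(n_samples):
        log_u = logp(x) + np.log(rng.uniform())
        lo = x - w * rng.uniform()           # step out until the slice is covered
        hi = lo + w
        while logp(lo) > log_u:
            lo -= w
        while logp(hi) > log_u:
            hi += w
        while True:                          # shrink the bracket on rejection
            x_new = rng.uniform(lo, hi)
            if logp(x_new) > log_u:
                x = x_new
                break
            if x_new < x:
                lo = x_new
            else:
                hi = x_new
        out.append(x)
    return np.array(out)

draws = slice_sample(lambda x: -0.5 * x * x, x0=0.0, n_samples=2000)
print(draws.mean(), draws.std())             # roughly 0 and 1 for N(0, 1)
```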
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
- Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior-sampling-based algorithm, namely distributionally robust BQO (DRBQO), for this purpose.
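A loose illustration of the standard Monte Carlo surrogate the abstract mentions, together with a CVaR-style robust scoring rule of our own as a crude stand-in for distributional robustness (not the paper's DRBQO algorithm, which is posterior-sampling based):
```python
import numpy as np

rng = np.random.default_rng(4)
w_samples = rng.normal(size=200)            # the fixed i.i.d. context samples

def f(x, w):                                # toy black-box objective
    return -(x - w) ** 2

# standard BQO surrogate: Monte Carlo estimate of E_w[f(x, w)]
def mc_objective(x):
    return np.mean([f(x, w) for w in w_samples])

# crude robust stand-in: score x on its worst (1 - eps) fraction of outcomes,
# a CVaR-style lower estimate that guards against sampling error
def dr_objective(x, eps=0.1):
    vals = np.sort([f(x, w) for w in w_samples])
    keep = max(1, int((1 - eps) * len(vals)))
    return np.mean(vals[:keep])

xs = np.linspace(-2, 2, 81)
print("MC argmax:", xs[np.argmax([mc_objective(x) for x in xs])],
      "DR argmax:", xs[np.argmax([dr_objective(x) for x in xs])])
```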
arXiv Detail & Related papers (2020-01-19T12:00:33Z)