Asymptotically free sketched ridge ensembles: Risks, cross-validation, and tuning
- URL: http://arxiv.org/abs/2310.04357v3
- Date: Tue, 19 Mar 2024 23:23:59 GMT
- Title: Asymptotically free sketched ridge ensembles: Risks, cross-validation, and tuning
- Authors: Pratik Patil, Daniel LeJeune,
- Abstract summary: We employ random matrix theory to establish consistency of generalized cross validation (GCV) for estimating prediction risks of sketched ridge regression ensembles.
For squared prediction risk, we provide a decomposition into an unsketched equivalent implicit ridge bias and a sketching-based variance, and prove that the risk can be globally tuning by only sketch size in infinite ensembles.
We also propose an "ensemble trick" whereby the risk for unsketched ridge regression can be efficiently estimated via GCV using small sketched ridge ensembles.
- Score: 5.293069542318491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We employ random matrix theory to establish consistency of generalized cross validation (GCV) for estimating prediction risks of sketched ridge regression ensembles, enabling efficient and consistent tuning of regularization and sketching parameters. Our results hold for a broad class of asymptotically free sketches under very mild data assumptions. For squared prediction risk, we provide a decomposition into an unsketched equivalent implicit ridge bias and a sketching-based variance, and prove that the risk can be globally optimized by only tuning sketch size in infinite ensembles. For general subquadratic prediction risk functionals, we extend GCV to construct consistent risk estimators, and thereby obtain distributional convergence of the GCV-corrected predictions in Wasserstein-2 metric. This in particular allows construction of prediction intervals with asymptotically correct coverage conditional on the training data. We also propose an "ensemble trick" whereby the risk for unsketched ridge regression can be efficiently estimated via GCV using small sketched ridge ensembles. We empirically validate our theoretical results using both synthetic and real large-scale datasets with practical sketches including CountSketch and subsampled randomized discrete cosine transforms.
Related papers
- Semiparametric conformal prediction [79.6147286161434]
Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables.
We treat the scores as random vectors and aim to construct the prediction set accounting for their joint correlation structure.
We report desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide training examples for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, setting often encountered in time series forecasting.
We validate our theory across a variety of high dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Distributionally Robust Skeleton Learning of Discrete Bayesian Networks [9.46389554092506]
We consider the problem of learning the exact skeleton of general discrete Bayesian networks from potentially corrupted data.
We propose to optimize the most adverse risk over a family of distributions within bounded Wasserstein distance or KL divergence to the empirical distribution.
We present efficient algorithms and show the proposed methods are closely related to the standard regularized regression approach.
arXiv Detail & Related papers (2023-11-10T15:33:19Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Corrected generalized cross-validation for finite ensembles of penalized estimators [5.165142221427927]
Generalized cross-validation (GCV) is a widely-used method for estimating the squared out-of-sample prediction risk.
We show that GCV is inconsistent for any finite ensemble of size greater than one.
arXiv Detail & Related papers (2023-10-02T17:38:54Z) - Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation [4.87717454493713]
We study subsampling-based ridge ensembles in the proportionals regime.
We prove that the risk of the optimal full ridgeless ensemble (fitted on all possible subsamples) matches that of the optimal ridge predictor.
arXiv Detail & Related papers (2023-04-25T17:43:27Z) - Extrapolated cross-validation for randomized ensembles [2.3609229325947885]
This paper introduces a cross-validation method, ECV, for tuning the ensemble and subsample sizes in randomized ensembles.
We show that ECV yields $delta$-optimal ensembles for squared prediction risk.
In comparison to sample-split cross-validation and $K$-fold cross-validation, ECV achieves higher accuracy avoiding sample splitting.
arXiv Detail & Related papers (2023-02-27T04:19:18Z) - Bagging in overparameterized learning: Risk characterization and risk
monotonization [2.6534407766508177]
We study the prediction risk of variants of bagged predictors under the proportionals regime.
Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors.
arXiv Detail & Related papers (2022-10-20T17:45:58Z) - Mitigating multiple descents: A model-agnostic framework for risk
monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation.
We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z) - Distributed Sketching for Randomized Optimization: Exact
Characterization, Concentration and Lower Bounds [54.51566432934556]
We consider distributed optimization methods for problems where forming the Hessian is computationally challenging.
We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
arXiv Detail & Related papers (2022-03-18T05:49:13Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.