Rademacher upper bounds for cross-validation errors with an application
to the lasso
- URL: http://arxiv.org/abs/2007.15598v1
- Date: Thu, 30 Jul 2020 17:13:03 GMT
- Title: Rademacher upper bounds for cross-validation errors with an application
to the lasso
- Authors: Ning Xu, Timothy C.G. Fisher, Jian Hong
- Abstract summary: We establish a general upper bound for $K$-fold cross-validation ($K$-CV) errors.
The CV error upper bound applies to both light-tail and heavy-tail error distributions.
We provide a Python package for computing the CV error upper bound in $K$-CV-based algorithms.
- Score: 6.837167110907022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We establish a general upper bound for $K$-fold cross-validation ($K$-CV)
errors that can be adapted to many $K$-CV-based estimators and learning
algorithms. Based on Rademacher complexity of the model and the
Orlicz-$\Psi_{\nu}$ norm of the error process, the CV error upper bound applies
to both light-tail and heavy-tail error distributions. We also extend the CV
error upper bound to $\beta$-mixing data using the technique of independent
blocking. We provide a Python package (\texttt{CVbound},
\url{https://github.com/isaac2math}) for computing the CV error upper bound in
$K$-CV-based algorithms. Using the lasso as an example, we demonstrate in
simulations that the upper bounds are tight and stable across different
parameter settings and random seeds. As well as accurately bounding the CV
errors for the lasso, the minimizer of the new upper bounds can be used as a
criterion for variable selection. Compared with the CV-error minimizer,
simulations show that tuning the lasso penalty parameter according to the
minimizer of the upper bound yields a more sparse and more stable model that
retains all of the relevant variables.
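A minimal sketch of the selection rule described above: tune the lasso penalty by minimizing an upper bound on the $K$-CV error rather than the raw CV error. It assumes scikit-learn and uses a placeholder dispersion-based slack term, not the paper's Rademacher/Orlicz-$\Psi_{\nu}$ bound or the \texttt{CVbound} API.

```python
# Illustrative sketch only: the "upper bound" below is CV error + a stand-in
# slack term, NOT the paper's Rademacher/Orlicz-norm construction.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p, k_relevant = 200, 50, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k_relevant] = 2.0
y = X @ beta + rng.standard_normal(n)

def kfold_cv_error(alpha, K=5):
    """Mean squared prediction error averaged over K folds, plus its fold-wise spread."""
    errs = []
    for train_idx, test_idx in KFold(n_splits=K, shuffle=True, random_state=0).split(X):
        model = Lasso(alpha=alpha, max_iter=10_000).fit(X[train_idx], y[train_idx])
        errs.append(np.mean((y[test_idx] - model.predict(X[test_idx])) ** 2))
    return np.mean(errs), np.std(errs)

alphas = np.logspace(-3, 0, 30)
cv_err, upper_bound = [], []
for a in alphas:
    mean_err, sd_err = kfold_cv_error(a)
    cv_err.append(mean_err)
    # Placeholder complexity slack (assumption); the paper uses Rademacher complexity
    # of the model and the Orlicz-Psi_nu norm of the error process instead.
    upper_bound.append(mean_err + sd_err)

alpha_cv = alphas[int(np.argmin(cv_err))]
alpha_ub = alphas[int(np.argmin(upper_bound))]
print(f"CV-error minimizer:    alpha = {alpha_cv:.4f}")
print(f"Upper-bound minimizer: alpha = {alpha_ub:.4f}  (typically larger, hence a sparser model)")
```

The point of the sketch is the selection criterion, not the bound itself: penalizing the raw CV error pushes the chosen penalty upward, which is the mechanism behind the sparser, more stable models reported in the abstract.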
Related papers
- RandALO: Out-of-sample risk estimation in no time flat [5.231056284485742]
Cross-validation (CV) serves as the de facto standard for risk estimation but poorly trades off high bias ($K$-fold CV) for computational cost (leave-one-out CV).
We propose a randomized approximate leave-one-out (RandALO) risk estimator that is not only a consistent estimator of risk in high dimensions but also less computationally expensive than $K$-fold CV.
arXiv Detail & Related papers (2024-09-15T16:10:03Z) - On the Performance of Empirical Risk Minimization with Smoothed Data [59.3428024282545]
Empirical Risk Minimization (ERM) is able to achieve sublinear error whenever a class is learnable with iid data; this paper studies whether the same guarantee holds for smoothed data.
arXiv Detail & Related papers (2024-02-22T21:55:41Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Bayes beats Cross Validation: Efficient and Accurate Ridge Regression
via Expectation Maximization [3.061662434597098]
We present a method for setting the regularization hyperparameter, $\lambda$, that is faster to compute than leave-one-out cross-validation (LOOCV).
We show that the proposed method is guaranteed to find a unique optimal solution for large enough $n$, under relatively mild conditions.
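For context, a minimal sketch of the exact LOOCV baseline that the entry above compares against; the paper's EM-based procedure is not reproduced here. The closed-form hat-matrix identity below is standard for ridge regression.

```python
# Exact leave-one-out CV for ridge regression via the hat matrix H = X (X'X + lam I)^{-1} X':
# LOOCV error = mean( ((y - Hy) / (1 - diag(H)))^2 ), computed from a single fit per lambda.
import numpy as np

def ridge_loocv_error(X, y, lam):
    """Exact leave-one-out mean squared error for ridge regression with penalty lam."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix
    residuals = y - H @ y
    return np.mean((residuals / (1.0 - np.diag(H))) ** 2)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
y = X @ rng.standard_normal(10) + rng.standard_normal(100)

lams = np.logspace(-3, 2, 50)
best = min(lams, key=lambda lam: ridge_loocv_error(X, y, lam))
print(f"lambda chosen by exact LOOCV: {best:.4f}")
```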
arXiv Detail & Related papers (2023-10-29T01:13:55Z) - REAL: A Representative Error-Driven Approach for Active Learning [15.477921200056887]
REAL is a novel approach to select data instances with \underline{R}epresentative \underline{E}rrors for \underline{A}ctive \underline{L}earning.
It identifies minority predictions as \emph{pseudo errors} within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density.
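A schematic sketch of that selection idea under assumptions of ours (k-means over embeddings, a generic classifier's predicted labels); it is not the authors' REAL implementation, only an illustration of "minority predictions as pseudo errors" with a density-proportional budget.

```python
# Sketch: cluster the unlabeled pool, flag within-cluster minority predictions as pseudo
# errors, and spend more of the labeling budget on clusters with higher pseudo-error density.
import numpy as np
from sklearn.cluster import KMeans

def select_queries(embeddings, predicted_labels, total_budget, n_clusters=10, seed=0):
    """Return indices of unlabeled instances to annotate, chosen cluster by cluster."""
    clusters = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(embeddings)
    pseudo_errors, densities = [], np.zeros(n_clusters)
    for c in range(n_clusters):
        idx = np.where(clusters == c)[0]
        labels, counts = np.unique(predicted_labels[idx], return_counts=True)
        majority = labels[np.argmax(counts)]
        errs = idx[predicted_labels[idx] != majority]   # minority predictions = pseudo errors
        pseudo_errors.append(errs)
        densities[c] = len(errs) / max(len(idx), 1)     # estimated error density of the cluster
    weights = densities / densities.sum() if densities.sum() > 0 else np.full(n_clusters, 1 / n_clusters)
    selected = []
    for c in range(n_clusters):                         # adaptive per-cluster budget
        selected.extend(pseudo_errors[c][: int(round(weights[c] * total_budget))].tolist())
    return selected
```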
arXiv Detail & Related papers (2023-07-03T12:39:26Z) - Contextual Combinatorial Bandits with Probabilistically Triggered Arms [55.9237004478033]
We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions.
Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UC-T algorithm and derive a regret bound of $\tilde{O}(d\sqrt{T})$.
arXiv Detail & Related papers (2023-03-30T02:51:00Z) - Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing [30.508036898655114]
Pruning schemes have been widely used in practice to reduce the complexity of trained models with a massive number of parameters.
Running gradient descent in the absence of regularization results in models which are not suitable for greedy pruning, i.e., many columns could have their $\ell_2$ norm comparable to that of the maximum.
Our results provide the first rigorous insights on why greedy pruning + fine-tuning leads to smaller models which also generalize well.
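A minimal NumPy sketch of the prune-by-column-norm step that the summary refers to; the matrix-sensing setup, the keep ratio, and the mimicked group-lasso shrinkage below are illustrative assumptions, not the paper's construction or proof setting.

```python
# Greedy pruning keeps the largest-norm columns; group-lasso (column-wise l2) regularization
# during training drives irrelevant columns toward zero, so the norms separate and pruning
# followed by fine-tuning is safe. Without it, many column norms stay comparable to the max.
import numpy as np

def greedy_prune_columns(U, keep_ratio=0.5):
    """Keep the columns of U with the largest l2 norms, zeroing out the rest."""
    norms = np.linalg.norm(U, axis=0)
    k = max(1, int(keep_ratio * U.shape[1]))
    keep = np.argsort(norms)[-k:]          # greedy: largest-norm columns survive
    pruned = np.zeros_like(U)
    pruned[:, keep] = U[:, keep]
    return pruned, keep

rng = np.random.default_rng(2)
U = rng.standard_normal((20, 8))
U[:, 4:] *= 0.05                           # mimic columns shrunk by group-lasso training
pruned_U, kept = greedy_prune_columns(U, keep_ratio=0.5)
print("columns kept after pruning:", sorted(kept.tolist()))
```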
arXiv Detail & Related papers (2023-03-20T21:05:44Z) - Optimal Query Complexities for Dynamic Trace Estimation [59.032228008383484]
We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly.
We provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $\epsilon$ error with $\delta$ failure probability.
Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation.
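For reference, a minimal sketch of the classical Hutchinson estimator that the entry above builds on; the paper's binary-tree summation procedure for the dynamic setting is not reproduced here.

```python
# Hutchinson's estimator: tr(A) = E[z^T A z] for Rademacher probe vectors z, so averaging
# z^T A z over a few matrix-vector products gives an unbiased trace estimate.
import numpy as np

def hutchinson_trace(matvec, dim, num_queries, seed=0):
    """Estimate tr(A) using num_queries matrix-vector products with Rademacher probes."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(num_queries):
        z = rng.choice([-1.0, 1.0], size=dim)   # Rademacher probe vector
        estimates.append(z @ matvec(z))          # unbiased single-probe estimate of tr(A)
    return np.mean(estimates)

A = np.diag(np.arange(1.0, 101.0))               # true trace = 5050
print("estimate:", hutchinson_trace(lambda v: A @ v, dim=100, num_queries=200))
print("exact:   ", np.trace(A))
```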
arXiv Detail & Related papers (2022-09-30T04:15:44Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on strong convexity.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Confidence intervals for the Cox model test error from cross-validation [91.3755431537592]
Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model.
Standard confidence intervals for test error using estimates from CV may have coverage below nominal levels.
One way to address this issue is to instead estimate the mean squared error of the prediction error using nested CV.
arXiv Detail & Related papers (2022-01-26T06:40:43Z)
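A schematic sketch of nested CV for sizing the uncertainty of a CV error estimate, as referenced in the entry above; the exact estimator and interval construction from the paper are not reproduced, and the plain least-squares model is an assumption made for brevity.

```python
# Nested CV sketch: within each outer training fold, run an inner CV and compare its error
# estimate to the outer held-out error; the squared gaps give a crude proxy for the MSE of
# the CV point estimate, which is then used to widen the confidence interval.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal(5) + rng.standard_normal(200)

outer = KFold(n_splits=5, shuffle=True, random_state=0)
gaps, outer_errors = [], []
for train_idx, test_idx in outer.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    outer_err = np.mean((y[test_idx] - model.predict(X[test_idx])) ** 2)
    inner_scores = cross_val_score(LinearRegression(), X[train_idx], y[train_idx],
                                   scoring="neg_mean_squared_error", cv=5)
    gaps.append(-inner_scores.mean() - outer_err)    # inner CV estimate minus outer test error
    outer_errors.append(outer_err)

point = np.mean(outer_errors)
mse_of_estimate = np.mean(np.square(gaps))           # crude MSE proxy for the CV estimate
half_width = 1.96 * np.sqrt(mse_of_estimate)
print(f"CV error estimate: {point:.3f} +/- {half_width:.3f}")
```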
This list is automatically generated from the titles and abstracts of the papers in this site.