Iterative Approximate Cross-Validation
- URL: http://arxiv.org/abs/2303.02732v2
- Date: Sat, 27 May 2023 18:57:18 GMT
- Title: Iterative Approximate Cross-Validation
- Authors: Yuetian Luo and Zhimei Ren and Rina Foygel Barber
- Abstract summary: Cross-validation (CV) is one of the most popular tools for assessing and selecting predictive models.
In this paper, we propose a new paradigm to efficiently approximate CV when the empirical risk minimization (ERM) problem is solved via an iterative first-order algorithm.
Our new method extends existing guarantees for CV approximation to hold along the whole trajectory of the algorithm, including at convergence.
- Score: 13.084578404699174
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Cross-validation (CV) is one of the most popular tools for assessing and
selecting predictive models. However, standard CV suffers from high
computational cost when the number of folds is large. Recently, under the
empirical risk minimization (ERM) framework, a line of works proposed efficient
methods to approximate CV based on the solution of the ERM problem trained on
the full dataset. However, in large-scale problems, it can be hard to obtain
the exact solution of the ERM problem, either due to limited computational
resources or due to early stopping as a way of preventing overfitting. In this
paper, we propose a new paradigm to efficiently approximate CV when the ERM
problem is solved via an iterative first-order algorithm, without running until
convergence. Our new method extends existing guarantees for CV approximation to
hold along the whole trajectory of the algorithm, including at convergence,
thus generalizing existing CV approximation methods. Finally, we illustrate the
accuracy and computational efficiency of our method through a range of
empirical studies.
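The core update behind this paradigm is simple enough to sketch. Below is a minimal NumPy illustration, assuming l2-regularized logistic regression fit by full-batch gradient descent; the loss, step size, and all function names here are our assumptions for illustration, not the authors' exact implementation. The idea: alongside the full-data iterate, track one approximate leave-one-out (LOO) iterate per data point, updated via a first-order Taylor expansion of the LOO gradient around the full-data iterate.

```python
# Hedged sketch of iterative approximate CV for l2-regularized logistic
# regression (our illustrative setup, not the paper's exact algorithm).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_full(theta, X, y, lam):
    """Gradient of sum_i logistic loss + (lam/2)||theta||^2."""
    return X.T @ (sigmoid(X @ theta) - y) + lam * theta

def grad_hess_loo(theta, X, y, lam, i):
    """Gradient and Hessian of the objective with point i removed."""
    mask = np.arange(X.shape[0]) != i
    Xi, yi = X[mask], y[mask]
    p = sigmoid(Xi @ theta)
    g = Xi.T @ (p - yi) + lam * theta
    H = (Xi * (p * (1 - p))[:, None]).T @ Xi + lam * np.eye(X.shape[1])
    return g, H

def iterative_approx_cv(X, y, lam=1.0, eta=0.01, T=100):
    n, d = X.shape
    theta = np.zeros(d)           # full-data gradient-descent iterate
    theta_loo = np.zeros((n, d))  # tracked approximate LOO iterates
    for _ in range(T):
        for i in range(n):
            g, H = grad_hess_loo(theta, X, y, lam, i)
            # Taylor-expanded LOO gradient step around the full-data iterate
            theta_loo[i] -= eta * (g + H @ (theta_loo[i] - theta))
        theta -= eta * grad_full(theta, X, y, lam)
    # approximate CV error: score each LOO iterate on its held-out point
    z = np.einsum('ij,ij->i', theta_loo, X)          # z_i = x_i . theta^{-i}
    cv_loss = np.mean(np.logaddexp(0.0, -(2 * y - 1) * z))  # y in {0,1}
    return theta, theta_loo, cv_loss
```

At any iteration, plugging the i-th tracked iterate into the loss on point i gives an approximate CV estimate for the model at that step, without n separate training runs; this is what allows the guarantee to hold along the whole trajectory rather than only at convergence.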
Related papers
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximate gradient descent method based on a recent practical envelope-smoothing technique.
Our proposed algorithm can also be used to minimize the sum of ranked-range losses, which likewise lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to a stationary point yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - Leave Zero Out: Towards a No-Cross-Validation Approach for Model Selection [21.06860861548758]
Cross-validation (CV) is the main workhorse for model selection.
CV yields a conservatively biased estimate, since part of the limited data must be held out for validation.
CV also tends to be extremely cumbersome, e.g., intolerably time-consuming, due to the repeated training procedures.
arXiv Detail & Related papers (2020-12-24T16:11:53Z) - Approximate Cross-validated Mean Estimates for Bayesian Hierarchical Regression Models [6.824747267214373]
We introduce a novel procedure for obtaining cross-validated predictive estimates for Bayesian hierarchical regression models.
We provide theoretical results and demonstrate its efficacy on publicly available data and in simulations.
arXiv Detail & Related papers (2020-11-29T00:00:20Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and the number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9-36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z) - When to Impute? Imputation before and during cross-validation [0.0]
Cross-validation (CV) is a technique used to estimate generalization error for prediction models.
It has been recommended that the entire sequence of steps be carried out during each replicate of CV, to mimic applying the whole pipeline to an external test set.
arXiv Detail & Related papers (2020-10-01T23:04:16Z) - Approximate Cross-Validation with Low-Rank Data in High Dimensions [35.74302895575951]
Cross-validation is an important tool for model assessment.
Approximate cross-validation (ACV) methods can lose both speed and accuracy in high dimensions unless sparsity structure is present in the data.
We develop a new ACV algorithm that is fast and accurate in the presence of approximately low-rank (ALR) data.
arXiv Detail & Related papers (2020-08-24T16:34:05Z) - Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z) - Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay (a popular reinforcement learning technique) that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z) - High-Dimensional Robust Mean Estimation via Gradient Descent [73.61354272612752]
We show that the problem of robust mean estimation in the presence of a constant fraction of adversarial outliers can be solved by gradient descent.
Our work establishes an intriguing connection between non-convex optimization and robust statistics.
arXiv Detail & Related papers (2020-05-04T10:48:04Z) - Approximate Cross-validation: Guarantees for Model Assessment and Selection [18.77512692975483]
Cross-validation (CV) is a popular approach for assessing and selecting predictive models.
Recent work in empirical risk minimization approximates the expensive refitting for each fold with a single Newton step, warm-started at the solution trained on the full dataset (sketched below).
arXiv Detail & Related papers (2020-03-02T00:30:00Z)
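For concreteness, here is a hedged sketch of that single-Newton-step shortcut, again for l2-regularized logistic regression (our illustrative assumption; the cited work covers general smooth ERM). Starting from the full-data minimizer, one Newton step on each leave-one-out objective stands in for the full refit.

```python
# Hedged sketch of the warm-started Newton-step approximation to LOO CV
# (illustrative ridge-logistic setup; function names are our assumptions).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_step_loo(theta_hat, X, y, lam):
    """One warm-started Newton step per left-out point, in place of refitting."""
    n, d = X.shape
    approx = np.empty((n, d))
    for i in range(n):
        mask = np.arange(n) != i
        Xi, yi = X[mask], y[mask]
        p = sigmoid(Xi @ theta_hat)
        g = Xi.T @ (p - yi) + lam * theta_hat                       # LOO gradient
        H = (Xi * (p * (1 - p))[:, None]).T @ Xi + lam * np.eye(d)  # LOO Hessian
        approx[i] = theta_hat - np.linalg.solve(H, g)               # Newton step
    return approx
```

Because theta_hat is assumed to (nearly) minimize the full objective, a single step usually suffices; relaxing exactly this dependence on a converged solution is what the Iterative Approximate Cross-Validation paper above sets out to do.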
This list is automatically generated from the titles and abstracts of the papers on this site.