Can we globally optimize cross-validation loss? Quasiconvexity in ridge
regression
- URL: http://arxiv.org/abs/2107.09194v1
- Date: Mon, 19 Jul 2021 23:22:24 GMT
- Title: Can we globally optimize cross-validation loss? Quasiconvexity in ridge
regression
- Authors: William T. Stephenson and Zachary Frangella and Madeleine Udell and
Tamara Broderick
- Abstract summary: We show that in the case of ridge regression, the CV loss may fail to be quasiconvex and may have multiple local optima.
More generally, we show that quasiconvexity status is independent of many properties of the observed data.
- Score: 38.18195443944592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Models like LASSO and ridge regression are extensively used in practice due
to their interpretability, ease of use, and strong theoretical guarantees.
Cross-validation (CV) is widely used for hyperparameter tuning in these models,
but do practical optimization methods minimize the true out-of-sample loss? A
recent line of research promises to show that the optimum of the CV loss
matches the optimum of the out-of-sample loss (possibly after simple
corrections). It remains to show how tractable it is to minimize the CV loss.
In the present paper, we show that, in the case of ridge regression, the CV
loss may fail to be quasiconvex and thus may have multiple local optima. We can
guarantee that the CV loss is quasiconvex in at least one case: when the
spectrum of the covariate matrix is nearly flat and the noise in the observed
responses is not too high. More generally, we show that quasiconvexity status
is independent of many properties of the observed data (response norm,
covariate-matrix right singular vectors and singular-value scaling) and has a
complex dependence on the few that remain. We empirically confirm our theory
using simulated experiments.
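For concreteness, here is a minimal sketch (not code from the paper; the data sizes, column scaling, and noise level are assumptions made for the example) that evaluates the exact leave-one-out CV loss of ridge regression on a grid of penalties and counts interior local minima of the resulting curve; more than one local minimum on the grid means the CV loss is not quasiconvex over that range.

```python
import numpy as np

def ridge_loocv_loss(X, y, lam):
    """Exact leave-one-out CV loss for ridge regression at penalty lam,
    via the standard shortcut residual_i / (1 - H_ii)."""
    n, d = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)  # hat matrix H(lam)
    loo_residuals = (y - H @ y) / (1.0 - np.diag(H))
    return np.mean(loo_residuals ** 2)

# Illustrative data: sizes, column scaling, and noise level are arbitrary choices.
rng = np.random.default_rng(0)
n, d = 50, 20
X = rng.standard_normal((n, d)) * rng.uniform(0.1, 3.0, size=d)  # uneven column scales
beta = rng.standard_normal(d)
y = X @ beta + 0.5 * rng.standard_normal(n)

lams = np.logspace(-4, 4, 400)
losses = np.array([ridge_loocv_loss(X, y, lam) for lam in lams])

# More than one interior local minimum on the grid indicates the CV loss
# is not quasiconvex in lam over this range.
interior_minima = (losses[1:-1] < losses[:-2]) & (losses[1:-1] < losses[2:])
print("interior local minima found:", int(interior_minima.sum()))
print("grid minimizer lambda:", lams[losses.argmin()])
```

Whether a second minimum actually appears depends on the covariate spectrum and the noise level, which is the dependence the abstract describes.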
Related papers
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high-dimensional data.
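A minimal simulation sketch of this setting, assuming an AR(1)-style correlation across training rows and an independent test set; the dimensions, penalty, and noise level are illustrative choices, not values from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam, rho, noise = 100, 40, 1.0, 0.6, 0.3

# AR(1)-style correlation across the n training rows (an assumption for this sketch).
C = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
X = np.linalg.cholesky(C) @ rng.standard_normal((n, d))
beta = rng.standard_normal(d) / np.sqrt(d)
y = X @ beta + noise * rng.standard_normal(n)

# Ridge estimator.
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# In-sample error vs. error on fresh, independent test rows.
in_sample = np.mean((y - X @ beta_hat) ** 2)
X_test = rng.standard_normal((5000, d))
y_test = X_test @ beta + noise * rng.standard_normal(5000)
out_of_sample = np.mean((y_test - X_test @ beta_hat) ** 2)
print(f"in-sample MSE:     {in_sample:.3f}")
print(f"out-of-sample MSE: {out_of_sample:.3f}")
```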
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
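A small sketch of the double descent curve described above, using the minimum-norm least-squares solution on random ReLU features as a stand-in for the SGD-trained models analyzed in the paper; the target function, sizes, and noise level are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, n_test, noise = 100, 10, 2000, 0.1

# Illustrative nonlinear target; all sizes and the noise level are arbitrary choices.
X, X_test = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
target = lambda Z: np.sin(Z @ np.ones(d) / np.sqrt(d))
y = target(X) + noise * rng.standard_normal(n)
y_test = target(X_test)

def rf_test_error(num_features):
    """Test MSE of (min-norm) least squares on random ReLU features."""
    W = rng.standard_normal((d, num_features)) / np.sqrt(d)
    Phi, Phi_test = np.maximum(X @ W, 0), np.maximum(X_test @ W, 0)
    theta = np.linalg.pinv(Phi) @ y  # minimum-norm solution; interpolates once num_features >= n
    return np.mean((Phi_test @ theta - y_test) ** 2)

# Test error typically peaks near num_features ~ n (the interpolation threshold)
# and decreases again beyond it: the double descent curve.
for p in [10, 50, 90, 100, 110, 200, 1000]:
    print(f"features={p:4d}  test MSE={rf_test_error(p):.3f}")
```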
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Oversampling Divide-and-conquer for Response-skewed Kernel Ridge Regression [20.00435452480056]
We develop a novel response-adaptive partition strategy to overcome the limitation of the divide-and-conquer method.
We show the proposed estimate has a smaller asymptotic mean squared error (AMSE) than that of the classical dacKRR estimate under mild conditions.
arXiv Detail & Related papers (2021-07-13T04:01:04Z)
- Noisy Linear Convergence of Stochastic Gradient Descent for CV@R Statistical Learning under Polyak-Łojasiewicz Conditions [4.721069729610892]
Conditional Value-at-Risk ($\mathrm{CV@R}$) is one of the most popular measures of risk.
We prove that $\mathrm{CV@R}$ can be used as a performance criterion in supervised statistical learning.
arXiv Detail & Related papers (2020-12-14T18:22:53Z)
- When to Impute? Imputation before and during cross-validation [0.0]
Cross-validation (CV) is a technique used to estimate generalization error for prediction models.
It has been recommended that the entire sequence of steps be carried out during each replicate of CV to mimic the application of the entire pipeline to an external testing set.
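A short scikit-learn sketch contrasting the two choices this entry discusses: imputing inside each CV fold (as part of the pipeline) versus imputing once before CV. The missingness rate, mean imputation, and ridge model are illustrative assumptions, not the cited paper's setup.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
n, d = 200, 10
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)
X[rng.random((n, d)) < 0.1] = np.nan  # roughly 10% missing entries (illustrative)

# Imputation inside each CV fold: the imputer is fit only on training folds,
# so the held-out fold never influences the imputed values.
pipe = make_pipeline(SimpleImputer(strategy="mean"), Ridge(alpha=1.0))
inside_cv = cross_val_score(pipe, X, y, cv=5, scoring="neg_mean_squared_error")

# Imputation once, before CV: simpler, but the held-out folds have already
# contributed to the imputed values used when training on the other folds.
X_pre = SimpleImputer(strategy="mean").fit_transform(X)
before_cv = cross_val_score(Ridge(alpha=1.0), X_pre, y, cv=5, scoring="neg_mean_squared_error")

print("CV MSE, impute inside folds:", -inside_cv.mean())
print("CV MSE, impute before CV:   ", -before_cv.mean())
```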
arXiv Detail & Related papers (2020-10-01T23:04:16Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
- Classification vs regression in overparameterized regimes: Does the loss function matter? [21.75115239010008]
We show that solutions obtained by least-squares minimum-norm interpolation, typically used for regression, are identical to those produced by the hard-margin support vector machine (SVM).
Our results demonstrate the very different roles and properties of loss functions used at the training phase (optimization) and the testing phase (generalization).
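A minimal sketch of this equivalence, comparing the minimum-norm least-squares interpolator of the +/-1 labels with a hard-margin linear SVM (approximated numerically with a very large C) in a heavily overparameterized regime; the dimensions and data distribution are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
n, d = 30, 500  # heavily overparameterized: d >> n (illustrative sizes)
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))  # +/- 1 labels

# Minimum-norm least-squares interpolator of the +/-1 labels (the "regression" solution).
w_ls = np.linalg.pinv(X) @ y

# Hard-margin linear SVM, approximated with a very large C (the "classification" solution).
svm = SVC(kernel="linear", C=1e10).fit(X, y)
w_svm = svm.coef_.ravel()

cosine = w_ls @ w_svm / (np.linalg.norm(w_ls) * np.linalg.norm(w_svm))
print("cosine similarity of the two solutions:", cosine)
print("fraction of training points that are support vectors:", len(svm.support_) / n)
```

In sufficiently overparameterized regimes every training point tends to become a support vector, and the two solutions then coincide up to scaling.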
arXiv Detail & Related papers (2020-05-16T17:58:25Z)
- Imputation for High-Dimensional Linear Regression [8.841513006680886]
We show that LASSO retains the minimax estimation rate in the random design setting.
We show that the square-root LASSO remains pivotal in this setting.
arXiv Detail & Related papers (2020-01-24T19:54:09Z)