Approximate Leave-one-out Cross Validation for Regression with $\ell_1$
Regularizers (extended version)
- URL: http://arxiv.org/abs/2310.17629v1
- Date: Thu, 26 Oct 2023 17:48:10 GMT
- Title: Approximate Leave-one-out Cross Validation for Regression with $\ell_1$
Regularizers (extended version)
- Authors: Arnab Auddy, Haolin Zou, Kamiar Rahnama Rad, Arian Maleki
- Abstract summary: We present a novel theory for a wide class of problems in the generalized linear model family with non-differentiable regularizers.
We show that |ALO - LO| goes to zero as p goes to infinity while n/p and SNR are fixed and bounded.
- Score: 12.029919627622954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The out-of-sample error (OO) is the main quantity of interest in risk
estimation and model selection. Leave-one-out cross validation (LO) offers a
(nearly) distribution-free yet computationally demanding approach to estimate
OO. Recent theoretical work showed that approximate leave-one-out cross
validation (ALO) is a computationally efficient and statistically reliable
estimate of LO (and OO) for generalized linear models with differentiable
regularizers. For problems involving non-differentiable regularizers, despite
significant empirical evidence, the theoretical understanding of ALO's error
remains unknown. In this paper, we present a novel theory for a wide class of
problems in the generalized linear model family with non-differentiable
regularizers. We bound the error |ALO - LO| in terms of intuitive metrics such
as the size of leave-i-out perturbations in active sets, sample size n, number
of features p and regularization parameters. As a consequence, for the
$\ell_1$-regularized problems, we show that |ALO - LO| goes to zero as p goes
to infinity while n/p and SNR are fixed and bounded.
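The abstract contrasts exact LO, which refits the model n times, with ALO, which corrects the full-data fit using leave-i-out perturbation formulas. For the lasso (squared loss with an $\ell_1$ penalty), the widely used ALO correction only needs the leverages of the design matrix restricted to the active set. The sketch below illustrates that idea; it is a minimal illustration, not the paper's code, and assumes a precomputed lasso solution `beta_hat` (the function name and the `tol` threshold for the active set are illustrative choices):

```python
import numpy as np

def lasso_alo(X, y, beta_hat, tol=1e-8):
    """ALO estimate of the out-of-sample squared error for the lasso.

    Uses the active-set leverage correction: the ALO prediction for
    sample i is  y_hat_i + h_ii / (1 - h_ii) * (y_hat_i - y_i),
    where h_ii are the leverages of X restricted to the active set.
    """
    active = np.abs(beta_hat) > tol        # support of the lasso solution
    XA = X[:, active]
    # Diagonal of H = XA (XA^T XA)^{-1} XA^T, computed without forming H.
    G = np.linalg.pinv(XA.T @ XA)
    h = np.einsum("ij,jk,ik->i", XA, G, XA)
    y_hat = X @ beta_hat
    y_alo = y_hat + h / (1.0 - h) * (y_hat - y)   # approximate LO predictions
    return np.mean((y - y_alo) ** 2)
```

When the penalty is zero the active set is all of the coordinates and this correction reduces to the classical exact leave-one-out identity for least squares, which makes the formula easy to sanity-check.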
Related papers
- Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z) - A Targeted Accuracy Diagnostic for Variational Approximations [8.969208467611896]
Variational Inference (VI) is an attractive alternative to Markov Chain Monte Carlo (MCMC)
Existing methods characterize the quality of the whole variational distribution.
We propose the TArgeted Diagnostic for Distribution Approximation Accuracy (TADDAA)
arXiv Detail & Related papers (2023-02-24T02:50:18Z) - Asymptotically Unbiased Instance-wise Regularized Partial AUC
Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Generic Error Bounds for the Generalized Lasso with Sub-Exponential Data [4.56877715768796]
This work performs a non-asymptotic analysis of the generalized Lasso under the assumption of sub-exponential data.
We show that the estimation error can be controlled by means of two complexity parameters that arise naturally from a generic-chaining-based proof strategy.
arXiv Detail & Related papers (2020-04-11T10:39:48Z) - Leverage the Average: an Analysis of KL Regularization in RL [44.01222241795292]
We show that Kullback-Leibler (KL) regularization implicitly averages q-values.
We provide a very strong performance bound, the very first to combine two desirable aspects.
Some of our assumptions do not hold with neural networks, so we complement this theoretical analysis with an extensive empirical study.
arXiv Detail & Related papers (2020-03-31T10:55:06Z) - Error bounds in estimating the out-of-sample prediction error using
leave-one-out cross validation in high-dimensions [19.439945058410203]
We study the problem of out-of-sample risk estimation in the high dimensional regime.
Extensive empirical evidence confirms the accuracy of leave-one-out cross validation.
One technical advantage of the theory is that it can be used to clarify and connect some results from the recent literature on scalable approximate LO.
arXiv Detail & Related papers (2020-03-03T20:07:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.