Cross-validation: what does it estimate and how well does it do it?
- URL: http://arxiv.org/abs/2104.00673v1
- Date: Thu, 1 Apr 2021 17:58:54 GMT
- Title: Cross-validation: what does it estimate and how well does it do it?
- Authors: Stephen Bates and Trevor Hastie and Robert Tibshirani
- Abstract summary: Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood.
We prove that, for the linear model fit by ordinary least squares, cross-validation does not estimate the prediction error of the model at hand; rather, it estimates the average prediction error of models fit on other unseen training sets drawn from the same population.
- Score: 2.049702429898688
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-validation is a widely-used technique to estimate prediction error, but
its behavior is complex and not fully understood. Ideally, one would like to
think that cross-validation estimates the prediction error for the model at
hand, fit to the training data. We prove that this is not the case for the
linear model fit by ordinary least squares; rather it estimates the average
prediction error of models fit on other unseen training sets drawn from the
same population. We further show that this phenomenon occurs for most popular
estimates of prediction error, including data splitting, bootstrapping, and
Mallows' Cp. Next, the standard confidence intervals for prediction error
derived from cross-validation may have coverage far below the desired level.
Because each data point is used for both training and testing, there are
correlations among the measured accuracies for each fold, and so the usual
estimate of variance is too small. We introduce a nested cross-validation
scheme to estimate this variance more accurately, and show empirically that
this modification leads to intervals with approximately correct coverage in
many examples where traditional cross-validation intervals fail. Lastly, our
analysis also shows that when producing confidence intervals for prediction
accuracy with simple data splitting, one should not re-fit the model on the
combined data, since this invalidates the confidence intervals.
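For concreteness, here is a minimal numpy sketch of the standard ("naive") cross-validation interval whose coverage the abstract says can fall far below the desired level. This is not the paper's nested cross-validation scheme; the simulated OLS setup, fold count, and confidence level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data from a linear model, matching the OLS setting discussed in the abstract.
n, p = 100, 5
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(size=n)

K = 10
folds = np.array_split(rng.permutation(n), K)

# Per-observation squared errors from K-fold cross-validation.
errors = np.empty(n)
for test_idx in folds:
    train_mask = np.ones(n, dtype=bool)
    train_mask[test_idx] = False
    coef, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
    errors[test_idx] = (y[test_idx] - X[test_idx] @ coef) ** 2

cv_estimate = errors.mean()

# Naive 90% interval: treats the n per-point errors as independent, which the
# abstract argues understates the variance because each point is reused in both
# training and test roles across folds.
se_naive = errors.std(ddof=1) / np.sqrt(n)
print(f"CV estimate of prediction error: {cv_estimate:.3f}")
print(f"naive 90% CI: ({cv_estimate - 1.645 * se_naive:.3f}, "
      f"{cv_estimate + 1.645 * se_naive:.3f})")
```

The paper's remedy is to replace the naive variance estimate with one obtained from nested cross-validation; the sketch above only reproduces the interval being critiqued.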
Related papers
- Normalizing Flows for Conformal Regression [0.0]
Conformal Prediction (CP) algorithms estimate the uncertainty of a prediction model by calibrating its outputs on labeled data.
We present a general scheme to localize the intervals by training the calibration process.
Unlike the Error Reweighting CP algorithm of Papadopoulos et al. (2008), the framework allows estimating the gap between nominal and empirical conditional validity.
arXiv Detail & Related papers (2024-06-05T15:04:28Z)
- Distributional bias compromises leave-one-out cross-validation [0.6656737591902598]
Cross-validation is a common method for estimating the predictive performance of machine learning models.
We show that an approach called "leave-one-out cross-validation" creates a negative correlation between the average label of each training fold and the label of its corresponding test instance (a toy numerical demonstration of this effect appears after this list).
We propose a generalizable rebalanced cross-validation approach that corrects for distributional bias.
arXiv Detail & Related papers (2024-06-03T15:47:34Z)
- Bootstrapping the Cross-Validation Estimate [3.5159221757909656]
Cross-validation is a widely used technique for evaluating the performance of prediction models.
It is essential to accurately quantify the uncertainty associated with the estimate.
This paper proposes a fast bootstrap method that quickly estimates the standard error of the cross-validation estimate.
arXiv Detail & Related papers (2023-07-01T07:50:54Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- A Statistical Model for Predicting Generalization in Few-Shot Classification [6.158812834002346]
We introduce a Gaussian model of the feature distribution to predict the generalization error.
We show that our approach outperforms alternatives such as the leave-one-out cross-validation strategy.
arXiv Detail & Related papers (2022-12-13T10:21:15Z)
- The Implicit Delta Method [61.36121543728134]
In this paper, we propose an alternative, the implicit delta method, which works by infinitesimally regularizing the training loss of the model whose uncertainty we wish to assess.
We show that the change in the evaluation due to regularization is consistent for the variance of the evaluation estimator, even when the infinitesimal change is approximated by a finite difference.
arXiv Detail & Related papers (2022-11-11T19:34:17Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Confidence intervals for the Cox model test error from cross-validation [91.3755431537592]
Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model.
Standard confidence intervals for test error using estimates from CV may have coverage below nominal levels.
One way to address this issue is to estimate the mean squared error of the prediction error instead, using nested CV.
arXiv Detail & Related papers (2022-01-26T06:40:43Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
However, they are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
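The distributional-bias effect noted in the "Distributional bias compromises leave-one-out cross-validation" entry above can be seen with a few lines of arithmetic: within a fixed dataset, the training-fold mean label in leave-one-out CV is a decreasing affine function of the held-out label, so their correlation is exactly -1. The snippet below is a toy demonstration of that claim, not code from the paper; the sample size and random labels are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=50)            # labels of a toy dataset
n = len(y)
total = y.sum()

# For each leave-one-out split, the training-fold mean excludes the test label.
train_means = (total - y) / (n - 1)
test_labels = y

corr = np.corrcoef(train_means, test_labels)[0, 1]
print(f"correlation between training-fold mean and test label: {corr:.3f}")
# Prints -1.000: leaving out a high label necessarily lowers the training mean,
# which is the distributional bias that paper describes.
```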