Cross-validation Confidence Intervals for Test Error
- URL: http://arxiv.org/abs/2007.12671v2
- Date: Sat, 31 Oct 2020 17:24:26 GMT
- Title: Cross-validation Confidence Intervals for Test Error
- Authors: Pierre Bayle, Alexandre Bayle, Lucas Janson, Lester Mackey
- Abstract summary: This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm.
Results are the first of their kind for the popular choice of leave-one-out cross-validation.
- Score: 83.67415139421448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work develops central limit theorems for cross-validation and consistent
estimators of its asymptotic variance under weak stability conditions on the
learning algorithm. Together, these results provide practical,
asymptotically-exact confidence intervals for $k$-fold test error and valid,
powerful hypothesis tests of whether one learning algorithm has smaller
$k$-fold test error than another. These results are also the first of their
kind for the popular choice of leave-one-out cross-validation. In our real-data
experiments with diverse learning algorithms, the resulting intervals and tests
outperform the most popular alternative methods from the literature.
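
As a rough illustration of the interval construction described in the abstract, the sketch below pools per-example held-out losses from k-fold cross-validation and forms a normal-approximation confidence interval, plus a paired z-test for comparing two algorithms on the same held-out examples. The function names, the `loss_fn` argument, and the pooled-variance estimator are illustrative assumptions; the paper's own asymptotic variance estimator and stability conditions are not reproduced here.

```python
# Illustrative sketch only: a CLT-style confidence interval for k-fold CV test
# error and a paired z-test for comparing two algorithms. This uses a plain
# pooled-loss normal approximation, not the paper's exact variance estimator.
import numpy as np
from scipy import stats
from sklearn.model_selection import KFold


def cv_error_confidence_interval(model, X, y, loss_fn, k=10, alpha=0.05, seed=0):
    """Return (CV error estimate, (lower, upper)) for k-fold test error.

    Pools per-example held-out losses across folds and applies a normal
    approximation: mean +/- z_{1-alpha/2} * sd / sqrt(n).
    """
    losses = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        model.fit(X[train_idx], y[train_idx])
        losses.extend(loss_fn(y[test_idx], model.predict(X[test_idx])))
    losses = np.asarray(losses, dtype=float)
    err_hat = losses.mean()
    half_width = stats.norm.ppf(1 - alpha / 2) * losses.std(ddof=1) / np.sqrt(len(losses))
    return err_hat, (err_hat - half_width, err_hat + half_width)


def compare_cv_errors(losses_a, losses_b):
    """Two-sided z-test on paired per-example held-out loss differences."""
    diff = np.asarray(losses_a, dtype=float) - np.asarray(losses_b, dtype=float)
    z = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))
    return z, 2 * stats.norm.sf(abs(z))
```

For classification, `loss_fn` could be the zero-one loss, e.g. `lambda y_true, y_pred: (y_true != y_pred).astype(float)`, so the returned interval is for the k-fold misclassification error.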
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for statistical inference.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
- Efficient Discrepancy Testing for Learning with Distribution Shift [17.472049019016524]
We provide the first set of provably efficient algorithms for testing localized discrepancy distance.
Results imply a broad set of new, efficient learning algorithms in the recently introduced model of Testable Learning with Distribution Shift.
arXiv Detail & Related papers (2024-06-13T17:51:10Z)
- Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations [44.71254888821376]
We provide the first type-I-error and expected-rejection-time guarantees under general non-i.i.d. data generating processes.
We show how to apply our results to inference on parameters defined by estimating equations, such as average treatment effects.
arXiv Detail & Related papers (2022-12-29T18:37:08Z)
- MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL).
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
arXiv Detail & Related papers (2022-09-26T12:04:49Z)
- Model-agnostic out-of-distribution detection using combined statistical tests [15.27980070479021]
We present simple methods for out-of-distribution detection using a trained generative model.
We combine a classical parametric test (Rao's score test) with the recently introduced typicality test.
Despite their simplicity and generality, these methods can be competitive with model-specific out-of-distribution detection algorithms.
arXiv Detail & Related papers (2022-03-02T13:32:09Z)
- CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Probabilistic Diagnostic Tests for Degradation Problems in Supervised Learning [0.0]
Problems such as class imbalance, class overlap, small disjuncts, noisy labels, and sparseness limit the accuracy of classification algorithms.
A probabilistic diagnostic model based on identifying the signs and symptoms of each problem is presented.
The behavior and performance of several supervised algorithms are studied when training sets exhibit such problems.
arXiv Detail & Related papers (2020-04-06T20:32:35Z)