Cross-validation Confidence Intervals for Test Error
- URL: http://arxiv.org/abs/2007.12671v2
- Date: Sat, 31 Oct 2020 17:24:26 GMT
- Title: Cross-validation Confidence Intervals for Test Error
- Authors: Pierre Bayle, Alexandre Bayle, Lucas Janson, Lester Mackey
- Abstract summary: This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm.
Results are the first of their kind for the popular choice of leave-one-out cross-validation.
- Score: 83.67415139421448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work develops central limit theorems for cross-validation and consistent
estimators of its asymptotic variance under weak stability conditions on the
learning algorithm. Together, these results provide practical,
asymptotically-exact confidence intervals for $k$-fold test error and valid,
powerful hypothesis tests of whether one learning algorithm has smaller
$k$-fold test error than another. These results are also the first of their
kind for the popular choice of leave-one-out cross-validation. In our real-data
experiments with diverse learning algorithms, the resulting intervals and tests
outperform the most popular alternative methods from the literature.
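The abstract's two deliverables, an asymptotically-exact confidence interval for $k$-fold test error and a test of whether one learning algorithm has smaller $k$-fold test error than another, both rest on a normal approximation built from held-out per-example losses. The sketch below illustrates that general recipe with a plain pooled-variance plug-in; it is an illustration, not the paper's exact variance estimator, and `loss_fn` (a vectorized per-example loss) and the scikit-learn-style models are placeholder assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import KFold

def cv_confidence_interval(model, X, y, loss_fn, k=10, alpha=0.05, seed=0):
    """Normal-approximation CI for the k-fold cross-validation test error.

    Pools held-out per-example losses across folds and applies a plain
    z-interval; `loss_fn(y_true, y_pred)` must return one loss per example.
    """
    n = len(y)
    losses = np.empty(n)
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        fitted = model.fit(X[train_idx], y[train_idx])
        losses[test_idx] = loss_fn(y[test_idx], fitted.predict(X[test_idx]))
    half_width = stats.norm.ppf(1 - alpha / 2) * losses.std(ddof=1) / np.sqrt(n)
    return losses.mean() - half_width, losses.mean() + half_width

def cv_comparison_pvalue(model_a, model_b, X, y, loss_fn, k=10, seed=0):
    """One-sided p-value for 'algorithm A has smaller k-fold test error than B',
    via a z-test on the pooled per-example loss differences."""
    n = len(y)
    diffs = np.empty(n)
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        pred_a = model_a.fit(X[train_idx], y[train_idx]).predict(X[test_idx])
        pred_b = model_b.fit(X[train_idx], y[train_idx]).predict(X[test_idx])
        diffs[test_idx] = loss_fn(y[test_idx], pred_a) - loss_fn(y[test_idx], pred_b)
    z = np.sqrt(n) * diffs.mean() / diffs.std(ddof=1)
    return stats.norm.cdf(z)  # small p-value favors algorithm A over B
```

For squared-error regression, for example, `loss_fn = lambda y_true, y_pred: (y_true - y_pred) ** 2` produces per-example losses of the required form.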
Related papers
- Internal Incoherency Scores for Constraint-based Causal Discovery Algorithms [12.524536193679124]
We propose internal coherency scores that allow testing for assumption violations and finite sample errors.
We illustrate our coherency scores on the PC algorithm with simulated and real-world datasets.
arXiv Detail & Related papers (2025-02-20T16:44:54Z)
- $t$-Testing the Waters: Empirically Validating Assumptions for Reliable A/B-Testing [3.988614978933934]
A/B-tests are a cornerstone of experimental design on the web, with wide-ranging applications and use-cases.
We propose a practical and efficient method to empirically assess whether the $t$-test's assumptions are met and the A/B-test is therefore valid.
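For context, the A/B-test being validated is typically a two-sample $t$-test on a per-user metric; a minimal version of that baseline test (Welch's variant, which does not assume equal variances) is sketched below. The heavy-tailed synthetic metric is a stand-in, and this is the textbook test whose assumptions are under scrutiny, not the paper's proposed validation method.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-user revenue: heavy-tailed, so the t-test's normality
# assumption is exactly the kind of thing worth checking empirically.
control = rng.exponential(scale=1.00, size=5000)
treatment = rng.exponential(scale=1.05, size=5000)

t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"Welch t = {t_stat:.3f}, p = {p_value:.4f}")
```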
arXiv Detail & Related papers (2025-02-07T09:55:24Z)
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
We study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation.
First, we derive a novel high-dimensional probability convergence guarantee that depends explicitly on the variance and holds under weak conditions.
We further establish refined high-dimensional Berry-Esseen bounds over the class of convex sets that guarantee faster rates than those in the literature.
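To make the object of inference concrete: TD(0) with linear function approximation updates a weight vector $\theta$ from observed transitions, and Polyak-Ruppert averaging reports the running average of those iterates, which is the estimator such asymptotic results describe. A minimal sketch, with the feature map and step size left as placeholder assumptions:

```python
import numpy as np

def td0_polyak_ruppert(transitions, dim, gamma=0.99, step_size=0.05):
    """TD(0) with linear value function V(s) = phi(s) @ theta.

    `transitions` is an iterable of (phi_s, reward, phi_next) tuples of
    feature vectors; returns the Polyak-Ruppert average of the TD iterates.
    """
    theta = np.zeros(dim)
    theta_bar = np.zeros(dim)
    for t, (phi_s, reward, phi_next) in enumerate(transitions, start=1):
        td_error = reward + gamma * phi_next @ theta - phi_s @ theta
        theta = theta + step_size * td_error * phi_s   # TD(0) update
        theta_bar += (theta - theta_bar) / t           # running average of iterates
    return theta_bar
```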
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
- Efficient Discrepancy Testing for Learning with Distribution Shift [17.472049019016524]
We provide the first set of provably efficient algorithms for testing localized discrepancy distance.
Results imply a broad set of new, efficient learning algorithms in the recently introduced model of Testable Learning with Distribution Shift.
arXiv Detail & Related papers (2024-06-13T17:51:10Z)
- Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL).
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
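The worst-case consistency objective described above is easy to state in code: compare the prediction on an unlabeled example with the predictions on each of its augmented variants and penalize only the largest discrepancy. A schematic sketch using KL divergence as the inconsistency measure, which is an assumption here rather than the paper's specified choice:

```python
import numpy as np

def worst_case_consistency_loss(p_original, p_augmented):
    """Largest KL divergence between the prediction on the original unlabeled
    sample and the predictions on its augmented variants.

    p_original: (num_classes,) probability vector for the original sample.
    p_augmented: (num_augmentations, num_classes) probabilities for the variants.
    """
    eps = 1e-12
    kl = np.sum(p_original * (np.log(p_original + eps) - np.log(p_augmented + eps)), axis=1)
    return kl.max()  # penalize only the most inconsistent augmentation
```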
arXiv Detail & Related papers (2022-09-26T12:04:49Z)
- CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Probabilistic Diagnostic Tests for Degradation Problems in Supervised Learning [0.0]
Problems such as class imbalance, class overlap, small disjuncts, noisy labels, and sparseness limit the accuracy of classification algorithms.
A probabilistic diagnostic model based on identifying the signs and symptoms of each problem is presented.
The behavior and performance of several supervised algorithms are studied when training sets exhibit these problems.
arXiv Detail & Related papers (2020-04-06T20:32:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.