Related papers: The Structure of Cross-Validation Error: Stability, Covariance, and Minimax Limits

The Structure of Cross-Validation Error: Stability, Covariance, and Minimax Limits

URL: http://arxiv.org/abs/2511.03554v1
Date: Wed, 05 Nov 2025 15:35:46 GMT
Title: The Structure of Cross-Validation Error: Stability, Covariance, and Minimax Limits
Authors: Ido Nachum, Rüdiger Urbanke, Thomas Weinberger,
Abstract summary: We show how properties of algorithm-distribution pairs can affect the choice for the number of folds in $k$-fold cross-validation.<n>We also prove that CV cannot attain the optimum of order $1/n$ achievable by a validation set of size $n$.
Score: 3.3008315224941978
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite ongoing theoretical research on cross-validation (CV), many theoretical questions about CV remain widely open. This motivates our investigation into how properties of algorithm-distribution pairs can affect the choice for the number of folds in $k$-fold cross-validation. Our results consist of a novel decomposition of the mean-squared error of cross-validation for risk estimation, which explicitly captures the correlations of error estimates across overlapping folds and includes a novel algorithmic stability notion, squared loss stability, that is considerably weaker than the typically required hypothesis stability in other comparable works. Furthermore, we prove: 1. For every learning algorithm that minimizes empirical error, a minimax lower bound on the mean-squared error of $k$-fold CV estimating the population risk $L_\mathcal{D}$: \[ \min_{k \mid n}\; \max_{\mathcal{D}}\; \mathbb{E}\!\left[\big(\widehat{L}_{\mathrm{CV}}^{(k)} - L_{\mathcal{D}}\big)^{2}\right] \;=\; \Omega\!\big(\sqrt{k}/n\big), \] where $n$ is the sample size and $k$ the number of folds. This shows that even under idealized conditions, for large values of $k$, CV cannot attain the optimum of order $1/n$ achievable by a validation set of size $n$, reflecting an inherent penalty caused by dependence between folds. 2. Complementing this, we exhibit learning rules for which \[ \max_{\mathcal{D}}\; \mathbb{E}\!\left[\big(\widehat{L}_{\mathrm{CV}}^{(k)} - L_{\mathcal{D}}\big)^{2}\right] \;=\; \Omega(k/n), \] matching (up to constants) the accuracy of a hold-out estimator of a single fold of size $n/k$. Together these results delineate the fundamental trade-off in resampling-based risk estimation: CV cannot fully exploit all $n$ samples for unbiased risk evaluation, and its minimax performance is pinned between the $k/n$ and $\sqrt{k}/n$ regimes.

Related papers

Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination [65.37519531362157]
We show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $tildeOmega(d1/2/alpha2)$.
arXiv Detail & Related papers (2025-10-12T15:42:44Z)
Near-Optimal Sample Complexity for Iterated CVaR Reinforcement Learning with a Generative Model [13.582475656749775]
We study the sample complexity problem of risk-sensitive Reinforcement Learning (RL) with a generative model.<n>We establish nearly matching upper and lower bounds on the sample complexity of this problem.
arXiv Detail & Related papers (2025-03-11T22:31:03Z)
Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination [35.67742880001828]
We give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination.<n>Our algorithm has near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy.
arXiv Detail & Related papers (2025-02-20T17:53:13Z)
Nearly Optimal Robust Covariance and Scatter Matrix Estimation Beyond Gaussians [2.311583680973075]
We study the problem of computationally efficient robust estimation of the covariance/scatter matrix of elliptical distributions.<n>We obtain the first efficiently computable, nearly optimal robust covariance estimators that extend beyond the Gaussian case.
arXiv Detail & Related papers (2025-02-10T15:31:57Z)
Semidefinite programming relaxations and debiasing for MAXCUT-based clustering [1.9761774213809036]
We consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of 2 sub-gaussian distributions in $mathbbRp$.<n>We use semidefinite programming relaxations of an integer quadratic program that is formulated as finding the maximum cut on a graph.
arXiv Detail & Related papers (2024-01-16T03:14:24Z)
Corruption-Robust Offline Reinforcement Learning with General Function Approximation [60.91257031278004]
We investigate the problem of corruption in offline reinforcement learning (RL) with general function approximation. Our goal is to find a policy that is robust to such corruption and minimizes the suboptimality gap with respect to the optimal policy for the uncorrupted Markov decision processes (MDPs)
arXiv Detail & Related papers (2023-10-23T04:07:26Z)
Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP) We prove an upper bound of $widetilde O(H varepsilon-3 ln frac1delta)$ samples per state-action pair, where $H := sp(h*)$ is the span of bias of any optimal policy, $varepsilon$ is the accuracy and $delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z)
A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $Omega(sqrtn/p)$ of robustness of interpolating neural network parameters on arbitrary distributions. We then show the potential benefit of overparametrization for smooth data when $n=mathrmpoly(d)$. We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=exp(omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z)
The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-epsilon) X + epsilon B$, where $X$ is a zero-mean and unknown covariance Gaussian $mathcalN(0, Sigma)$. In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples. We prove a sample complexity lower bound of $Omega(d2)$ for $epsilon$ an arbitrarily small constant and $gamma
arXiv Detail & Related papers (2020-12-31T18:24:41Z)
Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist. We design an estimator which attains the smallest possible confidence interval as a function of $n,d,delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z)
Multivariate mean estimation with direction-dependent accuracy [8.147652597876862]
We consider the problem of estimating the mean of a random vector based on $N$ independent, identically distributed observations. We prove an estimator that has a near-optimal error in all directions in which the variance of the one dimensional marginal of the random vector is not too small.
arXiv Detail & Related papers (2020-10-22T17:52:45Z)
Rademacher upper bounds for cross-validation errors with an application to the lasso [6.837167110907022]
We establish a general upper bound for $K$-fold cross-validation ($K$-CV) errors. The CV error upper bound applies to both light-tail and heavy-tail error distributions. We provide a Python package for computing the CV error upper bound in $K$-CV-based algorithms.
arXiv Detail & Related papers (2020-07-30T17:13:03Z)
Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss. For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$. For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.