Irredundant k-Fold Cross-Validation
- URL: http://arxiv.org/abs/2507.20048v1
- Date: Sat, 26 Jul 2025 19:59:37 GMT
- Title: Irredundant k-Fold Cross-Validation
- Authors: Jesus S. Aguilar-Ruiz
- Abstract summary: In traditional $k$-fold cross-validation, each instance is used ($k\!-\!1$) times for training and once for testing, leading to redundancy. We introduce Irredundant $k$-fold cross-validation, a novel method that guarantees each instance is used exactly once for training and once for testing.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In traditional $k$-fold cross-validation, each instance is used ($k\!-\!1$) times for training and once for testing, leading to redundancy that lets many instances disproportionately influence the learning phase. We introduce Irredundant $k$-fold cross-validation, a novel method that guarantees each instance is used exactly once for training and once for testing across the entire validation procedure. This approach ensures a more balanced utilization of the dataset, mitigates overfitting due to instance repetition, and enables sharper distinctions in comparative model analysis. The method preserves stratification and remains model-agnostic, i.e., compatible with any classifier. Experimental results demonstrate that it delivers consistent performance estimates across diverse datasets (comparable to $k$-fold cross-validation), while providing less optimistic variance estimates because training partitions are non-overlapping, and significantly reducing the overall computational cost.
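The abstract gives no reference implementation, but the "once for training, once for testing" guarantee is easy to prototype. The sketch below pairs stratified folds cyclically so that each fold, and hence each instance, serves exactly once as training data and once as test data; the cyclic pairing and all function names are illustrative assumptions, not necessarily the authors' construction.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

def irredundant_kfold_scores(model_factory, X, y, k=5, seed=0):
    """Score a model so that every fold (hence every instance) is used
    exactly once for training and exactly once for testing."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    folds = [test_idx for _, test_idx in skf.split(X, y)]  # k disjoint, stratified folds
    scores = []
    for i in range(k):
        # Cyclic pairing (an assumption): train on fold i, test on fold i+1.
        train_idx, test_idx = folds[i], folds[(i + 1) % k]
        model = model_factory()
        model.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return np.array(scores)

X, y = load_iris(return_X_y=True)
print(irredundant_kfold_scores(lambda: LogisticRegression(max_iter=1000), X, y))
```

Under this pairing each of the $k$ iterations trains on a single fold of roughly $n/k$ instances, training partitions never overlap, and the total fitting work drops relative to standard $k$-fold, consistent with the cost reduction the abstract claims.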
Related papers
- Mitigating covariate shift in non-colocated data with learned parameter priors [0.0]
We present Fragmentation-induced co-shift remediation ($FIcsR$), which minimizes an $f$-divergence between a fragment's covariate distribution and that of the standard cross-validation baseline (a rough sketch of this idea follows this entry).
We run extensive classification experiments on multiple data classes, over $40$ datasets, and with data batched over multiple sequence lengths.
The results are promising under all these conditions, with accuracy improvements of more than 5% and 10% over the batch and fold state of the art, respectively.
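As a rough illustration of the kind of objective this summary describes, the sketch below computes a closed-form KL divergence (one member of the $f$-divergence family) between Gaussian fits of a fragment's covariates and a pooled baseline. The Gaussian summarization and every name here are assumptions for illustration, not the $FIcsR$ algorithm itself.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians."""
    d = len(mu0)
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

# Covariate shift between a data fragment and the pooled baseline,
# each summarized by a Gaussian fit (a coarse stand-in for the objective).
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(5000, 3))
fragment = rng.normal(0.4, 1.2, size=(500, 3))
print(gaussian_kl(fragment.mean(0), np.cov(fragment.T),
                  baseline.mean(0), np.cov(baseline.T)))
```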
arXiv Detail & Related papers (2024-11-10T15:48:29Z) - Predictive Performance Test based on the Exhaustive Nested Cross-Validation for High-dimensional data [7.62566998854384]
Cross-validation is used for several tasks such as estimating the prediction error, tuning the regularization parameter, and selecting the most suitable predictive model.
K-fold cross-validation is a popular CV method, but its limitation is that the risk estimates are highly dependent on the partitioning of the data, as the sketch after this entry illustrates.
This study presents an alternative novel predictive performance test and valid confidence intervals based on exhaustive nested cross-validation.
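The partition dependence that motivates this line of work is easy to observe directly: repeating $K$-fold cross-validation under different random partitions of the same data yields a spread of risk estimates. A minimal sketch, with dataset and model choices assumed for illustration:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Repeat 5-fold CV under 20 different random partitions of the same data:
# the spread of the resulting estimates is the partition dependence.
estimates = [
    cross_val_score(model, X, y,
                    cv=KFold(n_splits=5, shuffle=True, random_state=s)).mean()
    for s in range(20)
]
print(f"mean accuracy={np.mean(estimates):.4f}  spread (std)={np.std(estimates):.4f}")
```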
arXiv Detail & Related papers (2024-08-06T12:28:16Z) - Distributional bias compromises leave-one-out cross-validation [0.6656737591902598]
Cross-validation is a common method for estimating the predictive performance of machine learning models. We show that leave-one-out cross-validation creates a negative correlation between the average label of each training fold and the label of its corresponding test instance. We propose a generalizable rebalanced cross-validation approach that corrects for distributional bias in both classification and regression.
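The negative correlation is mechanical and can be seen without fitting any model: removing an instance with a high label necessarily lowers the mean label of its training fold, and vice versa. A minimal numeric sketch (the setup is illustrative, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200).astype(float)  # binary labels
n = len(y)

# Mean label of each leave-one-out training fold, i.e. y with instance i removed.
train_means = (y.sum() - y) / (n - 1)

# train_means is a decreasing affine function of y, so the correlation is -1.
print(np.corrcoef(y, train_means)[0, 1])
```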
arXiv Detail & Related papers (2024-06-03T15:47:34Z) - Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z) - Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that requires no prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Bootstrapping the Cross-Validation Estimate [3.5159221757909656]
Cross-validation is a widely used technique for evaluating the performance of prediction models.
It is essential to accurately quantify the uncertainty associated with the estimate.
This paper proposes a fast bootstrap method that quickly estimates the standard error of the cross-validation estimate.
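The exact bootstrap scheme is not described in this summary; a plausible minimal sketch of the general idea follows, in which a single cross-validation pass yields per-instance held-out losses that are then resampled, so no model is ever refit. All modelling choices are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# One CV pass gives a held-out 0/1 loss per instance; no further refits.
pred = cross_val_predict(model, X, y, cv=10)
losses = (pred != y).astype(float)

# Resample the per-instance losses to approximate the standard error
# of the cross-validation error estimate.
rng = np.random.default_rng(0)
boot = [rng.choice(losses, size=losses.size, replace=True).mean()
        for _ in range(2000)]
print(f"CV error={losses.mean():.4f}  bootstrap SE={np.std(boot):.4f}")
```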
arXiv Detail & Related papers (2023-07-01T07:50:54Z) - Intersection of Parallels as an Early Stopping Criterion [64.8387564654474]
We propose a method to spot an early stopping point in the training iterations without the need for a validation set.
For a wide range of learning rates, our method, called Cosine-Distance Criterion (CDC), leads to better generalization on average than all the methods that we compare against.
arXiv Detail & Related papers (2022-08-19T19:42:41Z) - Fast calculation of Gaussian Process multiple-fold cross-validation residuals and their covariances [0.6091702876917281]
We generalize fast leave-one-out formulae to multiple-fold cross-validation.
We highlight the covariance structure of cross-validation residuals in both Simple and Universal Kriging frameworks.
Our results enable fast multiple-fold cross-validation and have direct consequences in model diagnostics.
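For context, the leave-one-out special case that this paper generalizes has a well-known closed form for a zero-mean Gaussian process (simple kriging): $e_i = [K^{-1}\mathbf{y}]_i / [K^{-1}]_{ii}$, so one matrix factorization replaces $n$ refits. A minimal sketch, with kernel and data assumed for illustration:

```python
import numpy as np

def gp_loo_residuals(K, y):
    """Leave-one-out residuals e_i = [K^{-1} y]_i / [K^{-1}]_{ii} for a
    zero-mean GP with covariance matrix K: one inverse, no refits."""
    K_inv = np.linalg.inv(K)
    return (K_inv @ y) / np.diag(K_inv)

# Toy example: squared-exponential kernel with a small nugget.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 8))
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.2**2) + 1e-4 * np.eye(8)
y = np.sin(6.0 * x)
print(gp_loo_residuals(K, y))
```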
arXiv Detail & Related papers (2021-01-08T17:02:37Z) - Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm.
Results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)