Is K-fold cross validation the best model selection method for Machine Learning?
- URL: http://arxiv.org/abs/2401.16407v1
- Date: Mon, 29 Jan 2024 18:46:53 GMT
- Title: Is K-fold cross validation the best model selection method for Machine Learning?
- Authors: Juan M Gorriz, F Segovia, J Ramirez, A Ortiz and J. Suckling
- Abstract summary: K-fold cross-validation is the most common approach to ascertaining the likelihood that a machine learning outcome is generated by chance.
A novel test based on K-fold CV and the Upper Bound of the actual error (K-fold CUBV) is proposed.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As a technique that can compactly represent complex patterns, machine
learning has significant potential for predictive inference. K-fold
cross-validation (CV) is the most common approach to ascertaining the
likelihood that a machine learning outcome is generated by chance and
frequently outperforms conventional hypothesis testing. This improvement uses
measures directly obtained from machine learning classifications, such as
accuracy, that do not have a parametric description. To approach a frequentist
analysis within machine learning pipelines, a permutation test or simple
statistics from data partitions (i.e. folds) can be added to estimate
confidence intervals. Unfortunately, neither parametric nor non-parametric
tests solve the inherent problems around partitioning small sample-size
datasets and learning from heterogeneous data sources. The fact that machine
learning strongly depends on the learning parameters and the distribution of
data across folds recapitulates familiar difficulties around excess false
positives and replication. The origins of this problem are demonstrated by
simulating common experimental circumstances, including small sample sizes, low
numbers of predictors, and heterogeneous data sources. A novel statistical test
based on K-fold CV and the Upper Bound of the actual error (K-fold CUBV) is
proposed, in which the uncertain predictions of machine learning with CV are
bounded by the worst case through the evaluation of concentration inequalities.
Probably Approximately Correct-Bayesian upper bounds for linear classifiers, in
combination with K-fold CV, are used to estimate the empirical error. The
performance with neuroimaging datasets suggests this is a robust criterion for
detecting effects, validating accuracy values obtained from machine learning
whilst avoiding excess false positives.
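For orientation, the following minimal Python sketch (assuming scikit-learn) puts these ingredients together: K-fold CV accuracy, a label-permutation p-value, and a worst-case correction of the empirical error. A generic Hoeffding-type bound stands in for the PAC-Bayesian bound of the actual K-fold CUBV test, which is not reproduced here.
```python
# Illustrative only: K-fold CV accuracy, a permutation-test p-value, and a
# Hoeffding-type worst-case correction of the empirical error. The actual
# K-fold CUBV test relies on PAC-Bayesian bounds for linear classifiers;
# this sketch only mimics the general idea of upper-bounding the actual error.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     permutation_test_score)

# A small-sample, low-dimensional scenario, similar in spirit to the simulations
X, y = make_classification(n_samples=60, n_features=10, n_informative=2,
                           random_state=0)

clf = LogisticRegression(max_iter=1000)             # a simple linear classifier
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

cv_acc = cross_val_score(clf, X, y, cv=cv).mean()   # K-fold CV accuracy

# Non-parametric significance via label permutations
_, _, p_value = permutation_test_score(clf, X, y, cv=cv,
                                       n_permutations=200, random_state=0)

# Hoeffding bound: with probability >= 1 - delta, the actual error is at most
# the empirical error plus sqrt(log(1/delta) / (2 n)).
delta = 0.05
worst_case_error = (1.0 - cv_acc) + np.sqrt(np.log(1.0 / delta) / (2 * len(y)))

print(f"CV accuracy: {cv_acc:.3f}, permutation p-value: {p_value:.3f}")
print(f"Worst-case error bound (delta={delta}): {worst_case_error:.3f}")
```
With such small samples the worst-case bound is loose, which is precisely the regime where the paper argues that CV accuracy estimates alone can produce excess false positives.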
Related papers
- Predictive Performance Test based on the Exhaustive Nested Cross-Validation for High-dimensional data [7.62566998854384]
Cross-validation is used for several tasks such as estimating the prediction error, tuning the regularization parameter, and selecting the most suitable predictive model.
K-fold cross-validation is a popular CV method, but its limitation is that the risk estimates are highly dependent on the partitioning of the data.
This study presents an alternative novel predictive performance test and valid confidence intervals based on exhaustive nested cross-validation.
arXiv Detail & Related papers (2024-08-06T12:28:16Z)
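A minimal sketch of generic nested cross-validation (not the exhaustive variant proposed in that paper), assuming scikit-learn:
```python
# Generic nested CV: the inner folds tune the regularization parameter, the
# outer folds estimate the prediction error of the whole tuning-plus-fitting
# procedure, so the reported score is not biased by the tuning step.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
tuned = GridSearchCV(model,
                     param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
                     cv=inner_cv)

outer_scores = cross_val_score(tuned, X, y, cv=outer_cv)
print(f"Nested CV accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```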
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
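A minimal sketch of ordinary split conformal prediction for regression, using NumPy only; the feedback-induced distribution shift addressed in that paper is not modelled here:
```python
# Split conformal prediction for regression: calibrate an interval so that it
# covers the true response with probability about 1 - alpha under exchangeability.
import numpy as np

rng = np.random.default_rng(0)
n = 300
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(n)

# Proper training set and calibration set
X_train, y_train = X[:200], y[:200]
X_cal, y_cal = X[200:], y[200:]

# Any point predictor works; here an ordinary least-squares fit on (1, x)
A = np.hstack([np.ones_like(X_train), X_train])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)

def predict(Z):
    return np.hstack([np.ones_like(Z), Z]) @ coef

# Conformity scores: absolute residuals on the calibration set
scores = np.abs(y_cal - predict(X_cal))
alpha = 0.1
level = np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores)
q = np.quantile(scores, level, method="higher")

x_new = np.array([[1.5]])
center = predict(x_new)[0]
print(f"90% conformal interval at x=1.5: [{center - q:.2f}, {center + q:.2f}]")
```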
- Theoretical characterization of uncertainty in high-dimensional linear classification [24.073221004661427]
We show that uncertainty for learning from a limited number of samples of high-dimensional input data and labels can be obtained by the approximate message passing algorithm.
We discuss how over-confidence can be mitigated by appropriately regularising, and show that cross-validating with respect to the loss leads to better calibration than with the 0/1 error.
arXiv Detail & Related papers (2022-02-07T15:32:07Z)
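A small illustrative sketch of the practical point above (selecting the regularization strength by cross-validated log-loss rather than by 0/1 accuracy, then checking calibration on held-out data), assuming scikit-learn:
```python
# Select the regularization strength by cross-validated log-loss vs by 0/1
# accuracy, then compare calibration of the two choices on held-out data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

grid = {"C": [0.01, 0.1, 1.0, 10.0, 100.0]}
for scoring in ("neg_log_loss", "accuracy"):
    search = GridSearchCV(LogisticRegression(max_iter=2000), grid,
                          scoring=scoring, cv=5).fit(X_tr, y_tr)
    held_out = log_loss(y_te, search.predict_proba(X_te))
    print(f"C chosen by {scoring}: {search.best_params_['C']}, "
          f"held-out log-loss: {held_out:.3f}")
```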
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples whose confidence exceeds the threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
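A rough NumPy/scikit-learn sketch of the ATC idea as summarized above; the synthetic "target" shift and the confidence score (maximum predicted probability) are illustrative assumptions, not details taken from that paper:
```python
# Sketch of Average Thresholded Confidence (ATC): choose a threshold t so that,
# on labeled source data, the fraction of examples with confidence >= t matches
# the source accuracy; predict target accuracy as the fraction of unlabeled
# target examples with confidence >= t.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_src, y_tr, y_src = train_test_split(X, y, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)

# Confidence score: maximum predicted class probability
src_conf = clf.predict_proba(X_src).max(axis=1)
src_acc = clf.score(X_src, y_src)
t = np.quantile(src_conf, 1.0 - src_acc)            # calibrate the threshold

# Unlabeled "target" data: simulate a mild covariate shift for illustration
X_tgt = X_src + 0.5 * np.random.default_rng(1).standard_normal(X_src.shape)
tgt_conf = clf.predict_proba(X_tgt).max(axis=1)
predicted_tgt_acc = float(np.mean(tgt_conf >= t))
print(f"source accuracy: {src_acc:.3f}, "
      f"ATC-predicted target accuracy: {predicted_tgt_acc:.3f}")
```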
- Approximate Bayesian Computation via Classification [0.966840768820136]
Approximate Bayesian Computation (ABC) enables statistical inference in complex models whose likelihoods are difficult to calculate but easy to simulate from.
ABC constructs a kernel-type approximation to the posterior distribution through an accept/reject mechanism which compares summary statistics of real and simulated data.
We consider the traditional accept/reject kernel as well as an exponential weighting scheme which does not require the ABC acceptance threshold.
arXiv Detail & Related papers (2021-11-22T20:07:55Z)
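A toy NumPy sketch of the classical accept/reject ABC kernel referred to above; the classification-based variant proposed in that paper is not shown:
```python
# Toy accept/reject ABC: infer the mean of a Gaussian, pretending the likelihood
# is unavailable but simulation is cheap. The summary statistic is the sample mean.
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, scale=1.0, size=100)    # "real" data
s_obs = observed.mean()                                # observed summary statistic

n_draws, epsilon = 20000, 0.05                         # ABC acceptance threshold
prior_draws = rng.normal(loc=0.0, scale=5.0, size=n_draws)

accepted = []
for theta in prior_draws:
    simulated = rng.normal(loc=theta, scale=1.0, size=100)
    if abs(simulated.mean() - s_obs) <= epsilon:       # accept/reject kernel
        accepted.append(theta)

posterior = np.array(accepted)
print(f"accepted {len(posterior)} of {n_draws} draws, "
      f"approximate posterior mean: {posterior.mean():.2f} (true value 2.0)")
```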
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Deep Learning in current Neuroimaging: a multivariate approach with power and type I error control but arguable generalization ability [0.158310730488265]
A non-parametric framework is proposed that estimates the statistical significance of classifications using deep learning architectures.
A label permutation test is proposed in both studies using cross-validation (CV) and resubstitution with upper bound correction (RUB) as validation methods.
In the permutation test, we found that the CV and RUB methods offer a false positive rate close to the significance level and acceptable statistical power.
arXiv Detail & Related papers (2021-03-30T21:15:39Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
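A compact sketch of a doubly-robust (AIPW) estimator of the ACE with two-fold cross-fitting on simulated data, assuming scikit-learn nuisance models; this illustrates the general cross-fit recipe rather than the specific estimators compared in that study:
```python
# Cross-fit doubly-robust (AIPW) estimate of the average causal effect (ACE):
# propensity and outcome models are fit on one fold and evaluated on the other,
# then the folds are swapped and the influence-function scores are averaged.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 2000
X = rng.standard_normal((n, 5))
propensity = 1.0 / (1.0 + np.exp(-X[:, 0]))          # treatment depends on X
A = rng.binomial(1, propensity)
Y = 2.0 * A + X[:, 1] + rng.standard_normal(n)       # true ACE = 2.0

scores = np.empty(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    # Fit the nuisance models on the training fold only
    ps = RandomForestClassifier(random_state=0).fit(X[train], A[train])
    m1 = RandomForestRegressor(random_state=0).fit(X[train][A[train] == 1],
                                                   Y[train][A[train] == 1])
    m0 = RandomForestRegressor(random_state=0).fit(X[train][A[train] == 0],
                                                   Y[train][A[train] == 0])
    # Evaluate the AIPW score on the held-out fold
    e = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)
    mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
    a, yv = A[test], Y[test]
    scores[test] = (mu1 - mu0
                    + a * (yv - mu1) / e
                    - (1 - a) * (yv - mu0) / (1 - e))

print(f"cross-fit AIPW estimate of the ACE: {scores.mean():.2f} (true value 2.0)")
```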
This list is automatically generated from the titles and abstracts of the papers on this site.