Toward Generalizable Machine Learning Models in Speech, Language, and
Hearing Sciences: Estimating Sample Size and Reducing Overfitting
- URL: http://arxiv.org/abs/2308.11197v3
- Date: Fri, 22 Dec 2023 17:14:38 GMT
- Title: Toward Generalizable Machine Learning Models in Speech, Language, and
Hearing Sciences: Estimating Sample Size and Reducing Overfitting
- Authors: Hamzeh Ghasemzadeh, Robert E. Hillman, Daryush D. Mehta
- Abstract summary: This study uses Monte Carlo simulations to quantify the interactions between the employed cross-validation method and the discriminative power of features.
The required sample size with a single holdout could be 50% higher than what would be needed if nested cross-validation were used.
- Score: 1.8416014644193064
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The first purpose of this study is to provide quantitative evidence that
would incentivize researchers to use the more robust method of nested
cross-validation rather than the commonly used single holdout. The second purpose
is to present methods and MATLAB codes for performing power analysis for
ML-based analyses during the design of a study. Monte
Carlo simulations were used to quantify the interactions between the employed
cross-validation method, the discriminative power of features, the
dimensionality of the feature space, and the dimensionality of the model. Four
different cross-validations (single holdout, 10-fold, train-validation-test,
and nested 10-fold) were compared based on the statistical power and
statistical confidence of the ML models. Distributions of the null and
alternative hypotheses were used to determine the minimum required sample size
for obtaining a statistically significant outcome (α = 0.05, 1 − β = 0.8).
Statistical confidence of the model was defined as the
probability of correct features being selected and hence being included in the
final model. Our analysis showed that the model generated based on the single
holdout method had very low statistical power and statistical confidence and
that it significantly overestimated the accuracy. Conversely, the nested
10-fold cross-validation resulted in the highest statistical confidence and the
highest statistical power, while providing an unbiased estimate of the
accuracy. The required sample size with a single holdout could be 50% higher
than what would be needed if nested cross-validation were used. Confidence in
the model based on nested cross-validation was as much as four times higher
than the confidence in the single holdout-based model. A computational model,
MATLAB codes, and lookup tables are provided to assist researchers with
estimating the sample size during the design of their future studies.
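As a concrete illustration of the comparison described in the abstract, the following minimal Python/scikit-learn sketch (an illustrative assumption of this summary, not the authors' MATLAB code) contrasts a single-holdout estimate, in which the same split is reused both to tune the feature-selection step and to report accuracy, with nested 10-fold cross-validation, in which tuning is confined to the inner folds. The synthetic dataset, logistic-regression classifier, and grid of k values are placeholders.

```python
# Hypothetical sketch: single holdout (reused for tuning) vs. nested 10-fold CV.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.pipeline import Pipeline

# Synthetic data: a few informative features buried among noise features.
X, y = make_classification(n_samples=120, n_features=30, n_informative=4,
                           n_redundant=0, random_state=0)

# Feature selection and classifier in one pipeline so that selection is
# refit on each training fold (no leakage into the evaluation data).
pipe = Pipeline([
    ("select", SelectKBest(f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"select__k": [2, 4, 8, 16]}

# 1) Single holdout, with the holdout reused for tuning: pick the k that
#    maximizes holdout accuracy and report that same accuracy (the
#    optimistic pattern the abstract cautions against).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)
holdout_acc = max(
    pipe.set_params(select__k=k).fit(X_tr, y_tr).score(X_te, y_te)
    for k in param_grid["select__k"]
)

# 2) Nested 10-fold CV: the inner loop tunes k, the outer loop provides an
#    approximately unbiased accuracy estimate.
inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)
search = GridSearchCV(pipe, param_grid, cv=inner, scoring="accuracy")
nested_scores = cross_val_score(search, X, y, cv=outer, scoring="accuracy")

print(f"single-holdout accuracy (tuned on the holdout): {holdout_acc:.3f}")
print(f"nested 10-fold accuracy: {nested_scores.mean():.3f} "
      f"+/- {nested_scores.std():.3f}")
```

On a small sample like this, the tuned-on-holdout estimate tends to be higher and far more variable across random splits than the nested estimate, which mirrors the overestimation and low statistical confidence reported in the abstract.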
Related papers
- Predictive Performance Test based on the Exhaustive Nested Cross-Validation for High-dimensional data [7.62566998854384]
Cross-validation is used for several tasks such as estimating the prediction error, tuning the regularization parameter, and selecting the most suitable predictive model.
K-fold cross-validation is a popular CV method, but its risk estimates are highly dependent on how the data are partitioned; a minimal sketch of this dependence appears after the list below.
This study presents an alternative novel predictive performance test and valid confidence intervals based on exhaustive nested cross-validation.
arXiv Detail & Related papers (2024-08-06T12:28:16Z)
- Modelling Sampling Distributions of Test Statistics with Autograd [0.0]
We explore whether modeling conditional 1-dimensional sampling distributions with neural networks is a viable alternative to the probability density-ratio method.
Relatively simple, yet effective, neural network models are used whose predictive uncertainty is quantified through a variety of methods.
arXiv Detail & Related papers (2024-05-03T21:34:12Z)
- Bootstrapping the Cross-Validation Estimate [3.5159221757909656]
Cross-validation is a widely used technique for evaluating the performance of prediction models.
It is essential to accurately quantify the uncertainty associated with the estimate.
This paper proposes a fast bootstrap method that quickly estimates the standard error of the cross-validation estimate.
arXiv Detail & Related papers (2023-07-01T07:50:54Z)
- Model-agnostic out-of-distribution detection using combined statistical tests [15.27980070479021]
We present simple methods for out-of-distribution detection using a trained generative model.
We combine a classical parametric test (Rao's score test) with the recently introduced typicality test.
Despite their simplicity and generality, these methods can be competitive with model-specific out-of-distribution detection algorithms.
arXiv Detail & Related papers (2022-03-02T13:32:09Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- On Statistical Efficiency in Learning [37.08000833961712]
We address the challenge of model selection to strike a balance between model fitting and model complexity.
We propose an online algorithm that sequentially expands the model complexity to enhance selection stability and reduce cost.
Experimental studies show that the proposed method has desirable predictive power and significantly less computational cost than some popular methods.
arXiv Detail & Related papers (2020-12-24T16:08:29Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value ε*, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Density of States Estimation for Out-of-Distribution Detection [69.90130863160384]
DoSE, the density of states estimator, is proposed for unsupervised out-of-distribution (OOD) detection.
We demonstrate DoSE's state-of-the-art performance against other unsupervised OOD detectors.
arXiv Detail & Related papers (2020-06-16T16:06:25Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
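As referenced in the first related paper's summary above, the following minimal, hypothetical scikit-learn sketch (not code from any of the listed papers) shows how much a 10-fold cross-validation accuracy estimate can move when only the random partitioning of the same small dataset changes; the synthetic dataset and classifier are illustrative assumptions.

```python
# Hypothetical sketch: variability of 10-fold CV estimates across partitions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Same small synthetic problem evaluated with 10-fold CV under 50 different
# random partitions; only the partitioning changes between repetitions.
X, y = make_classification(n_samples=80, n_features=20, n_informative=3,
                           random_state=0)
clf = LogisticRegression(max_iter=1000)

estimates = []
for seed in range(50):
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    estimates.append(cross_val_score(clf, X, y, cv=cv).mean())

print(f"10-fold CV accuracy across partitions: mean={np.mean(estimates):.3f}, "
      f"sd={np.std(estimates):.3f}, "
      f"range=[{np.min(estimates):.3f}, {np.max(estimates):.3f}]")
```

Re-running the loop with a larger n_samples gives a rough sense of how this spread shrinks as the sample size grows.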
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.