Toward Generalizable Machine Learning Models in Speech, Language, and
Hearing Sciences: Estimating Sample Size and Reducing Overfitting
- URL: http://arxiv.org/abs/2308.11197v3
- Date: Fri, 22 Dec 2023 17:14:38 GMT
- Title: Toward Generalizable Machine Learning Models in Speech, Language, and
Hearing Sciences: Estimating Sample Size and Reducing Overfitting
- Authors: Hamzeh Ghasemzadeh, Robert E. Hillman, Daryush D. Mehta
- Abstract summary: This study uses Monte Carlo simulations to quantify the interactions between the employed cross-validation method and the discriminative power of features.
The required sample size with a single holdout could be 50% higher than what would be needed if nested cross-validation were used.
- Score: 1.8416014644193064
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The first purpose of this study is to provide quantitative evidence that
would incentivize researchers to use the more robust method of nested
cross-validation rather than the commonly used single holdout. The second purpose
is to present methods and MATLAB codes for performing power analysis for
ML-based analyses during the design of a study. Monte
Carlo simulations were used to quantify the interactions between the employed
cross-validation method, the discriminative power of features, the
dimensionality of the feature space, and the dimensionality of the model. Four
different cross-validations (single holdout, 10-fold, train-validation-test,
and nested 10-fold) were compared based on the statistical power and
statistical confidence of the ML models. Distributions of the null and
alternative hypotheses were used to determine the minimum required sample size
for obtaining a statistically significant outcome (α = 0.05, 1 − β = 0.8).
Statistical confidence of the model was defined as the
probability of correct features being selected and hence being included in the
final model. Our analysis showed that the model generated based on the single
holdout method had very low statistical power and statistical confidence and
that it significantly overestimated the accuracy. Conversely, the nested
10-fold cross-validation resulted in the highest statistical confidence and the
highest statistical power, while providing an unbiased estimate of the
accuracy. The required sample size with a single holdout could be 50% higher
than what would be needed if nested cross-validation were used. Confidence in
the model based on nested cross-validation was as much as four times higher
than the confidence in the single holdout-based model. A computational model,
MATLAB codes, and lookup tables are provided to assist researchers with
estimating the sample size during the design of their future studies.
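As a concrete illustration of the comparison described in the abstract, the following minimal Python/scikit-learn sketch (an illustrative assumption of this summary, not the authors' MATLAB code) contrasts a single-holdout estimate, in which the same split is reused both to tune the feature-selection step and to report accuracy, with nested 10-fold cross-validation, in which tuning is confined to the inner folds. The synthetic dataset, logistic-regression classifier, and grid of k values are placeholders.

```python
# Hypothetical sketch: single holdout (reused for tuning) vs. nested 10-fold CV.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.pipeline import Pipeline

# Synthetic data: a few informative features buried among noise features.
X, y = make_classification(n_samples=120, n_features=30, n_informative=4,
                           n_redundant=0, random_state=0)

# Feature selection and classifier in one pipeline so that selection is
# refit on each training fold (no leakage into the evaluation data).
pipe = Pipeline([
    ("select", SelectKBest(f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"select__k": [2, 4, 8, 16]}

# 1) Single holdout, with the holdout reused for tuning: pick the k that
#    maximizes holdout accuracy and report that same accuracy (the
#    optimistic pattern the abstract cautions against).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)
holdout_acc = max(
    pipe.set_params(select__k=k).fit(X_tr, y_tr).score(X_te, y_te)
    for k in param_grid["select__k"]
)

# 2) Nested 10-fold CV: the inner loop tunes k, the outer loop provides an
#    approximately unbiased accuracy estimate.
inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)
search = GridSearchCV(pipe, param_grid, cv=inner, scoring="accuracy")
nested_scores = cross_val_score(search, X, y, cv=outer, scoring="accuracy")

print(f"single-holdout accuracy (tuned on the holdout): {holdout_acc:.3f}")
print(f"nested 10-fold accuracy: {nested_scores.mean():.3f} "
      f"+/- {nested_scores.std():.3f}")
```

On a small sample like this, the tuned-on-holdout estimate tends to be higher and far more variable across random splits than the nested estimate, which mirrors the overestimation and low statistical confidence reported in the abstract.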
Related papers
- Predictive Performance Test based on the Exhaustive Nested Cross-Validation for High-dimensional data [7.62566998854384]
Cross-validation is used for several tasks such as estimating the prediction error, tuning the regularization parameter, and selecting the most suitable predictive model.
K-fold cross-validation is a popular CV method, but its risk estimates are highly dependent on how the data are partitioned; a minimal sketch of this dependence appears after the list below.
This study presents an alternative novel predictive performance test and valid confidence intervals based on exhaustive nested cross-validation.
arXiv Detail & Related papers (2024-08-06T12:28:16Z)
- Modelling Sampling Distributions of Test Statistics with Autograd [0.0]
We explore whether modeling conditional 1-dimensional sampling distributions with neural networks is a viable alternative to the probability density-ratio method.
Relatively simple, yet effective, neural network models are used whose predictive uncertainty is quantified through a variety of methods.
arXiv Detail & Related papers (2024-05-03T21:34:12Z)
- Bootstrapping the Cross-Validation Estimate [3.5159221757909656]
Cross-validation is a widely used technique for evaluating the performance of prediction models.
It is essential to accurately quantify the uncertainty associated with the estimate.
This paper proposes a fast bootstrap method that quickly estimates the standard error of the cross-validation estimate.
arXiv Detail & Related papers (2023-07-01T07:50:54Z)
- Model-agnostic out-of-distribution detection using combined statistical tests [15.27980070479021]
We present simple methods for out-of-distribution detection using a trained generative model.
We combine a classical parametric test (Rao's score test) with the recently introduced typicality test.
Despite their simplicity and generality, these methods can be competitive with model-specific out-of-distribution detection algorithms.
arXiv Detail & Related papers (2022-03-02T13:32:09Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- On Statistical Efficiency in Learning [37.08000833961712]
We address the challenge of model selection to strike a balance between model fitting and model complexity.
We propose an online algorithm that sequentially expands the model complexity to enhance selection stability and reduce cost.
Experimental studies show that the proposed method has desirable predictive power and significantly less computational cost than some popular methods.
arXiv Detail & Related papers (2020-12-24T16:08:29Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value ε*, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Density of States Estimation for Out-of-Distribution Detection [69.90130863160384]
DoSE, the density of states estimator, is proposed for unsupervised out-of-distribution (OOD) detection.
We demonstrate DoSE's state-of-the-art performance against other unsupervised OOD detectors.
arXiv Detail & Related papers (2020-06-16T16:06:25Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
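As referenced in the first related paper's summary above, the following minimal, hypothetical scikit-learn sketch (not code from any of the listed papers) shows how much a 10-fold cross-validation accuracy estimate can move when only the random partitioning of the same small dataset changes; the synthetic dataset and classifier are illustrative assumptions.

```python
# Hypothetical sketch: variability of 10-fold CV estimates across partitions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Same small synthetic problem evaluated with 10-fold CV under 50 different
# random partitions; only the partitioning changes between repetitions.
X, y = make_classification(n_samples=80, n_features=20, n_informative=3,
                           random_state=0)
clf = LogisticRegression(max_iter=1000)

estimates = []
for seed in range(50):
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    estimates.append(cross_val_score(clf, X, y, cv=cv).mean())

print(f"10-fold CV accuracy across partitions: mean={np.mean(estimates):.3f}, "
      f"sd={np.std(estimates):.3f}, "
      f"range=[{np.min(estimates):.3f}, {np.max(estimates):.3f}]")
```

Re-running the loop with a larger n_samples gives a rough sense of how this spread shrinks as the sample size grows.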
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.