Unraveling overoptimism and publication bias in ML-driven science
- URL: http://arxiv.org/abs/2405.14422v3
- Date: Thu, 11 Jul 2024 19:40:20 GMT
- Title: Unraveling overoptimism and publication bias in ML-driven science
- Authors: Pouria Saidi, Gautam Dasarathy, Visar Berisha,
- Abstract summary: Recent studies suggest published performance of Machine Learning models are often overoptimistic.
We introduce a novel model for observed accuracy, integrating parametric learning curves and the aforementioned biases.
Applying the model to meta-analyses of classifications of neurological conditions, we estimate the inherent limits of ML-based prediction in each domain.
- Score: 14.38643099447636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Learning (ML) is increasingly used across many disciplines with impressive reported results. However, recent studies suggest published performance of ML models are often overoptimistic. Validity concerns are underscored by findings of an inverse relationship between sample size and reported accuracy in published ML models, contrasting with the theory of learning curves where accuracy should improve or remain stable with increasing sample size. This paper investigates factors contributing to overoptimism in ML-driven science, focusing on overfitting and publication bias. We introduce a novel stochastic model for observed accuracy, integrating parametric learning curves and the aforementioned biases. We construct an estimator that corrects for these biases in observed data. Theoretical and empirical results show that our framework can estimate the underlying learning curve, providing realistic performance assessments from published results. Applying the model to meta-analyses of classifications of neurological conditions, we estimate the inherent limits of ML-based prediction in each domain.
Related papers
- A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models [63.949883238901414]
We present a unique angle of gradient analysis of loss functions that simultaneously reward good examples and penalize bad ones in LMs.
We find that ExMATE serves as a superior surrogate for MLE, and that combining DPO with ExMATE instead of MLE further enhances both the statistical (5-7%) and generative (+18% win rate) performance.
arXiv Detail & Related papers (2024-08-29T17:46:18Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Variance of ML-based software fault predictors: are we really improving
fault prediction? [0.3222802562733786]
We experimentally analyze the variance of a state-of-the-art fault prediction approach.
We observed a maximum variance of 10.10% in terms of the per-class accuracy metric.
arXiv Detail & Related papers (2023-10-26T09:31:32Z) - Hyperparameter Tuning and Model Evaluation in Causal Effect Estimation [2.7823528791601686]
This paper investigates the interplay between the four different aspects of model evaluation for causal effect estimation.
We find that most causal estimators are roughly equivalent in performance if tuned thoroughly enough.
We call for more research into causal model evaluation to unlock the optimum performance not currently being delivered even by state-of-the-art procedures.
arXiv Detail & Related papers (2023-03-02T17:03:02Z) - Expected Validation Performance and Estimation of a Random Variable's
Maximum [48.83713377993604]
We analyze three statistical estimators for expected validation performance.
We find the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias.
We find that the two biased estimators lead to the fewest incorrect conclusions.
arXiv Detail & Related papers (2021-10-01T18:48:47Z) - A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of
Overparameterized Machine Learning [37.01683478234978]
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field.
One of the most important riddles is the good empirical generalization of over parameterized models.
arXiv Detail & Related papers (2021-09-06T10:48:40Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE)
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z) - Efficient Debiased Evidence Estimation by Multilevel Monte Carlo
Sampling [0.0]
We propose a new optimization algorithm for Bayesian inference based multilevel Monte Carlo (MLMC) methods.
Our numerical results confirm considerable computational savings compared to the conventional estimators.
arXiv Detail & Related papers (2020-01-14T09:14:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.