Curse of Small Sample Size in Forecasting of the Active Cases in
COVID-19 Outbreak
- URL: http://arxiv.org/abs/2011.03628v2
- Date: Thu, 26 Nov 2020 09:23:51 GMT
- Authors: Mert Nakıp, Onur Çopur, Cüneyt Güzeliş
- Abstract summary: During the COVID-19 pandemic, numerous attempts have been made to predict the number of cases and other future trends of the pandemic.
However, they fail to reliably predict the medium- and long-term evolution of fundamental features of the COVID-19 outbreak within acceptable accuracy.
This paper gives an explanation for the failure of machine learning models in this particular forecasting problem.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: During the COVID-19 pandemic, numerous attempts have been made to
predict the number of cases and other future trends of the pandemic. However,
they fail to reliably predict the medium- and long-term evolution of
fundamental features of the COVID-19 outbreak within acceptable accuracy. This
paper gives an explanation for the failure of machine learning models in this
particular forecasting problem. The paper shows that simple linear regression
models reliably provide high prediction accuracy, but only for a 2-week
period, and that relatively complex machine learning models, which have the
potential to learn long-term predictions with low errors, cannot obtain good
predictions with high generalization ability. The paper suggests that the lack
of a sufficient number of samples is the source of the low prediction
performance of the forecasting models. The reliability of the forecasting
results for the active cases is measured in terms of cross-validation
prediction errors, which serve as estimates of the generalization errors of
the forecasters. To exploit the information most relevant to the active cases,
we perform feature selection over a variety of variables. We apply different
feature selection methods, namely Pairwise Correlation, Recursive Feature
Selection, and feature selection using Lasso regression, and compare them to
each other as well as to models not employing any feature selection.
Furthermore, we compare Linear Regression, Multi-Layer Perceptron, and Long
Short-Term Memory models, each of which is used for predicting the active
cases together with the mentioned feature selection methods. Our results show
that accurate forecasting of the active cases with high generalization ability
is possible only up to 3 days, owing to the small sample size of the COVID-19
data.
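To make the evaluation pipeline concrete, here is a minimal sketch, not the authors' code: the data is synthetic, the 3-day horizon, the Lasso penalty, and the correlation threshold are illustrative assumptions, and scikit-learn's RFE stands in for the Recursive Feature Selection step. It shows the three feature-selection approaches named in the abstract and how time-series cross-validation errors can act as estimates of a forecaster's generalization error.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)

# Hypothetical daily series: 5 candidate predictors (e.g. new cases,
# tests, recoveries) and an active-case target; all values synthetic.
n_days, horizon = 120, 3                  # small sample, 3-day horizon
X = rng.normal(size=(n_days, 5))
y = 3.0 * X[:, 0] + X[:, 2] + rng.normal(scale=0.1, size=n_days)

# Align predictors at day t with the target `horizon` days ahead.
X_lag, y_fut = X[:-horizon], y[horizon:]

# 1) Pairwise correlation: keep features whose absolute correlation
#    with the target exceeds a threshold (0.3 is an assumption).
corr = np.array([abs(np.corrcoef(X_lag[:, j], y_fut)[0, 1])
                 for j in range(X_lag.shape[1])])
corr_kept = np.flatnonzero(corr > 0.3)

# 2) Lasso: nonzero coefficients mark the selected features.
lasso = Lasso(alpha=0.1).fit(X_lag, y_fut)
lasso_kept = np.flatnonzero(lasso.coef_)

# 3) Recursive feature elimination around a linear model (standing in
#    for the paper's Recursive Feature Selection).
rfe = RFE(LinearRegression(), n_features_to_select=2).fit(X_lag, y_fut)
rfe_kept = np.flatnonzero(rfe.support_)

# Cross-validation on time-ordered folds; the fold errors act as
# estimates of the forecaster's generalization error.
errors = []
for tr, te in TimeSeriesSplit(n_splits=5).split(X_lag):
    model = LinearRegression().fit(X_lag[tr][:, lasso_kept], y_fut[tr])
    pred = model.predict(X_lag[te][:, lasso_kept])
    errors.append(np.mean(np.abs(pred - y_fut[te])))

print("correlation kept:", corr_kept.tolist())
print("lasso kept:      ", lasso_kept.tolist())
print("RFE kept:        ", rfe_kept.tolist())
print(f"mean CV error at {horizon}-day horizon: {np.mean(errors):.3f}")
```

On the real COVID-19 series the usable sample is similarly small, which both limits the folds available to such a cross-validation and, per the paper's findings, caps the horizon over which complex models generalize.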
Related papers
- An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification [0.0]
This paper examines the impact of balancing methods on predictive multiplicity using the Rashomon effect.
This is crucial because blindly selecting a model from a set of approximately equally accurate models is risky in data-centric AI.
arXiv Detail & Related papers (2024-03-22T13:08:22Z) - Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - Random features models: a way to study the success of naive imputation [0.0]
Constant (naive) imputation is still widely used in practice, as it is an easy-to-use first technique for dealing with missing data.
Recent works suggest that the bias induced by this imputation is low in the context of high-dimensional linear predictors.
This paper confirms the intuition that the bias is negligible and that, surprisingly, naive imputation also remains relevant in very low dimension.
arXiv Detail & Related papers (2024-02-06T09:37:06Z) - Learning Sample Difficulty from Pre-trained Models for Reliable
Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z) - ASPEST: Bridging the Gap Between Active Learning and Selective
Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - Robust self-healing prediction model for high dimensional data [0.685316573653194]
This work proposes a robust self-healing (RSH) hybrid prediction model.
It functions by using the data in its entirety, removing errors and inconsistencies from it rather than discarding any data.
The proposed method is compared with some of the existing high performing models and the results are analyzed.
arXiv Detail & Related papers (2022-10-04T17:55:50Z) - Predictive Multiplicity in Probabilistic Classification [25.111463701666864]
We present a framework for measuring predictive multiplicity in probabilistic classification.
We demonstrate the incidence and prevalence of predictive multiplicity in real-world tasks.
Our results emphasize the need to report predictive multiplicity more widely.
arXiv Detail & Related papers (2022-06-02T16:25:29Z) - Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models exhibit strong prediction bias across labels when given few examples.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows that models gain performance improvement by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z) - Back2Future: Leveraging Backfill Dynamics for Improving Real-time
Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature.
We formulate a novel problem and neural framework Back2Future that aims to refine a given model's predictions in real-time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z) - Ambiguity in Sequential Data: Predicting Uncertain Futures with
Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.