Forecasting COVID-19 spreading through an ensemble of classical and
machine learning models: Spain's case study
- URL: http://arxiv.org/abs/2207.05753v1
- Date: Tue, 12 Jul 2022 08:16:44 GMT
- Title: Forecasting COVID-19 spreading through an ensemble of classical and
machine learning models: Spain's case study
- Authors: Ignacio Heredia Cacha, Judith Sainz-Pardo Díaz, María Castrillo
Melguizo, Álvaro López García
- Abstract summary: We evaluate the applicability of an ensemble of population models and machine learning models to predict the near future evolution of the COVID-19 pandemic.
We rely solely on open and public datasets, fusing incidence, vaccination, human mobility and weather data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work we evaluate the applicability of an ensemble of
population models and machine learning models to predict the near-future
evolution of the COVID-19 pandemic, with a particular use case in Spain. We
rely solely on open and public datasets, fusing incidence, vaccination, human
mobility and weather data to feed our machine learning models (Random Forest,
Gradient Boosting, k-Nearest Neighbours and Kernel Ridge Regression). We use
the incidence data to fit classical population models (Gompertz, Logistic,
Richards, Bertalanffy) so as to better capture the trend of the data. We then
ensemble these two families of models to obtain a more robust and accurate
prediction. Furthermore, we observe an improvement in the predictions of the
machine learning models as new features (vaccination, mobility, climatic
conditions) are added, and we analyse the importance of each of them using
Shapley Additive Explanation values. As in any other modelling work, the
quality of both the data and the predictions has several limitations and must
therefore be viewed from a critical standpoint, as we discuss in the text. Our
work concludes that the ensemble of these models improves on the individual
predictions (those obtained using only machine learning models or only
population models) and can be applied, with caution, in cases where
compartmental models cannot be used due to the lack of relevant data.
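As a concrete illustration of the pipeline described in the abstract, below is a minimal sketch (not the authors' implementation) of fitting a Gompertz growth curve to cumulative incidence, training a Random Forest on lagged incidence, and averaging the two one-step-ahead forecasts. The synthetic data, the lag features and the equal-weight blend are assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' code): blend a classical population model
# (Gompertz) with a machine-learning forecast via a simple average.
import numpy as np
from scipy.optimize import curve_fit
from sklearn.ensemble import RandomForestRegressor

def gompertz(t, a, b, c):
    """Gompertz curve: a * exp(-b * exp(-c * t))."""
    return a * np.exp(-b * np.exp(-c * t))

# Synthetic cumulative-incidence series (placeholder for real open data).
t = np.arange(60, dtype=float)
rng = np.random.default_rng(0)
y = gompertz(t, 1e5, 8.0, 0.08) + rng.normal(0, 500, t.size)

# Classical population model: fit Gompertz parameters to the observed trend.
params, _ = curve_fit(gompertz, t, y, p0=[y.max(), 5.0, 0.1], maxfev=10000)

# Machine-learning model: here only lagged incidence is used as features,
# whereas the paper also fuses vaccination, mobility and weather data.
lags = 7
X = np.column_stack([y[i:len(y) - lags + i] for i in range(lags)])
target = y[lags:]
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, target)

# One-step-ahead forecasts from each family of models.
t_next = np.array([t[-1] + 1.0])
pop_forecast = gompertz(t_next, *params)[0]
ml_forecast = rf.predict(y[-lags:].reshape(1, -1))[0]

# Ensemble: equal-weight average (the simplest choice; the paper's weighting
# scheme may differ).
ensemble_forecast = 0.5 * pop_forecast + 0.5 * ml_forecast
print(f"population: {pop_forecast:.0f}  ML: {ml_forecast:.0f}  "
      f"ensemble: {ensemble_forecast:.0f}")
```

The abstract also mentions ranking feature importance with Shapley Additive Explanation values; a minimal sketch with the shap package (assumed to be installed), applied to the illustrative Random Forest above, could look like this:

```python
# SHAP feature-importance sketch; in the paper the ranked features include
# incidence, vaccination, mobility and weather, whereas here they are only
# the synthetic lag features built above.
import shap

explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X)   # shape: (n_samples, lags)
print("mean |SHAP| per lag feature:", np.abs(shap_values).mean(axis=0))
```

In the paper the ensemble combines four population models and four machine learning regressors fed with incidence, vaccination, mobility and weather data; the equal-weight average above only illustrates the ensembling idea under the stated assumptions.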
Related papers
- On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations.
We propose an autoregressive sampling approach that significantly improves performance in forecasting.
We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z) - Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z) - Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - EAMDrift: An interpretable self retrain model for time series [0.0]
We present EAMDrift, a novel method that combines forecasts from multiple individual predictors by weighting each prediction according to a performance metric.
EAMDrift is designed to automatically adapt to out-of-distribution patterns in data and identify the most appropriate models to use at each moment.
Our study on real-world datasets shows that EAMDrift outperforms individual baseline models by 20% and achieves comparable accuracy results to non-interpretable ensemble models.
arXiv Detail & Related papers (2023-05-31T13:25:26Z) - Synthetic Model Combination: An Instance-wise Approach to Unsupervised
Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
We are instead given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z) - A fairness assessment of mobility-based COVID-19 case prediction models [0.0]
We tested the hypothesis that bias in the mobility data used to train the predictive models might lead to unfairly less accurate predictions for certain demographic groups.
Specifically, the models tend to favor large, highly educated, wealthy, young, urban, and non-black-dominated counties.
arXiv Detail & Related papers (2022-10-08T03:43:51Z) - Unifying Epidemic Models with Mixtures [28.771032745045428]
The COVID-19 pandemic has emphasized the need for a robust understanding of epidemic models.
Here, we introduce a simple mixture-based model which bridges the two approaches.
Although the model is non-mechanistic, we show that it arises as the natural outcome of a process based on a networked SIR framework.
arXiv Detail & Related papers (2022-01-07T19:42:05Z) - Back2Future: Leveraging Backfill Dynamics for Improving Real-time
Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature.
We formulate a novel problem and neural framework Back2Future that aims to refine a given model's predictions in real-time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z) - Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span-selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.