A scalable pipeline for COVID-19: the case study of Germany, Czechia and
Poland
- URL: http://arxiv.org/abs/2208.12928v1
- Date: Sat, 27 Aug 2022 05:14:01 GMT
- Title: A scalable pipeline for COVID-19: the case study of Germany, Czechia and
Poland
- Authors: Wildan Abdussalam, Adam Mertel, Kai Fan, Lennart Schüler,
Weronika Schlechte-Wełnicz, and Justin M. Calabrese
- Abstract summary: We have built an operational data store (ODS) using PostgreSQL to consolidate datasets from multiple data sources.
The ODS has been built to store COVID-19 data not only from Germany, Czechia, and Poland but also from other areas.
The data can then support not only forecasting with a version-controlled Arima-Holt model and other analyses for decision making, but also a risk calculator and optimisation apps.
- Score: 7.753854979677439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Throughout the coronavirus disease 2019 (COVID-19) pandemic, decision makers
have relied on forecasting models to determine and implement non-pharmaceutical
interventions (NPI). In building the forecasting models, continuously updated
datasets from various stakeholders including developers, analysts, and testers
are required to provide precise predictions. Here we report the design of a
scalable data-synchronization pipeline, named where2test, that supports
inter-country top-down spatiotemporal observations and forecasting models of
COVID-19 for Germany, Czechia and Poland. We have built
an operational data store (ODS) using PostgreSQL to continuously consolidate
datasets from multiple data sources, perform collaborative work, facilitate
high performance data analysis, and trace changes. The ODS has been built not
only to store the COVID-19 data from Germany, Czechia, and Poland but also
other areas. Employing the dimensional fact model, a schema of metadata is
capable of synchronizing the various structures of data from those regions, and
is scalable to the entire world. Next, the ODS is populated using batch
Extract, Transform, and Load (ETL) jobs. SQL queries are subsequently
created to reduce the need for users to pre-process data. The data can then
support not only forecasting using a version-controlled Arima-Holt model and
other analyses to support decision making, but also a risk calculator and
optimisation apps. The data synchronization runs at a daily interval, and the
synchronized data are displayed at https://www.where2test.de.
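The pipeline described in the abstract (sources with different structures, a dimensional schema, batch ETL, and prepared SQL queries) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and every table name, view name, field name, and sample value (`dim_region`, `fact_cases`, `v_country_daily`, `SOURCE_A`, `SOURCE_B`) is hypothetical.

```python
import sqlite3

# Hypothetical raw records from two sources with different layouts.
# All values are made up purely for illustration.
SOURCE_A = [  # tuples: (country, region, ISO date, new cases)
    ("DE", "Sachsen", "2022-08-01", 120),
    ("DE", "Sachsen", "2022-08-02", 95),
]
SOURCE_B = [  # dicts with source-specific field names
    {"kraj": "Praha", "datum": "2022-08-01", "pocet": 80},
]


def extract():
    """Extract/transform: map both layouts to (country, region, day, cases)."""
    for country, region, day, cases in SOURCE_A:
        yield country, region, day, cases
    for rec in SOURCE_B:
        yield "CZ", rec["kraj"], rec["datum"], rec["pocet"]


def create_schema(conn):
    """Create one dimension table, one fact table, and a ready-made view."""
    conn.executescript("""
        CREATE TABLE dim_region (
            region_id INTEGER PRIMARY KEY,
            country   TEXT NOT NULL,
            name      TEXT NOT NULL,
            UNIQUE (country, name)
        );
        CREATE TABLE fact_cases (
            region_id INTEGER NOT NULL REFERENCES dim_region(region_id),
            day       TEXT NOT NULL,
            new_cases INTEGER NOT NULL,
            PRIMARY KEY (region_id, day)
        );
        -- Prepared query so users need no pre-processing of their own.
        CREATE VIEW v_country_daily AS
            SELECT r.country, f.day, SUM(f.new_cases) AS new_cases
            FROM fact_cases f JOIN dim_region r USING (region_id)
            GROUP BY r.country, f.day;
    """)


def load(conn, rows):
    """Load: upsert dimension rows, then facts; safe to re-run daily."""
    for country, region, day, cases in rows:
        conn.execute(
            "INSERT OR IGNORE INTO dim_region (country, name) VALUES (?, ?)",
            (country, region),
        )
        (region_id,) = conn.execute(
            "SELECT region_id FROM dim_region WHERE country = ? AND name = ?",
            (country, region),
        ).fetchone()
        conn.execute(
            "INSERT OR REPLACE INTO fact_cases VALUES (?, ?, ?)",
            (region_id, day, cases),
        )
    conn.commit()


def run_batch():
    """Run one batch job against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    load(conn, extract())
    return conn
```

Calling `run_batch()` and selecting from `v_country_daily` returns one aggregated row per country and day. Because the load step uses upserts, repeating the batch with the same input does not duplicate rows, which is one simple way to make a daily job repeatable.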
Related papers
- KODA: A Data-Driven Recursive Model for Time Series Forecasting and Data Assimilation using Koopman Operators [14.429071321401953]
We propose a Koopman operator-based approach that integrates forecasting and data assimilation in nonlinear dynamical systems.
In particular we use a Fourier domain filter to disentangle the data into a physical component whose dynamics can be accurately represented by a Koopman operator.
We show that KODA outperforms existing state of the art methods on multiple time series benchmarks.
arXiv Detail & Related papers (2024-09-29T02:25:48Z)
- Scaling Retrieval-Based Language Models with a Trillion-Token Datastore [85.4310806466002]
We find that increasing the size of the datastore used by a retrieval-based LM monotonically improves language modeling and several downstream tasks without obvious saturation.
By plotting compute-optimal scaling curves with varied datastore, model, and pretraining data sizes, we show that using larger datastores can significantly improve model performance for the same training compute budget.
arXiv Detail & Related papers (2024-07-09T08:27:27Z)
- UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction [93.77809355002591]
We introduce UniTraj, a comprehensive framework that unifies various datasets, models, and evaluation criteria.
We conduct extensive experiments and find that model performance significantly drops when transferred to other datasets.
We provide insights into dataset characteristics to explain these findings.
arXiv Detail & Related papers (2024-03-22T10:36:50Z)
- SCTc-TE: A Comprehensive Formulation and Benchmark for Temporal Event Forecasting [63.01035584154509]
We develop a fully automated pipeline and construct a large-scale dataset named MidEast-TE from about 0.6 million news articles.
This dataset focuses on the cooperation and conflict events among countries mainly in the MidEast region from 2015 to 2022.
We propose a novel method LoGo that is able to take advantage of both Local and Global contexts for SCTc-TE forecasting.
arXiv Detail & Related papers (2023-12-02T07:40:21Z)
- Deep COVID-19 Forecasting for Multiple States with Data Augmentation [10.197800697048903]
We propose a deep learning approach to forecasting state-level COVID-19 trends of weekly cumulative death in the United States (US) and incident cases in Germany.
This approach includes a transformer model, an ensemble method, and a data augmentation technique for time series.
Our model has achieved some of the best state-level results on the COVID-19 Forecast Hub for the US and for Germany.
arXiv Detail & Related papers (2023-02-02T15:16:13Z)
- Strict baselines for Covid-19 forecasting and ML perspective for USA and Russia [105.54048699217668]
Covid-19 allows researchers to gather datasets accumulated over 2 years and to use them in predictive analysis.
We present the results of a consistent comparative study of different types of methods for predicting the dynamics of the spread of Covid-19 based on regional data for two countries: the United States and Russia.
arXiv Detail & Related papers (2022-07-15T18:21:36Z)
- Exploration of an End-to-End Automatic Number-plate Recognition neural network for Indian datasets [0.0]
We release an expanding dataset presently consisting of 1.5k images and a scalable and reproducible procedure of enhancing this dataset towards development of ANPR solution for Indian conditions.
We report the hindrances in direct reusability of the model provided by the authors of CCPD because of the extreme diversity in Indian number plates and differences in distribution with respect to the CCPD dataset.
An improvement of 42.86% was observed in LP detection after aligning the characteristics of Indian dataset with Chinese dataset.
arXiv Detail & Related papers (2022-07-14T05:05:18Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- Towards physically consistent data-driven weather forecasting: Integrating data assimilation with equivariance-preserving deep spatial transformers [2.7998963147546148]
We propose 3 components to integrate with commonly used data-driven weather prediction models.
These components are 1) a deep spatial transformer added to latent space of U-NETs to preserve equivariance, 2) a data-assimilation algorithm to ingest noisy observations and improve the initial conditions for next forecasts, and 3) a multi-time-step algorithm, improving the accuracy of forecasts at short intervals.
arXiv Detail & Related papers (2021-03-16T23:15:00Z)
- Interactive exploration of population scale pharmacoepidemiology datasets [0.0]
Population-scale drug prescription data linked with adverse drug reaction (ADR) supports the fitting of models large enough to detect drug use and ADR patterns.
Detecting ADR patterns in large datasets requires tools for scalable data processing, machine learning for data analysis, and interactive visualization.
We have created a tool for interactive exploration of patterns in prescription datasets with millions of samples.
arXiv Detail & Related papers (2020-05-20T07:34:50Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.