Related papers: A scalable pipeline for COVID-19: the case study of Germany, Czechia and Poland

URL: http://arxiv.org/abs/2208.12928v1
Date: Sat, 27 Aug 2022 05:14:01 GMT
Title: A scalable pipeline for COVID-19: the case study of Germany, Czechia and Poland
Authors: Wildan Abdussalam, Adam Mertel, Kai Fan, Lennart Schüler, Weronika Schlechte-Wełnicz and Justin M. Calabrese
Abstract summary: We have built an operational data store (ODS) using PostgreSQL to consolidate datasets from multiple data sources. The ODS has been built not only to store COVID-19 data from Germany, Czechia, and Poland but also data from other areas. The data can then support not only forecasting using a version-controlled Arima-Holt model and other analyses to support decision making, but also a risk calculator and optimisation apps.
Score: 7.753854979677439
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Throughout the coronavirus disease 2019 (COVID-19) pandemic, decision makers have relied on forecasting models to determine and implement non-pharmaceutical interventions (NPI). In building the forecasting models, continuously updated datasets from various stakeholders including developers, analysts, and testers are required to provide precise predictions. Here we report the design of a scalable pipeline, named where2test, which synchronizes data to support inter-country top-down spatiotemporal observations and forecasting models of COVID-19 for Germany, Czechia and Poland. We have built an operational data store (ODS) using PostgreSQL to continuously consolidate datasets from multiple data sources, perform collaborative work, facilitate high performance data analysis, and trace changes. The ODS has been built not only to store the COVID-19 data from Germany, Czechia, and Poland but also data from other areas. Employing the dimensional fact model, a schema of metadata is capable of synchronizing the various structures of data from those regions, and is scalable to the entire world. Next, the ODS is populated using batch Extract, Transform, and Load (ETL) jobs. SQL queries are subsequently created to reduce the need for users to pre-process data. The data can then support not only forecasting using a version-controlled Arima-Holt model and other analyses to support decision making, but also a risk calculator and optimisation apps. The data synchronization runs at a daily interval, and its results are displayed at https://www.where2test.de.
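The abstract's combination of a dimensional fact model, batch ETL jobs, and ready-made SQL queries can be illustrated with a minimal sketch. This is not the where2test schema: the table names (dim_region, dim_date, fact_cases), the source field names, and the sample numbers are all hypothetical, and Python's built-in sqlite3 module stands in for PostgreSQL so that the example runs without a server.

```python
import sqlite3

# Hypothetical star schema: two dimension tables and one fact table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dim_region (
    region_id INTEGER PRIMARY KEY,
    country   TEXT NOT NULL,
    name      TEXT NOT NULL,
    UNIQUE (country, name)
);
CREATE TABLE IF NOT EXISTS dim_date (
    date_id  INTEGER PRIMARY KEY,
    iso_date TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS fact_cases (
    region_id INTEGER NOT NULL REFERENCES dim_region(region_id),
    date_id   INTEGER NOT NULL REFERENCES dim_date(date_id),
    new_cases INTEGER NOT NULL,
    PRIMARY KEY (region_id, date_id)
);
"""


def create_schema(conn):
    conn.executescript(SCHEMA)


def _region_id(conn, country, name):
    # Insert the dimension row if it is new, then look up its key.
    conn.execute(
        "INSERT OR IGNORE INTO dim_region (country, name) VALUES (?, ?)",
        (country, name),
    )
    row = conn.execute(
        "SELECT region_id FROM dim_region WHERE country = ? AND name = ?",
        (country, name),
    ).fetchone()
    return row[0]


def _date_id(conn, iso_date):
    conn.execute("INSERT OR IGNORE INTO dim_date (iso_date) VALUES (?)", (iso_date,))
    row = conn.execute(
        "SELECT date_id FROM dim_date WHERE iso_date = ?", (iso_date,)
    ).fetchone()
    return row[0]


def load_batch(conn, country, rows, mapping):
    """Batch ETL step: map source-specific field names onto the shared schema.

    `mapping` gives the source keys for the canonical fields
    "name", "date" and "cases". Re-running a batch overwrites the
    same fact rows, so the load is idempotent.
    """
    for row in rows:
        region = _region_id(conn, country, row[mapping["name"]])
        date = _date_id(conn, row[mapping["date"]])
        conn.execute(
            "INSERT OR REPLACE INTO fact_cases (region_id, date_id, new_cases) "
            "VALUES (?, ?, ?)",
            (region, date, int(row[mapping["cases"]])),
        )
    conn.commit()


def national_totals(conn):
    """Pre-built query so users do not need to join the tables themselves."""
    return conn.execute(
        """
        SELECT r.country, d.iso_date, SUM(f.new_cases)
        FROM fact_cases AS f
        JOIN dim_region AS r USING (region_id)
        JOIN dim_date AS d USING (date_id)
        GROUP BY r.country, d.iso_date
        ORDER BY r.country, d.iso_date
        """
    ).fetchall()
```

A usage sketch with made-up rows: two sources with different field names are loaded through the same function, for example `load_batch(conn, "Germany", rows, {"name": "district", "date": "report_date", "cases": "cases"})`, after which `national_totals(conn)` returns one row per country and day. Keeping the source-specific differences in the mapping rather than in the schema is what lets one set of tables serve several countries.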

Related papers

Unified Human Localization and Trajectory Prediction with Monocular Vision [64.19384064365431]
MonoTransmotion is a Transformer-based framework that uses only a monocular camera to jointly solve localization and prediction tasks. We show that by jointly training both tasks with our unified framework, our method is more robust in real-world scenarios made of noisy inputs.
arXiv Detail & Related papers (2025-03-05T14:18:39Z)
KODA: A Data-Driven Recursive Model for Time Series Forecasting and Data Assimilation using Koopman Operators [14.429071321401953]
We propose a Koopman operator-based approach that integrates forecasting and data assimilation in nonlinear dynamical systems. In particular we use a Fourier domain filter to disentangle the data into a physical component whose dynamics can be accurately represented by a Koopman operator. We show that KODA outperforms existing state of the art methods on multiple time series benchmarks.
arXiv Detail & Related papers (2024-09-29T02:25:48Z)
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore [85.4310806466002]
We find that increasing the size of the datastore used by a retrieval-based LM monotonically improves language modeling and several downstream tasks without obvious saturation. By plotting compute-optimal scaling curves with varied datastore, model, and pretraining data sizes, we show that using larger datastores can significantly improve model performance for the same training compute budget.
arXiv Detail & Related papers (2024-07-09T08:27:27Z)
UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction [93.77809355002591]
We introduce UniTraj, a comprehensive framework that unifies various datasets, models, and evaluation criteria. We conduct extensive experiments and find that model performance significantly drops when transferred to other datasets. We provide insights into dataset characteristics to explain these findings.
arXiv Detail & Related papers (2024-03-22T10:36:50Z)
SCTc-TE: A Comprehensive Formulation and Benchmark for Temporal Event Forecasting [63.01035584154509]
We develop a fully automated pipeline and construct a large-scale dataset named MidEast-TE from about 0.6 million news articles. This dataset focuses on the cooperation and conflict events among countries mainly in the MidEast region from 2015 to 2022. We propose a novel method LoGo that is able to take advantage of both Local and Global contexts for SCTc-TE forecasting.
arXiv Detail & Related papers (2023-12-02T07:40:21Z)
Deep COVID-19 Forecasting for Multiple States with Data Augmentation [10.197800697048903]
We propose a deep learning approach to forecasting state-level COVID-19 trends of weekly cumulative death in the United States (US) and incident cases in Germany. This approach includes a transformer model, an ensemble method, and a data augmentation technique for time series. Our model has achieved some of the best state-level results on the COVID-19 Forecast Hub for the US and for Germany.
arXiv Detail & Related papers (2023-02-02T15:16:13Z)
Strict baselines for Covid-19 forecasting and ML perspective for USA and Russia [105.54048699217668]
Covid-19 allows researchers to gather datasets accumulated over 2 years and to use them in predictive analysis. We present the results of a consistent comparative study of different types of methods for predicting the dynamics of the spread of Covid-19 based on regional data for two countries: the United States and Russia.
arXiv Detail & Related papers (2022-07-15T18:21:36Z)
Exploration of an End-to-End Automatic Number-plate Recognition neural network for Indian datasets [0.0]
We release an expanding dataset presently consisting of 1.5k images and a scalable and reproducible procedure of enhancing this dataset towards development of ANPR solution for Indian conditions. We report the hindrances in direct reusability of the model provided by the authors of CCPD because of the extreme diversity in Indian number plates and differences in distribution with respect to the CCPD dataset. An improvement of 42.86% was observed in LP detection after aligning the characteristics of Indian dataset with Chinese dataset.
arXiv Detail & Related papers (2022-07-14T05:05:18Z)
Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples. We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models. We also observe span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
Towards physically consistent data-driven weather forecasting: Integrating data assimilation with equivariance-preserving deep spatial transformers [2.7998963147546148]
We propose 3 components to integrate with commonly used data-driven weather prediction models. These components are 1) a deep spatial transformer added to latent space of U-NETs to preserve equivariance, 2) a data-assimilation algorithm to ingest noisy observations and improve the initial conditions for next forecasts, and 3) a multi-time-step algorithm, improving the accuracy of forecasts at short intervals.
arXiv Detail & Related papers (2021-03-16T23:15:00Z)
Interactive exploration of population scale pharmacoepidemiology datasets [0.0]
Population-scale drug prescription data linked with adverse drug reaction (ADR) data supports the fitting of models large enough to detect drug use and ADR patterns. Detecting ADR patterns in large datasets requires tools for scalable data processing, machine learning for data analysis, and interactive visualization. We have created a tool for interactive exploration of patterns in prescription datasets with millions of samples.
arXiv Detail & Related papers (2020-05-20T07:34:50Z)
Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters. We infer the posteriors over such latent variables based on data from seen task-language combinations. Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.