Modeling and Forecasting COVID-19 Cases using Latent Subpopulations
- URL: http://arxiv.org/abs/2302.04829v1
- Date: Thu, 9 Feb 2023 18:33:41 GMT
- Title: Modeling and Forecasting COVID-19 Cases using Latent Subpopulations
- Authors: Roberto Vega, Zehra Shah, Pouria Ramazi, Russell Greiner
- Abstract summary: We propose two new methods to model the number of people infected with COVID-19 over time.
Method #1 is a dictionary-based approach, which begins with a large number of pre-defined sub-population models.
Method #2 is a mixture-of-$M$ fittable curves, where $M$, the number of sub-populations to use, is given by the user.
- Score: 8.69240208462227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classical epidemiological models assume homogeneous populations. There have
been important extensions to model heterogeneous populations when the identity
of the sub-populations is known, such as age group or geographical location.
Here, we propose two new methods to model the number of people infected with
COVID-19 over time, each as a linear combination of latent sub-populations,
i.e., when we do not know which person is in which sub-population, and the only
available observations are the aggregates across all sub-populations. Method #1
is a dictionary-based approach, which begins with a large number of pre-defined
sub-population models (each with its own starting time, shape, etc.), then
determines the (positive) weights of a small (learned) number of sub-populations.
Method #2 is a mixture-of-$M$ fittable curves, where $M$, the number of
sub-populations to use, is given by the user. Both methods are compatible with
any parametric model; here we demonstrate their use first with (a) Gaussian
curves and then with (b) SIR trajectories. We empirically show the performance
of the proposed methods, first in (i) modeling the observed data and then in
(ii) forecasting the number of infected people 1 to 4 weeks in advance. Across
187 countries, we show that the dictionary approach had the lowest mean absolute
percentage error and also the lowest variance when compared with classical SIR
models. Moreover, it is a strong baseline that outperforms many of the models
developed for COVID-19 forecasting.
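The dictionary idea in Method #1 can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the weights or the small active set are learned, so the code below simply assumes a tiny dictionary of Gaussian curves and fits non-negative weights with plain coordinate descent. The names `gaussian_curve`, `fit_nonnegative_weights`, and `atom_params`, as well as all numbers, are hypothetical and chosen only for illustration.

```python
import math

def gaussian_curve(days, center, width):
    """Bell-shaped case curve with peak height 1 at `center` (illustrative shape)."""
    return [math.exp(-0.5 * ((t - center) / width) ** 2) for t in days]

def fit_nonnegative_weights(dictionary, observed, sweeps=2000):
    """Coordinate descent for min ||observed - sum_j w_j * atom_j||^2 with w_j >= 0.

    Assumption: a generic non-negative least-squares solver stands in for
    whatever fitting procedure the paper actually uses.
    """
    weights = [0.0] * len(dictionary)
    residual = list(observed)  # observed minus the current reconstruction
    norms = [sum(a * a for a in atom) for atom in dictionary]
    for _ in range(sweeps):
        for j, atom in enumerate(dictionary):
            # Best value of w_j with all other weights fixed, clipped at zero.
            corr = sum(a * r for a, r in zip(atom, residual))
            new_w = max(0.0, weights[j] + corr / norms[j])
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, atom)]
                weights[j] = new_w
    return weights

days = list(range(60))
# Hypothetical dictionary: each (center, width) pair is one candidate sub-population.
atom_params = [(c, w) for c in (15, 30, 45) for w in (3, 6)]
dictionary = [gaussian_curve(days, c, w) for c, w in atom_params]

# Synthetic aggregate: two hidden waves, of which only the sum is observed.
observed = [3.0 * a + 5.0 * b for a, b in zip(dictionary[0], dictionary[5])]

weights = fit_nonnegative_weights(dictionary, observed)
for params, w in zip(atom_params, weights):
    print(params, round(w, 3))
```

On this noise-free toy series the fit should put nearly all the weight on the two curves that generated the data and leave the other atoms close to zero, which is the sparse, positive-weight behaviour the abstract describes. Real case counts are noisy, so a practical version would likely need a much larger dictionary and some form of sparsity control.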
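The forecasts are compared using mean absolute percentage error (MAPE). A minimal sketch of that metric follows; the function name `mape`, the choice to skip days with zero actual cases, and the sample numbers are assumptions made for illustration and are not taken from the paper.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumption: days with zero actual cases are skipped, because the
    percentage error is undefined there.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

# Made-up numbers for illustration only; not data from the paper.
actual_cases = [100, 200, 400]
forecast_cases = [110, 180, 400]
print(round(mape(actual_cases, forecast_cases), 2))  # prints 6.67
```

Because MAPE is a relative error, it lets countries with very different case counts be averaged on the same scale, which is presumably why it suits a comparison across 187 countries.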
Related papers
- Universality in Transfer Learning for Linear Models [18.427215139020625]
We study the problem of transfer learning in linear models for both regression and binary classification.
We provide an exact and rigorous analysis and relate generalization errors (in regression) and classification errors (in binary classification) for the pretrained and fine-tuned models.
arXiv Detail & Related papers (2024-10-03T03:09:09Z)
- Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold [83.18058549195855]
We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities.
In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depends on the microenvironment of cells specific to each patient.
We propose Meta Flow Matching (MFM), a practical approach to integrating along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations.
arXiv Detail & Related papers (2024-08-26T20:05:31Z)
- Modeling, Inference, and Prediction in Mobility-Based Compartmental Models for Epidemiology [5.079807662054658]
We introduce individual mobility as a key factor in disease transmission and control.
We characterize disease dynamics using mobility distribution functions for each compartment.
We infer mobility distributions from the time series of the infected population.
arXiv Detail & Related papers (2024-06-17T18:13:57Z)
- Heterogeneous Peer Effects in the Linear Threshold Model [13.452510519858995]
The Linear Threshold Model describes how information diffuses through a social network.
We propose causal inference methods for estimating individual thresholds that can more accurately predict whether and when individuals will be affected by their peers.
Our experimental results on synthetic and real-world datasets show that our proposed models can better predict individual-level thresholds in the Linear Threshold Model.
arXiv Detail & Related papers (2022-01-27T00:23:26Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- Loss function based second-order Jensen inequality and its application to particle variational inference [112.58907653042317]
Particle variational inference (PVI) uses an ensemble of models as an empirical approximation for the posterior distribution.
PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models.
We derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models.
arXiv Detail & Related papers (2021-06-09T12:13:51Z)
- Model-based metrics: Sample-efficient estimates of predictive model subpopulation performance [11.994417027132807]
Machine learning models, now commonly developed to screen, diagnose, or predict health conditions, are evaluated with a variety of performance metrics.
Subpopulation performance metrics are typically computed using only data from that subgroup, resulting in higher variance estimates for smaller groups.
We propose using an evaluation model (a model that describes the conditional distribution of the predictive model score) to form model-based metric (MBM) estimates.
arXiv Detail & Related papers (2021-04-25T19:06:34Z)
- STELAR: Spatio-temporal Tensor Factorization with Latent Epidemiological Regularization [76.57716281104938]
We develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously.
STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete-time difference equations.
We conduct experiments using both county- and state-level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic.
arXiv Detail & Related papers (2020-12-08T21:21:47Z)
- Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)
- Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z)
- Survival Cluster Analysis [93.50540270973927]
There is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles.
An approach that addresses this need is likely to improve characterization of individual outcomes.
arXiv Detail & Related papers (2020-02-29T22:41:21Z)