Forecasting COVID-19 Caseloads Using Unsupervised Embedding Clusters of
Social Media Posts
- URL: http://arxiv.org/abs/2205.10408v1
- Date: Fri, 20 May 2022 18:59:04 GMT
- Authors: Felix Drinkall, Stefan Zohren and Janet B. Pierrehumbert
- Abstract summary: We present a novel approach incorporating transformer-based language models into infectious disease modelling.
We benchmark these clustered embedding features against features extracted from other high-quality datasets.
In a threshold-classification task, we show that they outperform all other feature types at predicting upward trend signals.
- Score: 14.201816626446885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel approach incorporating transformer-based language models
into infectious disease modelling. Text-derived features are quantified by
tracking high-density clusters of sentence-level representations of Reddit
posts within specific US states' COVID-19 subreddits. We benchmark these
clustered embedding features against features extracted from other high-quality
datasets. In a threshold-classification task, we show that they outperform all
other feature types at predicting upward trend signals, a significant result
for infectious disease modelling in areas where epidemiological data is
unreliable. Subsequently, in a time-series forecasting task we fully utilise
the predictive power of the caseload and compare the relative strengths of
using different supplementary datasets as covariate feature sets in a
transformer-based time-series model.
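
The abstract does not give implementation details, so the following is only a minimal sketch of how cluster-derived features and a threshold label might be built. It assumes each post has already been embedded and assigned a cluster id by some density-based clustering step, with -1 marking unclustered posts. The function names, the relative-change label definition, and the `horizon` and `threshold` parameters are hypothetical illustrations, not the authors' method.

```python
from collections import Counter, defaultdict


def daily_cluster_shares(posts, clusters):
    """Turn (day, cluster_id) pairs into per-day feature vectors.

    Each feature is the share of that day's posts falling in one cluster.
    Posts with ids outside `clusters` (e.g. -1 for noise) count towards
    the daily total only. Hypothetical helper, not from the paper.
    """
    counts = defaultdict(Counter)
    for day, cluster_id in posts:
        counts[day][cluster_id] += 1
    features = {}
    for day, counter in sorted(counts.items()):
        total = sum(counter.values())
        features[day] = [counter[c] / total for c in clusters]
    return features


def upward_trend_labels(cases, horizon=7, threshold=0.1):
    """Label day t as 1 if caseload rises by more than `threshold`
    (relative change) over the next `horizon` days, else 0.

    This label definition is an assumption made for illustration.
    """
    labels = []
    for t in range(len(cases) - horizon):
        baseline = max(cases[t], 1)  # guard against division by zero
        change = (cases[t + horizon] - cases[t]) / baseline
        labels.append(1 if change > threshold else 0)
    return labels


if __name__ == "__main__":
    posts = [(0, 0), (0, 0), (0, 1), (0, -1), (1, 1), (1, 1)]
    print(daily_cluster_shares(posts, clusters=[0, 1]))
    print(upward_trend_labels([100, 100, 120, 105], horizon=2, threshold=0.1))
```

In a setup like this, the per-day share vectors would serve as covariates alongside the caseload series, and the binary labels would be the target of the threshold-classification task.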
Related papers
- Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Forecast reconciliation for vaccine supply chain optimization [61.13962963550403]
Vaccine supply chain optimization can benefit from hierarchical time series forecasting.
Forecasts at different hierarchy levels become incoherent when higher-level forecasts do not match the sum of the lower-level forecasts.
We tackle the vaccine sale forecasting problem by modeling sales data from GSK between 2010 and 2021 as a hierarchical time series.
arXiv Detail & Related papers (2023-05-02T14:34:34Z)
- Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data [81.43750358586072]
We propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes.
We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets.
arXiv Detail & Related papers (2022-10-24T08:57:55Z)
- VAESim: A probabilistic approach for self-supervised prototype discovery [0.23624125155742057]
We propose an architecture for image stratification based on a conditional variational autoencoder.
We use a continuous latent space to represent the continuum of disorders and find clusters during training, which can then be used for image/patient stratification.
We demonstrate that our method outperforms baselines in terms of kNN accuracy measured on a classification task against a standard VAE.
arXiv Detail & Related papers (2022-09-25T17:55:31Z)
- Cancer Subtyping by Improved Transcriptomic Features Using Vector Quantized Variational Autoencoder [10.835673227875615]
We propose Vector Quantized Variational AutoEncoder (VQ-VAE) to tackle the data issues and extract informative latent features that are crucial to the quality of subsequent clustering.
VQ-VAE does not impose strict assumptions and hence its latent features are better representations of the input, capable of yielding superior clustering performance with any mainstream clustering method.
arXiv Detail & Related papers (2022-07-20T09:47:53Z)
- Automatic Pharma News Categorization [0.0]
We use a text dataset consisting of 23 news categories relevant to pharma information science.
We compare the fine-tuning performance of multiple transformer models in a classification task.
We propose an ensemble model consisting of the top performing individual predictors.
arXiv Detail & Related papers (2021-12-28T08:42:16Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- STELAR: Spatio-temporal Tensor Factorization with Latent Epidemiological Regularization [76.57716281104938]
We develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously.
STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete-time difference equations.
We conduct experiments using both county- and state-level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic.
arXiv Detail & Related papers (2020-12-08T21:21:47Z)
- Examining Deep Learning Models with Multiple Data Sources for COVID-19 Forecasting [10.052302234274256]
We design and analyze deep learning-based models for COVID-19 forecasting.
We consider multiple sources such as COVID-19 confirmed and death case count data and testing data for better predictions.
We propose clustering-based training for high-temporal forecasting.
arXiv Detail & Related papers (2020-10-27T17:52:02Z)
- DeepCOVIDNet: An Interpretable Deep Learning Model for Predictive Surveillance of COVID-19 Using Heterogeneous Features and their Interactions [2.30238915794052]
We propose a deep learning model to forecast the range of increase in COVID-19 infected cases in future days.
Using data collected from various sources, we estimate the range of increase in infected cases seven days into the future for all U.S. counties.
arXiv Detail & Related papers (2020-07-31T23:37:38Z)
- Temporal Phenotyping using Deep Predictive Clustering of Disease Progression [97.88605060346455]
We develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest.
Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks.
arXiv Detail & Related papers (2020-06-15T20:48:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.