Flusion: Integrating multiple data sources for accurate influenza predictions
- URL: http://arxiv.org/abs/2407.19054v1
- Date: Fri, 26 Jul 2024 19:24:02 GMT
- Title: Flusion: Integrating multiple data sources for accurate influenza predictions
- Authors: Evan L. Ray, Yijin Wang, Russell D. Wolfinger, Nicholas G. Reich
- Abstract summary: The US Centers for Disease Control and Prevention (CDC) has organized an annual influenza forecasting challenge.
Our model, Flusion, is an ensemble that combines gradient boosting quantile regression models with a Bayesian autoregressive model.
Flusion was the top-performing model in the CDC's influenza prediction challenge for the 2023/24 season.
- Score: 0.24999074238880484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the last ten years, the US Centers for Disease Control and Prevention (CDC) has organized an annual influenza forecasting challenge with the motivation that accurate probabilistic forecasts could improve situational awareness and yield more effective public health actions. Starting with the 2021/22 influenza season, the forecasting targets for this challenge have been based on hospital admissions reported in the CDC's National Healthcare Safety Network (NHSN) surveillance system. Reporting of influenza hospital admissions through NHSN began within the last few years, and as such only a limited amount of historical data are available for this signal. To produce forecasts in the presence of limited data for the target surveillance system, we augmented these data with two signals that have a longer historical record: 1) ILI+, which estimates the proportion of outpatient doctor visits where the patient has influenza; and 2) rates of laboratory-confirmed influenza hospitalizations at a selected set of healthcare facilities. Our model, Flusion, is an ensemble that combines gradient boosting quantile regression models with a Bayesian autoregressive model. The gradient boosting models were trained on all three data signals, while the autoregressive model was trained on only the target signal; all models were trained jointly on data for multiple locations. Flusion was the top-performing model in the CDC's influenza prediction challenge for the 2023/24 season. In this article we investigate the factors contributing to Flusion's success, and we find that its strong performance was primarily driven by the use of a gradient boosting model that was trained jointly on data from multiple surveillance signals and locations. These results indicate the value of sharing information across locations and surveillance signals, especially when doing so adds to the pool of available training data.
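The abstract describes combining quantile forecasts from gradient boosting models with forecasts from a Bayesian autoregressive model. A minimal sketch of two ingredients such a setup typically relies on is given here: the pinball loss that quantile regression minimizes, and a quantile-wise combination of component forecasts. The equal-weight mean, the function names, and the example numbers are assumptions for illustration only; the abstract does not state how Flusion weights its components.

```python
from statistics import fmean


def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss of predictions y_pred at quantile level q."""
    losses = []
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        losses.append(q * diff if diff >= 0 else (q - 1) * diff)
    return fmean(losses)


def combine_quantiles(component_forecasts):
    """Equal-weight mean of component forecasts at each quantile level.

    ASSUMPTION: equal weights are an illustrative choice, not a rule
    taken from the paper. Each forecast maps quantile level -> value.
    """
    levels = sorted(component_forecasts[0])
    return {q: fmean(f[q] for f in component_forecasts) for q in levels}


# Hypothetical weekly hospital-admission forecasts from two component models.
gbq_forecast = {0.1: 80.0, 0.5: 100.0, 0.9: 140.0}  # gradient boosting (made up)
ar_forecast = {0.1: 60.0, 0.5: 90.0, 0.9: 160.0}    # autoregressive (made up)

ensemble = combine_quantiles([gbq_forecast, ar_forecast])
print(ensemble)
```

Averaging at each quantile level keeps the combined forecast in the same quantile format as its components, so it can be scored with the same pinball loss.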
Related papers
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but individuals of non-European descent are under-represented in them.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Data-Centric Epidemic Forecasting: A Survey [56.99209141838794]
This survey delves into various data-driven methodological and practical advancements.
We enumerate the large number of epidemiological datasets and novel data streams that are relevant to epidemic forecasting.
We also discuss experiences and challenges that arise in real-world deployment of these forecasting systems.
arXiv Detail & Related papers (2022-07-19T16:15:11Z)
- Predicting COVID-19 Spread from Large-Scale Mobility Data [22.55034017418318]
A potential near real-time predictor of future case numbers is human mobility.
We introduce a novel model for epidemic forecasting based on mobility data, called the mobility marked Hawkes model.
Our work is the first to predict the spread of COVID-19 from telecommunication data.
arXiv Detail & Related papers (2021-06-01T10:05:02Z)
- EventScore: An Automated Real-time Early Warning Score for Clinical Events [3.3039612529376625]
We build an interpretable model for the early prediction of various adverse clinical events indicative of clinical deterioration.
The model is evaluated on two datasets and four clinical events.
Our model can be entirely automated without requiring any manually recorded features.
arXiv Detail & Related papers (2021-02-11T11:55:08Z)
- Deep learning via LSTM models for COVID-19 infection forecasting in India [13.163271874039191]
Prominent computational and mathematical models have been unreliable due to the complexity of the spread of infections.
Deep learning models such as recurrent neural networks are well suited for modelling temporal sequences.
We select states with COVID-19 hotspots in terms of the rate of infections and compare them with states where infections have been contained or have reached their peak.
Our results show that long-term forecasts are promising, which motivates the application of the method in other countries or areas.
arXiv Detail & Related papers (2021-01-28T09:19:10Z)
- Predicting seasonal influenza using supermarket retail records [59.18952050885709]
We consider supermarket retail data as a proxy signal for influenza, through the identification of sentinel baskets.
We make use of the Support Vector Regression (SVR) model to produce the predictions of seasonal flu incidence.
arXiv Detail & Related papers (2020-12-08T16:30:43Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Steering a Historical Disease Forecasting Model Under a Pandemic: Case of Flu and COVID-19 [75.99038202534628]
We propose CALI-Net, a neural transfer learning architecture which allows us to 'steer' a historical disease forecasting model to new scenarios where flu and COVID co-exist.
Our experiments demonstrate that our approach is successful in adapting a historical forecasting model to the current pandemic.
arXiv Detail & Related papers (2020-09-23T22:35:43Z)
- Privacy-Preserving Technology to Help Millions of People: Federated Prediction Model for Stroke Prevention [25.276264953982253]
Our scientists and engineers propose a privacy-preserving scheme to predict the risk of stroke and deploy our federated prediction model on cloud servers.
Our model trains over all the healthcare data from hospitals in a certain city without actual data sharing among them.
Especially for small hospitals with few confirmed stroke cases, our federated model boosts model performance by 10% to 20% in several machine learning metrics.
arXiv Detail & Related papers (2020-06-15T08:51:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.