CNN-LSTM Hybrid Model for AI-Driven Prediction of COVID-19 Severity from Spike Sequences and Clinical Data
- URL: http://arxiv.org/abs/2505.23879v1
- Date: Thu, 29 May 2025 16:20:54 GMT
- Title: CNN-LSTM Hybrid Model for AI-Driven Prediction of COVID-19 Severity from Spike Sequences and Clinical Data
- Authors: Caio Cheohen, VinnÃcius M. S. Gomes, Manuela L. da Silva,
- Abstract summary: We developed a CNN-LSTM hybrid model to predict COVID-19 severity using spike protein sequences and clinical data.<n>The model achieved an F1 score of 82.92%, ROC-AUC of 0.9084, precision of 83.56%, and recall of 82.85%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The COVID-19 pandemic, caused by SARS-CoV-2, highlighted the critical need for accurate prediction of disease severity to optimize healthcare resource allocation and patient management. The spike protein, which facilitates viral entry into host cells, exhibits high mutation rates, particularly in the receptor-binding domain, influencing viral pathogenicity. Artificial intelligence approaches, such as deep learning, offer promising solutions for leveraging genomic and clinical data to predict disease outcomes. Objective: This study aimed to develop a hybrid CNN-LSTM deep learning model to predict COVID-19 severity using spike protein sequences and associated clinical metadata from South American patients. Methods: We retrieved 9,570 spike protein sequences from the GISAID database, of which 3,467 met inclusion criteria after standardization. The dataset included 2,313 severe and 1,154 mild cases. A feature engineering pipeline extracted features from sequences, while demographic and clinical variables were one-hot encoded. A hybrid CNN-LSTM architecture was trained, combining CNN layers for local pattern extraction and an LSTM layer for long-term dependency modeling. Results: The model achieved an F1 score of 82.92%, ROC-AUC of 0.9084, precision of 83.56%, and recall of 82.85%, demonstrating robust classification performance. Training stabilized at 85% accuracy with minimal overfitting. The most prevalent lineages (P.1, AY.99.2) and clades (GR, GK) aligned with regional epidemiological trends, suggesting potential associations between viral genetics and clinical outcomes. Conclusion: The CNN-LSTM hybrid model effectively predicted COVID-19 severity using spike protein sequences and clinical data, highlighting the utility of AI in genomic surveillance and precision public health. Despite limitations, this approach provides a framework for early severity prediction in future outbreaks.
Related papers
- Epidemic Forecasting with a Hybrid Deep Learning Method Using CNN-LSTM With WOA-GWO Parameter Optimization: Global COVID-19 Case Study [0.0]
This study introduces a novel deep learning framework that advances time series forecasting for infectious diseases.<n>Our framework is applied to COVID 19 case data from 24 countries across six continents.
arXiv Detail & Related papers (2025-03-17T04:41:26Z) - Prediction of Lung Metastasis from Hepatocellular Carcinoma using the SEER Database [0.9055332067000195]
Hepatocellular carcinoma (HCC) is a leading cause of cancer-related mortality.<n> predictive models for lung metastasis inHCC remain limited in scope and clinical applicability.<n>We develop and validate an end-to-end machine learning pipeline using data from the Surveillance, Epidemiology, and End Results (SEER) database.
arXiv Detail & Related papers (2025-01-20T20:06:31Z) - A multimodal ensemble approach for clear cell renal cell carcinoma treatment outcome prediction [6.199310532720352]
We developed a multi-modal ensemble model (MMEM) that integrates clinical data, multi-omics data, and histopathology whole slide image (WSI) data.<n>MMEM predicted overall survival (OS) and disease-free survival (DFS) for ccRCC patients.
arXiv Detail & Related papers (2024-12-10T02:51:14Z) - CRTRE: Causal Rule Generation with Target Trial Emulation Framework [47.2836994469923]
We introduce a novel method called causal rule generation with target trial emulation framework (CRTRE)
CRTRE applies randomize trial design principles to estimate the causal effect of association rules.
We then incorporate such association rules for the downstream applications such as prediction of disease onsets.
arXiv Detail & Related papers (2024-11-10T02:40:06Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - CEL: A Continual Learning Model for Disease Outbreak Prediction by
Leveraging Domain Adaptation via Elastic Weight Consolidation [4.693707128262634]
This study introduces a novel CEL model for continual learning by leveraging domain adaptation via Elastic Weight Consolidation (EWC)
CEL's robustness and reliability are underscored by its minimal 65% forgetting rate and 18% higher memory stability compared to existing benchmark studies.
arXiv Detail & Related papers (2024-01-17T03:26:04Z) - Improving COVID-19 CT Classification of CNNs by Learning
Parameter-Efficient Representation [31.51725965329019]
Deep learning methods have been proposed to assist clinicians in automatic COVID-19 diagnosis based on computed tomography imaging.
DenseNet121 achieves an average test accuracy of 99.44% in three trials for three-category classification, including normal, non-COVID-19 pneumonia, and COVID-19 pneumonia.
arXiv Detail & Related papers (2022-08-09T12:24:53Z) - The Report on China-Spain Joint Clinical Testing for Rapid COVID-19 Risk
Screening by Eye-region Manifestations [59.48245489413308]
We developed and tested a COVID-19 rapid prescreening model using the eye-region images captured in China and Spain with cellphone cameras.
The performance was measured using area under receiver-operating-characteristic curve (AUC), sensitivity, specificity, accuracy, and F1.
arXiv Detail & Related papers (2021-09-18T02:28:01Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD)
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors
and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic.
The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands.
We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.