Using Deep Learning Sequence Models to Identify SARS-CoV-2 Divergence
- URL: http://arxiv.org/abs/2111.06593v1
- Date: Fri, 12 Nov 2021 07:52:11 GMT
- Title: Using Deep Learning Sequence Models to Identify SARS-CoV-2 Divergence
- Authors: Yanyi Ding, Zhiyi Kuang, Yuxin Pei, Jeff Tan, Ziyu Zhang, Joseph Konan
- Abstract summary: SARS-CoV-2 is an upper respiratory system RNA virus that has caused over 3 million deaths and infected over 150 million people worldwide as of May 2021.
We propose a neural network model that leverages recurrent and convolutional units to take in amino acid sequences of spike proteins and classify corresponding clades.
- Score: 1.9573380763700707
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: SARS-CoV-2 is an upper respiratory system RNA virus that has caused over 3
million deaths and infected over 150 million people worldwide as of May 2021. With
thousands of strains sequenced to date, SARS-CoV-2 mutations pose significant
challenges for scientists trying to keep pace with vaccine development and public
health measures. Therefore, an efficient method of identifying the divergence
of lab samples from patients would greatly aid the documentation of SARS-CoV-2
genomics. In this study, we propose a neural network model that leverages
recurrent and convolutional units to directly take in amino acid sequences of
spike proteins and classify corresponding clades. We also compared our model's
performance with Bidirectional Encoder Representations from Transformers (BERT)
pre-trained on a protein database. Our approach has the potential to provide a
more computationally efficient alternative to current homology-based
intra-species differentiation.
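As a rough illustration of what feeding amino acid sequences directly into a recurrent or convolutional model involves, the sketch below turns a protein string into fixed-length integer tokens and one-hot rows. The vocabulary, the padding and unknown-residue indices, and the function names `encode` and `one_hot` are all assumptions made for illustration; the abstract does not describe the authors' actual preprocessing.

```python
# Hypothetical preprocessing sketch: the vocabulary, padding scheme, and
# function names are assumptions, not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD, UNK = 0, 1                        # reserved indices: padding, unknown residue
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence, max_len):
    """Map an amino acid string to a fixed-length list of integer tokens."""
    # Truncate to max_len, map each residue to its index (UNK if not standard).
    tokens = [VOCAB.get(residue, UNK) for residue in sequence.upper()[:max_len]]
    # Right-pad with PAD so every sequence has the same length.
    return tokens + [PAD] * (max_len - len(tokens))


def one_hot(tokens, vocab_size=len(VOCAB) + 2):
    """Expand integer tokens into one-hot rows; padding becomes an all-zero row."""
    rows = []
    for t in tokens:
        row = [0] * vocab_size
        if t != PAD:
            row[t] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Short illustrative sample string, not a full spike protein sequence.
    tokens = encode("MFVFLVLLPLVSSQ", max_len=16)
    print(tokens)
    print(len(one_hot(tokens)), "positions x", len(one_hot(tokens)[0]), "channels")
```

Integer tokens of this kind would typically go through an embedding layer before a recurrent unit, while the one-hot rows can serve as input channels for a 1D convolution; which of the two the authors use is not stated in the abstract.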
Related papers
- Virus2Vec: Viral Sequence Classification Using Machine Learning [48.40285316053593]
We propose Virus2Vec, a feature-vector representation for viral sequences that enables machine learning models to identify viral hosts.
We empirically evaluate Virus2Vec on real-world spike sequences of Coronaviridae and rabies virus sequence data to predict the host.
Our results demonstrate that Virus2Vec outperforms the predictive accuracies of baseline and state-of-the-art methods.
arXiv Detail & Related papers (2023-04-24T08:17:16Z)
- Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that, for specific embedding methods, some simulation-based approaches are more robust (and accurate) than others against certain adversarial attacks on the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z)
- PhyloTransformer: A Discriminative Model for Mutation Prediction Based on a Multi-head Self-attention Mechanism [10.468453827172477]
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused an ongoing pandemic infecting 219 million people as of 10/19/21, with a 3.6% mortality rate.
Here we developed PhyloTransformer, a Transformer-based discriminative model that engages a multi-head self-attention mechanism to model genetic mutations that may lead to viral reproductive advantage.
arXiv Detail & Related papers (2021-11-03T01:30:57Z)
- Robust Representation and Efficient Feature Selection Allows for Effective Clustering of SARS-CoV-2 Variants [0.0]
The SARS-CoV-2 virus contains different variants, each of them having different mutations.
Much of the variation in the SARS-CoV-2 genome happens disproportionately in the spike region of the genome sequence.
We propose an approach to cluster spike protein sequences in order to study the behavior of different known variants.
arXiv Detail & Related papers (2021-10-18T21:18:52Z)
- Effective and scalable clustering of SARS-CoV-2 sequences [0.41998444721319206]
SARS-CoV-2 continues to mutate as it spreads, according to an evolutionary process.
The number of currently available sequences of SARS-CoV-2 in public databases such as GISAID is already several million.
We propose an approach based on clustering sequences to identify the current major SARS-CoV-2 variants.
arXiv Detail & Related papers (2021-08-18T13:32:43Z)
- A k-mer Based Approach for SARS-CoV-2 Variant Identification [55.78588835407174]
We show that preserving the order of the amino acids helps the underlying classifiers to achieve better performance.
We also show the importance of the different amino acids that play a key role in identifying variants and how they coincide with those reported by the USA's Centers for Disease Control and Prevention (CDC).
arXiv Detail & Related papers (2021-08-07T15:08:15Z)
- Designing a Prospective COVID-19 Therapeutic with Reinforcement Learning [50.57291257437373]
The SARS-CoV-2 pandemic has created a global race for a cure.
One approach focuses on designing a novel variant of the human angiotensin-converting enzyme 2 (ACE2).
We formulate a novel protein design framework as a reinforcement learning problem.
arXiv Detail & Related papers (2020-12-03T07:35:38Z)
- CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic.
The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands.
We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)
- SARS-CoV-2 virus RNA sequence classification and geographical analysis with convolutional neural networks approach [0.0]
The Covid-19 infection, which has spread across the world since December 2019 and is still active, has caused more than 250 thousand deaths worldwide to date.
In this study, RNA sequences belonging to the SARS-CoV-2 virus are transformed into gene motifs with two basic image processing algorithms.
CNN models achieved an average Area Under Curve (AUC) value of 98% on RNA sequences classified as Asia, Europe, America, and Oceania.
arXiv Detail & Related papers (2020-07-09T20:43:22Z)
- COVID-Net S: Towards computer-aided severity assessment via training and validation of deep neural networks for geographic extent and opacity extent scoring of chest X-rays for SARS-CoV-2 lung disease severity [58.23203766439791]
Chest x-rays (CXRs) are often used to assess SARS-CoV-2 severity.
In this study, we assess the feasibility of computer-aided scoring of CXRs of SARS-CoV-2 lung disease severity using a deep learning system.
arXiv Detail & Related papers (2020-05-26T16:33:52Z)