A k-mer Based Approach for SARS-CoV-2 Variant Identification
- URL: http://arxiv.org/abs/2108.03465v1
- Date: Sat, 7 Aug 2021 15:08:15 GMT
- Title: A k-mer Based Approach for SARS-CoV-2 Variant Identification
- Authors: Sarwan Ali, Bikram Sahoo, Naimat Ullah, Alexander Zelikovskiy, Murray Patterson, Imdadullah Khan
- Abstract summary: We show that preserving the order of the amino acids helps the underlying classifiers to achieve better performance.
We also show the importance of the different amino acids which play a key role in identifying variants and how they coincide with those reported by the USA's Centers for Disease Control and Prevention (CDC).
- Score: 55.78588835407174
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: With the rapid spread of the novel coronavirus (COVID-19) across the globe
and its continuous mutation, it is of pivotal importance to design a system to
identify different known (and unknown) variants of SARS-CoV-2. Identifying
particular variants helps to understand and model their spread patterns, design
effective mitigation strategies, and prevent future outbreaks. It also plays a
crucial role in studying the efficacy of known vaccines against each variant
and modeling the likelihood of breakthrough infections. It is well known that
the spike protein contains most of the information/variation pertaining to
coronavirus variants.
In this paper, we use spike sequences to classify different variants of the
coronavirus in humans. We show that preserving the order of the amino acids
helps the underlying classifiers to achieve better performance. We also show
that we can train our model to outperform the baseline algorithms using only a
small number of training samples ($1\%$ of the data). Finally, we show the
importance of the different amino acids which play a key role in identifying
variants and how they coincide with those reported by the USA's Centers for
Disease Control and Prevention (CDC).
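
As a rough illustration of why order-preserving k-mers can carry more information than plain amino acid counts, the following minimal sketch builds a k-mer frequency vector from a sequence. The 20-letter alphabet, the choice k = 3, the function names, and the toy sequence are all assumptions made for this example; they are not taken from the paper, whose exact feature construction may differ.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids; the paper's alphabet may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmers(sequence, k=3):
    """Return the overlapping k-mers of a sequence, in left-to-right order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_frequency_vector(sequence, k=3, alphabet=AMINO_ACIDS):
    """Count every possible k-mer over the alphabet (hypothetical helper).

    k-mers containing symbols outside the alphabet are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vector = [0] * len(index)
    for kmer, count in Counter(kmers(sequence, k)).items():
        if kmer in index:
            vector[index[kmer]] += count
    return vector


# Toy amino acid fragment used purely for illustration.
toy = "MFVFLVLLPLVSS"
vec = kmer_frequency_vector(toy, k=3)

# Same composition, different order: 1-mer counts match, 3-mer counts differ.
same_composition = kmer_frequency_vector("ACD", k=1) == kmer_frequency_vector("DCA", k=1)
same_order_info = kmer_frequency_vector("ACD", k=3) == kmer_frequency_vector("DCA", k=3)
```

Here `same_composition` is true while `same_order_info` is false, which is the sense in which k-mers with k > 1 retain local ordering that single-residue counts discard. A vector like `vec` could then be passed to any standard classifier.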
Related papers
- Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech [60.08015780474457]
Alzheimer's Disease (AD) detection has emerged as a promising research area that employs machine learning classification models.
We identify within-class variation as a critical challenge in AD detection: individuals with AD exhibit a spectrum of cognitive impairments.
We propose two novel methods: Soft Target Distillation (SoTD) and Instance-level Re-balancing (InRe), targeting two problems respectively.
arXiv Detail & Related papers (2024-09-22T02:06:05Z)
- Agent-Based Model: Simulating a Virus Expansion Based on the Acceptance of Containment Measures [65.62256987706128]
Compartmental epidemiological models categorize individuals based on their disease status.
We propose an ABM architecture that combines an adapted SEIRD model with a decision-making model for citizens.
We illustrate the designed model by examining the progression of SARS-CoV-2 infections in A Coruna, Spain.
arXiv Detail & Related papers (2023-07-28T08:01:05Z)
- Efficient Classification of SARS-CoV-2 Spike Sequences Using Federated Learning [4.497217246897902]
We analyze SARS-CoV-2 spike sequences in a distributed way, without data sharing.
We achieve an overall accuracy of $93\%$ on the coronavirus variant identification task.
We plan to use this proof-of-concept to implement a privacy-preserving pandemic response strategy.
arXiv Detail & Related papers (2023-02-17T04:41:39Z)
- Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification [60.49594822215981]
This paper presents a classification model for detecting COVID-19 vaccination related search queries.
We propose a novel approach of considering dense features as memory tokens that the model can attend to.
We show that this new modeling approach enables a significant improvement to the Vaccine Search Insights (VSI) task.
arXiv Detail & Related papers (2022-12-16T13:57:41Z)
- Unsupervised machine learning framework for discriminating major variants of concern during COVID-19 [1.5346017713894948]
The COVID-19 pandemic evolved rapidly due to the high mutation rate of the virus.
Certain variants of the virus, such as Delta and Omicron, emerged with altered viral properties leading to severe transmission and death rates.
Unsupervised machine learning methods have the ability to compress, characterize, and visualize unlabelled data.
arXiv Detail & Related papers (2022-08-01T13:02:28Z)
- Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that, for specific embedding methods, some simulation-based approaches are more robust (and accurate) than others against certain adversarial attacks on the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z)
- Using Deep Learning Sequence Models to Identify SARS-CoV-2 Divergence [1.9573380763700707]
SARS-CoV-2 is an upper respiratory system RNA virus that has caused over 3 million deaths and infected over 150 million people worldwide as of May 2021.
We propose a neural network model that leverages recurrent and convolutional units to take in amino acid sequences of spike proteins and classify corresponding clades.
arXiv Detail & Related papers (2021-11-12T07:52:11Z)
- Robust Representation and Efficient Feature Selection Allows for Effective Clustering of SARS-CoV-2 Variants [0.0]
The SARS-CoV-2 virus has different variants, each with different mutations.
Variation in the SARS-CoV-2 genome occurs disproportionately in the spike region of the genome sequence.
We propose an approach to cluster spike protein sequences in order to study the behavior of different known variants.
arXiv Detail & Related papers (2021-10-18T21:18:52Z)
- Effective and scalable clustering of SARS-CoV-2 sequences [0.41998444721319206]
SARS-CoV-2 continues to mutate as it spreads, according to an evolutionary process.
The number of currently available sequences of SARS-CoV-2 in public databases such as GISAID is already several million.
We propose an approach based on clustering sequences to identify the current major SARS-CoV-2 variants.
arXiv Detail & Related papers (2021-08-18T13:32:43Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)