Efficient Classification of SARS-CoV-2 Spike Sequences Using Federated
Learning
- URL: http://arxiv.org/abs/2302.08688v2
- Date: Wed, 8 Nov 2023 22:26:28 GMT
- Title: Efficient Classification of SARS-CoV-2 Spike Sequences Using Federated
Learning
- Authors: Prakash Chourasia, Taslim Murad, Zahra Tayebi, Sarwan Ali, Imdad Ullah
Khan and Murray Patterson
- Abstract summary: We analyze SARS-CoV-2 spike sequences in a distributed way, without data sharing.
We achieve an overall accuracy of $93\%$ on the coronavirus variant identification task.
We plan to use this proof-of-concept to implement a privacy-preserving pandemic response strategy.
- Score: 4.497217246897902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a federated learning (FL) approach to train an AI model
for SARS-CoV-2 variant classification. We analyze the SARS-CoV-2 spike
sequences in a distributed way, without data sharing, to detect different
variants of this rapidly mutating coronavirus. Our method maintains the
confidentiality of local data (that could be stored in different locations) yet
allows us to reliably detect and identify different known and unknown variants
of the novel coronavirus SARS-CoV-2. Using the proposed approach, we achieve an
overall accuracy of $93\%$ on the coronavirus variant identification task. We
also provide details regarding how the proposed model follows the main laws of
federated learning, such as the laws of data ownership, data privacy, model
aggregation, and model heterogeneity. Since the proposed model is distributed,
it could scale easily to "Big Data". We plan to use this proof-of-concept to
implement a privacy-preserving pandemic response strategy.
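The idea of training without data sharing can be illustrated with a minimal sketch. The abstract does not state which aggregation rule the paper uses, so the code below swaps in federated averaging (FedAvg), a generic scheme, and should not be read as the authors' method. The data are synthetic stand-ins for per-site feature vectors, and all names (`local_update`, `fed_avg`, `make_client`) are hypothetical. Each client trains on its own data and sends only model weights to the server.

```python
import math
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of logistic-regression SGD on one client's private data."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w


def fed_avg(client_weights, client_sizes):
    """Average client weight vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def make_client(rng, n):
    """Synthetic two-class data (assumption: stands in for sequence embeddings)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        # Two noisy features plus a constant bias term.
        x = [center + rng.gauss(0, 0.5), center + rng.gauss(0, 0.5), 1.0]
        data.append((x, y))
    return data


def accuracy(w, data):
    """Fraction of samples whose predicted class matches the label."""
    hits = sum(
        (sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y == 1) for x, y in data
    )
    return hits / len(data)


rng = random.Random(0)
clients = [make_client(rng, n) for n in (40, 60, 100)]  # three "sites"
global_w = [0.0, 0.0, 0.0]

for _ in range(10):  # communication rounds
    # Raw data never leaves a client; only updated weights are returned.
    updates = [local_update(global_w, d) for d in clients]
    global_w = fed_avg(updates, [len(d) for d in clients])

acc = accuracy(global_w, make_client(rng, 200))
print(f"held-out accuracy: {acc:.2f}")
```

Weighting the average by dataset size means larger sites contribute more to the global model. This is one common choice, and other weightings are possible.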
Related papers
- Improved Anomaly Detection through Conditional Latent Space VAE Ensembles [49.1574468325115]
A Conditional Latent space Variational Autoencoder (CL-VAE) performs improved pre-processing for anomaly detection on data with known inlier classes and unknown outlier classes.
The model shows increased accuracy in anomaly detection, achieving an AUC of 97.4% on the MNIST dataset.
In addition, the CL-VAE shows increased benefits from ensembling, a more interpretable latent space, and an increased ability to learn patterns in complex data with limited model sizes.
arXiv Detail & Related papers (2024-10-16T07:48:53Z)
- DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z)
- Exploratory Analysis of Federated Learning Methods with Differential Privacy on MIMIC-III [0.7349727826230862]
Federated learning methods offer the possibility of training machine learning models on privacy-sensitive data sets.
We present an evaluation of the impact of different federation and differential privacy techniques when training models on the open-source MIMIC-III dataset.
arXiv Detail & Related papers (2023-02-08T17:27:44Z)
- Learning Classifiers of Prototypes and Reciprocal Points for Universal Domain Adaptation [79.62038105814658]
Universal Domain Adaptation aims to transfer knowledge between datasets by handling two shifts: domain shift and category shift.
The main challenge is correctly distinguishing the unknown target samples while adapting the distribution of known class knowledge from source to target.
Most existing methods approach this problem by first training the target-adapted known classifier and then relying on a single threshold to distinguish unknown target samples.
arXiv Detail & Related papers (2022-12-16T09:01:57Z)
- Evaluating COVID-19 Sequence Data Using Nearest-Neighbors Based Network Model [0.0]
SARS-CoV-2 coronavirus is the cause of the COVID-19 disease in humans.
It can adapt to different hosts and evolve into different lineages.
It is well-known that the major SARS-CoV-2 lineages are characterized by mutations that happen predominantly in the spike protein.
arXiv Detail & Related papers (2022-11-19T00:34:02Z)
- Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that, for specific embedding methods, some simulation-based approaches are more robust (and accurate) than others against certain adversarial attacks on the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z)
- Robust Representation and Efficient Feature Selection Allows for Effective Clustering of SARS-CoV-2 Variants [0.0]
The SARS-CoV-2 virus contains different variants, each of them having different mutations.
Much of the variation in the SARS-CoV-2 genome happens disproportionately in the spike region of the genome sequence.
We propose an approach to cluster spike protein sequences in order to study the behavior of different known variants.
arXiv Detail & Related papers (2021-10-18T21:18:52Z)
- Effective and scalable clustering of SARS-CoV-2 sequences [0.41998444721319206]
SARS-CoV-2 continues to mutate as it spreads, according to an evolutionary process.
The number of currently available sequences of SARS-CoV-2 in public databases such as GISAID is already several million.
We propose an approach based on clustering sequences to identify the current major SARS-CoV-2 variants.
arXiv Detail & Related papers (2021-08-18T13:32:43Z)
- A k-mer Based Approach for SARS-CoV-2 Variant Identification [55.78588835407174]
We show that preserving the order of the amino acids helps the underlying classifiers to achieve better performance.
We also show the importance of the different amino acids which play a key role in identifying variants and how they coincide with those reported by the USA's Centers for Disease Control and Prevention (CDC).
arXiv Detail & Related papers (2021-08-07T15:08:15Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)