Unsupervised machine learning framework for discriminating major
variants of concern during COVID-19
- URL: http://arxiv.org/abs/2208.01439v3
- Date: Thu, 25 May 2023 22:28:28 GMT
- Title: Unsupervised machine learning framework for discriminating major
variants of concern during COVID-19
- Authors: Rohitash Chandra, Chaarvi Bansal, Mingyue Kang, Tom Blau, Vinti
Agarwal, Pranjal Singh, Laurence O. W. Wilson, Seshadri Vasan
- Abstract summary: The COVID-19 pandemic evolved rapidly due to the high mutation rate of the virus.
Certain variants of the virus, such as Delta and Omicron, emerged with altered viral properties leading to severe transmission and death rates.
Unsupervised machine learning methods have the ability to compress, characterize, and visualize unlabelled data.
- Score: 1.5346017713894948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the high mutation rate of the virus, the COVID-19 pandemic evolved
rapidly. Certain variants of the virus, such as Delta and Omicron, emerged with
altered viral properties leading to severe transmission and death rates. These
variants burdened medical systems worldwide, with a major impact on travel,
productivity, and the world economy. Unsupervised machine learning methods have
the ability to compress, characterize, and visualize unlabelled data. This
paper presents a framework that utilizes unsupervised machine learning methods
to discriminate and visualize the associations between major COVID-19 variants
based on their genome sequences. These methods comprise a combination of
selected dimensionality reduction and clustering techniques. The framework
processes the RNA sequences by performing a k-mer analysis on the data and
further visualises and compares the results using selected dimensionality
reduction methods that include principal component analysis (PCA),
t-distributed stochastic neighbour embedding (t-SNE), and uniform manifold
approximation and projection (UMAP). Our framework also employs agglomerative
hierarchical clustering to visualize the mutational differences among major
variants of concern and country-wise mutational differences for selected
variants (Delta and Omicron) using dendrograms. We find that the
proposed framework can effectively distinguish between the major variants and
has the potential to identify emerging variants in the future.
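The first and last stages described in the abstract (k-mer analysis followed by agglomerative hierarchical clustering) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the choice of k = 3, Euclidean distance, average linkage, the helper names `kmer_vector` and `agglomerate`, and the toy sequences are all assumptions made for illustration. The dimensionality reduction step (PCA, t-SNE, UMAP) is omitted here.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_vector(seq, k=3, alphabet="ACGT"):
    """Return normalised k-mer frequencies for seq in a fixed k-mer order."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA letters alike
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # k-mers containing ambiguous bases (e.g. N) are ignored
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerate(vectors):
    """Average-linkage agglomerative clustering; returns the merge history.

    Each entry is (members_a, members_b, distance) in merge order, which is
    the information a dendrogram would draw.
    """
    clusters = {i: [i] for i in range(len(vectors))}
    history = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                # average pairwise distance between the two clusters
                d = sum(euclidean(vectors[i], vectors[j])
                        for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        d, ia, ib = best
        history.append((sorted(clusters[ia]), sorted(clusters[ib]), d))
        clusters[ia] = clusters[ia] + clusters.pop(ib)
    return history


# Toy sequences (made up for illustration, not real SARS-CoV-2 genomes).
toy = {
    "seq_a": "ACGUACGUACGUACGUACGA",
    "seq_b": "ACGUACGUACGUACGUACGG",
    "seq_c": "UUUUGGGGCCCCUUUUGGGG",
}
names = list(toy)
vectors = [kmer_vector(toy[n], k=3) for n in names]
merges = agglomerate(vectors)
for a, b, d in merges:
    print([names[i] for i in a], "+", [names[i] for i in b],
          f"at distance {d:.3f}")
```

On real data one would more likely rely on an established library for the clustering and plotting; the quadratic search above is only meant to make the merge logic explicit.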
Related papers
- Genetic InfoMax: Exploring Mutual Information Maximization in
High-Dimensional Imaging Genetics Studies [50.11449968854487]
Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits.
Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS.
We introduce a trans-modal learning framework Genetic InfoMax (GIM) to address the specific challenges of GWAS.
arXiv Detail & Related papers (2023-09-26T03:59:21Z)
- Multimodal Pathology Image Search Between H&E Slides and Multiplexed
Immunofluorescent Images [0.0]
We present an approach for multimodal pathology image search using dynamic time warping (DTW) on Variational Autoencoder (VAE) latent space.
Through training the VAE and applying DTW, we align and compare mIF and H&E slides.
Our method improves differential diagnosis and therapeutic decisions by integrating morphological H&E data with immunophenotyping from mIF.
arXiv Detail & Related papers (2023-06-11T21:30:20Z)
- Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
arXiv Detail & Related papers (2023-04-10T17:58:22Z)
- Unsupervised ensemble-based phenotyping helps enhance the
discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)
- Identifying Selections Operating on HIV-1 Reverse Transcriptase via
Uniform Manifold Approximation and Projection [0.0]
We analyze 14,651 HIV-1 reverse transcriptase (HIV RT) sequences from the Stanford HIV Drug Resistance Database labeled with treatment regimen.
Our goal is to identify distinct sectors of HIV RT's sequence space that are undergoing evolution.
arXiv Detail & Related papers (2022-10-01T19:08:16Z)
- Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence
Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that, for specific embedding methods, some simulation-based approaches are more robust (and accurate) than others against certain adversarial attacks on the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z)
- Robust Representation and Efficient Feature Selection Allows for
Effective Clustering of SARS-CoV-2 Variants [0.0]
The SARS-CoV-2 virus contains different variants, each of them having different mutations.
Much of the variation in the SARS-CoV-2 genome happens disproportionately in the spike region of the genome sequence.
We propose an approach to cluster spike protein sequences in order to study the behavior of different known variants.
arXiv Detail & Related papers (2021-10-18T21:18:52Z)
- A k-mer Based Approach for SARS-CoV-2 Variant Identification [55.78588835407174]
We show that preserving the order of the amino acids helps the underlying classifiers to achieve better performance.
We also show the importance of the different amino acids which play a key role in identifying variants and how they coincide with those reported by the USA's Centers for Disease Control and Prevention (CDC).
arXiv Detail & Related papers (2021-08-07T15:08:15Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for
Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Comparison of Anomaly Detectors: Context Matters [0.0]
The objective of this comparison is twofold: comparison of anomaly detection methods of various paradigms, and identification of sources of variability that can yield different results.
The best results on the image data were obtained either by a feature-matching GAN or a combination of variational autoencoder (VAE) and OC-SVM, depending on the experimental conditions.
arXiv Detail & Related papers (2020-12-11T11:50:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.