Identifying Selections Operating on HIV-1 Reverse Transcriptase via
Uniform Manifold Approximation and Projection
- URL: http://arxiv.org/abs/2210.00345v1
- Date: Sat, 1 Oct 2022 19:08:16 GMT
- Title: Identifying Selections Operating on HIV-1 Reverse Transcriptase via
Uniform Manifold Approximation and Projection
- Authors: Shefali Qamar, Manel Camps, Jay Kim
- Abstract summary: We analyze 14,651 HIV-1 reverse transcriptase (HIV RT) sequences from the Stanford HIV Drug Resistance Database labeled with treatment regimen.
Our goal is to identify distinct sectors of HIV RT's sequence space that are undergoing evolution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We analyze 14,651 HIV-1 reverse transcriptase (HIV RT) sequences from the
Stanford HIV Drug Resistance Database labeled with treatment regimen in order
to study the evolution of this enzyme under drug selection in the clinic. Our goal
is to identify distinct sectors of HIV RT's sequence space that are undergoing
evolution as a way to identify individual selections and/or evolutionary
solutions. We utilize Uniform Manifold Approximation and Projection (UMAP), a
graph-based dimensionality reduction technique uniquely suited for the
detection of non-linear dependencies and visualize the results using an
unsupervised clustering algorithm based on density analysis. Our analysis
produced 21 distinct clusters of sequences. Supporting the biological
significance of these clusters, they tend to represent phylogenetically related
sequences with strong correspondence to distinct treatment regimens. Thus,
although correlative, this method for visualizing areas of HIV RT undergoing
evolution can help infer the selective pressures acting on the enzyme. The mutation
signatures associated with each cluster may represent the higher-order
epistatic context facilitating these evolutionary pathways, information that is
generally not accessible by other types of mutational co-dependence analyses.
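The abstract does not state how sequences were encoded or which density-based clusterer was used, so the following is only a minimal sketch under stated assumptions. It assumes each sequence is reduced to the set of amino-acid mutations it carries relative to a reference, and it shows the steps around the embedding: binary encoding, a cluster-by-regimen cross-tabulation with a simple purity score, and a per-cluster enrichment that approximates a "mutation signature". The UMAP and clustering step itself is not run here; in practice the matrix might be passed to a UMAP implementation such as the `umap-learn` package and the embedding clustered with a density-based method such as HDBSCAN, which marks noise points with the label -1. The toy records, regimen labels, cluster labels, and the `min_diff` threshold are all hypothetical.

```python
from collections import Counter, defaultdict

NOISE = -1  # label that HDBSCAN-style clusterers assign to unclustered points


def encode_mutations(records):
    """Binary-encode each sequence by presence/absence of every observed mutation."""
    vocab = sorted({m for muts, _ in records for m in muts})
    index = {m: i for i, m in enumerate(vocab)}
    matrix = []
    for muts, _ in records:
        row = [0] * len(vocab)
        for m in muts:
            row[index[m]] = 1
        matrix.append(row)
    return vocab, matrix


def regimen_crosstab(labels, regimens):
    """Count sequences per (cluster, regimen) pair, skipping noise points."""
    table = defaultdict(Counter)
    for cluster, regimen in zip(labels, regimens):
        if cluster != NOISE:
            table[cluster][regimen] += 1
    return table


def purity(table):
    """Share of clustered sequences that carry their cluster's majority regimen."""
    total = sum(sum(c.values()) for c in table.values())
    majority = sum(c.most_common(1)[0][1] for c in table.values())
    return majority / total


def enriched_mutations(vocab, matrix, labels, cluster, min_diff=0.5):
    """Mutations at least `min_diff` more frequent inside `cluster` than outside it."""
    inside = [row for row, lab in zip(matrix, labels) if lab == cluster]
    outside = [row for row, lab in zip(matrix, labels) if lab != cluster]
    if not inside:
        return []
    signature = []
    for j, mutation in enumerate(vocab):
        f_in = sum(row[j] for row in inside) / len(inside)
        f_out = sum(row[j] for row in outside) / len(outside) if outside else 0.0
        if f_in - f_out >= min_diff:
            signature.append(mutation)
    return signature


# Hypothetical toy data: (mutations vs. a reference RT sequence, regimen label).
records = [
    ({"M184V", "K65R"}, "regA"),
    ({"M184V", "K65R"}, "regA"),
    ({"M184V", "K65R", "K70E"}, "regA"),
    ({"K103N", "Y181C"}, "regB"),
    ({"K103N"}, "regB"),
    ({"K103N", "M184V"}, "regA"),
]
regimens = [regimen for _, regimen in records]
# Hypothetical cluster labels standing in for UMAP + density-based clustering output.
labels = [0, 0, 0, 1, 1, 1]

vocab, X = encode_mutations(records)
table = regimen_crosstab(labels, regimens)
print(round(purity(table), 3))                  # 0.833
print(enriched_mutations(vocab, X, labels, 0))  # ['K65R', 'M184V']
print(enriched_mutations(vocab, X, labels, 1))  # ['K103N']
```

Purity and frequency differences are deliberately simple stand-ins; a real analysis would likely use a statistical test for enrichment and a proper association measure between clusters and regimens.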
Related papers
- Detecting and Identifying Selection Structure in Sequential Data (2024-06-29)
  We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences.
  We show that selection structure is identifiable without any parametric assumptions or interventional experiments.
  We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies.
- Combining propensity score methods with variational autoencoders for generating synthetic data in presence of latent sub-groups (2023-12-12)
  Heterogeneity might be known, e.g., as indicated by sub-group labels, or might be unknown and reflected only in properties of distributions, such as bimodality or skewness.
  We investigate how such heterogeneity can be preserved and controlled when obtaining synthetic data from variational autoencoders (VAEs), i.e., a generative deep learning technique.
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types (2023-11-28)
  We develop an attention-enhanced graph autoencoder designed to efficiently capture the topological features between cells.
  During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
  This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
- Bayesian outcome-guided multi-view mixture models with applications in molecular precision medicine (2023-03-01)
  Clustering is commonly performed as an initial analysis step for uncovering structure in 'omics datasets.
  We propose a multi-view Bayesian mixture model that identifies groups of variables ("views"), each of which defines a distinct clustering structure.
  We consider applications in stratified medicine, for which our principal goal is to identify clusters of patients that define distinct, clinically actionable disease subtypes.
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology (2023-01-07)
  We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
  It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
  These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
- Unsupervised machine learning framework for discriminating major variants of concern during COVID-19 (2022-08-01)
  The COVID-19 pandemic evolved rapidly due to the high mutation rate of the virus.
  Certain variants of the virus, such as Delta and Omicron, emerged with altered viral properties leading to increased transmission and death rates.
  Unsupervised machine learning methods have the ability to compress, characterize, and visualize unlabelled data.
- Effective and scalable clustering of SARS-CoV-2 sequences (2021-08-18)
  SARS-CoV-2 continues to mutate as it spreads, according to an evolutionary process.
  The number of SARS-CoV-2 sequences currently available in public databases such as GISAID is already several million.
  We propose an approach based on clustering sequences to identify the current major SARS-CoV-2 variants.
- A k-mer Based Approach for SARS-CoV-2 Variant Identification (2021-08-07)
  We show that preserving the order of the amino acids helps the underlying classifiers to achieve better performance.
  We also show the importance of the different amino acids that play a key role in identifying variants and how they coincide with those reported by the USA's Centers for Disease Control and Prevention (CDC).
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data (2020-07-07)
  We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
  The methodology is based on application of elastic principal graphs, which can simultaneously address the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
- A Novel Granular-Based Bi-Clustering Method of Deep Mining the Co-Expressed Genes (2020-05-12)
  Bi-clustering methods are used to mine bi-clusters whose subsets of samples (genes) are co-regulated under their test conditions.
  Unfortunately, traditional bi-clustering methods are not fully effective in discovering such bi-clusters.
  We propose a novel bi-clustering method that draws on the theory of Granular Computing.
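The k-mer based SARS-CoV-2 entry reports that preserving the order of amino acids helps its classifiers. As a generic illustration only, not that paper's featurization, the sketch below counts overlapping k-mers, assuming the 20 standard amino acids and k = 2 (both assumptions). It shows why order matters: two sequences with identical amino-acid composition become distinguishable once local order is encoded.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(seq, k=2):
    """Count overlapping k-mers; each k-mer preserves the local order of its residues."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vector(seq, k=2):
    """Fixed-length vector of k-mer counts over all 20**k possible k-mers."""
    counts = kmer_counts(seq, k)
    return [counts["".join(p)] for p in product(AMINO_ACIDS, repeat=k)]


# Same residues, different order: composition cannot tell them apart, k-mers can.
a, b = "KLKL", "KKLL"
print(Counter(a) == Counter(b))                   # True
print(kmer_counts(a) == kmer_counts(b))           # False
print(len(kmer_vector(a)), sum(kmer_vector(a)))   # 400 3
```

The vector length grows as 20**k, so larger k quickly produces very sparse features; that trade-off is one reason k is usually kept small.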