Metagenome2Vec: Building Contextualized Representations for Scalable
Metagenome Analysis
- URL: http://arxiv.org/abs/2111.08001v1
- Date: Tue, 9 Nov 2021 23:21:10 GMT
- Title: Metagenome2Vec: Building Contextualized Representations for Scalable
Metagenome Analysis
- Authors: Sathyanarayanan N. Aakur, Vineela Indla, Vennela Indla, Sai Narayanan,
Arunkumar Bagavathi, Vishalini Laguduva Ramnath, Akhilesh Ramachandran
- Abstract summary: We propose Metagenome2Vec - a contextualized representation that captures the global structural properties inherent in metagenome data.
We show that the learned representations can help detect six (6) related pathogens from clinical samples with fewer than 100 labeled sequences.
- Score: 4.807955518532493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in next-generation metagenome sequencing have the potential to
revolutionize the point-of-care diagnosis of novel pathogen infections, which
could help prevent potential widespread transmission of diseases. Given the
high volume of metagenome sequences, there is a need for scalable frameworks to
analyze and segment metagenome sequences from clinical samples, which can be
highly imbalanced. Learning robust representations from metagenome reads is
increasingly important because pathogens within a family can have highly
similar genome structures (some more than 90% similar), and such
representations enable the segmentation and identification of novel pathogen
sequences with limited labeled data. In this work, we propose Metagenome2Vec - a contextualized
representation that captures the global structural properties inherent in
metagenome data and local contextualized properties through self-supervised
representation learning. We show that the learned representations can help
detect six (6) related pathogens from clinical samples with fewer than 100
labeled sequences. Extensive experiments on simulated and clinical metagenome
data show that the proposed representation encodes compositional properties
that can generalize beyond annotations to segment novel pathogens in an
unsupervised setting.
Related papers
- Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
(2024-03-02T00:56:05Z)
- T-Phenotype: Discovering Phenotypes of Predictive Temporal Patterns in
Disease Progression [82.85825388788567]
We develop a novel temporal clustering method, T-Phenotype, to discover phenotypes of predictive temporal patterns from labeled time-series data.
We show that T-Phenotype achieves the best phenotype discovery performance over all the evaluated baselines.
(2023-02-24T13:30:35Z)
- Scalable Pathogen Detection from Next Generation DNA Sequencing with
Deep Learning [3.8175773487333857]
We propose MG2Vec, a deep learning-based solution that uses the transformer network as its backbone.
We show that the proposed approach can help detect pathogens from uncurated, real-world clinical samples.
We provide a comprehensive evaluation of a novel representation learning framework for metagenome-based disease diagnostics with deep learning.
(2022-11-30T00:13:59Z)
- MG-NET: Leveraging Pseudo-Imaging for Multi-Modal Metagenome Analysis [5.04905391284093]
We propose MG-Net, a self-supervised representation learning framework.
We show that MG-Net can learn robust representations from unlabeled data.
Experiments show that the learned features outperform current baseline metagenome representations.
(2021-07-21T05:53:01Z)
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
(2021-02-11T09:04:45Z)
- Weakly-Supervised Cross-Domain Adaptation for Endoscopic Lesions
Segmentation [79.58311369297635]
We propose a new weakly-supervised lesions transfer framework, which can explore transferable domain-invariant knowledge across different datasets.
A Wasserstein quantified transferability framework is developed to highlight wide-range transferable contextual dependencies.
A novel self-supervised pseudo label generator is designed to equally provide confident pseudo pixel labels for both hard-to-transfer and easy-to-transfer target samples.
(2020-12-08T02:26:03Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in
Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
(2020-10-20T08:36:51Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta learning machine for few-shot image classification.
(2020-09-02T02:50:30Z)
- Statistical Linear Models in Virus Genomic Alignment-free Classification: Application to Hepatitis C Viruses [2.900522306460408]
This study explores the power of linear classifiers in genotyping and subtyping partial and complete genomes.
It is applied to the Hepatitis C viruses (HCV).
Overall, several classifiers perform well given a precise combination of the experimental variables.
(2019-10-11T21:40:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.