Scalable Pathogen Detection from Next Generation DNA Sequencing with
Deep Learning
- URL: http://arxiv.org/abs/2212.00015v1
- Date: Wed, 30 Nov 2022 00:13:59 GMT
- Title: Scalable Pathogen Detection from Next Generation DNA Sequencing with
Deep Learning
- Authors: Sai Narayanan and Sathyanarayanan N. Aakur and Priyadharsini
Ramamurthy and Arunkumar Bagavathi and Vishalini Ramnath and Akhilesh
Ramachandran
- Abstract summary: We propose MG2Vec, a deep learning-based solution that uses the transformer network as its backbone.
We show that the proposed approach can help detect pathogens from uncurated, real-world clinical samples.
We provide a comprehensive evaluation of a novel representation learning framework for metagenome-based disease diagnostics with deep learning.
- Score: 3.8175773487333857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Next-generation sequencing technologies have enhanced the scope of
Internet-of-Things (IoT) to include genomics for personalized medicine through
the increased availability of an abundance of genome data collected from
heterogeneous sources at a reduced cost. Given the sheer magnitude of the
collected data and the significant challenges offered by the presence of highly
similar genomic structure across species, there is a need for robust, scalable
analysis platforms to extract actionable knowledge such as the presence of
potentially zoonotic pathogens. The emergence of zoonotic diseases from novel
pathogens that can jump species barriers and lead to pandemics, such as the
influenza virus in 1918 and SARS-CoV-2 in 2019, underscores the need for scalable
metagenome analysis. In this work, we propose MG2Vec, a deep learning-based
solution that uses the transformer network as its backbone, to learn robust
features from raw metagenome sequences for downstream biomedical tasks such as
targeted and generalized pathogen detection. Extensive experiments on four
increasingly challenging yet realistic diagnostic settings show that the
proposed approach can help detect pathogens from uncurated, real-world clinical
samples with minimal human supervision in the form of labels. Further, we
demonstrate that the learned representations can generalize to completely
unrelated pathogens across diseases and species for large-scale metagenome
analysis. We provide a comprehensive evaluation of a novel representation
learning framework for metagenome-based disease diagnostics with deep learning
and provide a way forward for extracting and using robust vector
representations from low-cost next generation sequencing to develop
generalizable diagnostic tools.
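As a rough illustration of what "learning features from raw metagenome sequences" involves before any model is trained, the sketch below splits a sequencing read into overlapping k-mers and maps them to integer ids that a transformer could consume. This is an assumption for illustration only: the abstract does not state how MG2Vec tokenizes its input, and the function names (`kmer_tokens`, `build_vocab`, `encode`) and the choice of k = 3 are hypothetical.

```python
from itertools import product


def kmer_tokens(read: str, k: int = 3) -> list[str]:
    """Split a read into overlapping k-mers using a stride of 1."""
    read = read.upper()
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_vocab(k: int = 3) -> dict[str, int]:
    """Map every k-mer over A/C/G/T to an integer id.

    Ids start at 1; id 0 is reserved for unknown k-mers (assumed convention).
    """
    return {"".join(p): i + 1 for i, p in enumerate(product("ACGT", repeat=k))}


def encode(read: str, vocab: dict[str, int], k: int = 3) -> list[int]:
    """Convert a read to token ids; k-mers with ambiguous bases (e.g. N) map to 0."""
    return [vocab.get(tok, 0) for tok in kmer_tokens(read, k)]


vocab = build_vocab(3)
ids = encode("ACGTNAC", vocab)  # the k-mers containing N fall back to id 0
```

The resulting id sequence would typically be fed to an embedding layer ahead of the transformer encoder. Whether the paper uses k-mers, a different tokenization, or a different vocabulary size is not stated in the abstract.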
Related papers
- Interpreting artificial neural networks to detect genome-wide association signals for complex traits [0.0]
Investigating the genetic architecture of complex diseases is challenging due to the highly polygenic and interactive landscape of genetic and environmental factors.
We trained artificial neural networks for predicting complex traits using both simulated and real genotype/phenotype datasets.
arXiv Detail & Related papers (2024-07-26T15:20:42Z)
- PathoLM: Identifying pathogenicity from the DNA sequence through the Genome Foundation Model [9.285895422810704]
PathoLM is a cutting-edge pathogen language model optimized for the identification of pathogenicity in bacterial and viral sequences.
We developed a comprehensive data set comprising approximately 30 species of viruses and bacteria, including ESKAPEE pathogens.
In comparative assessments, PathoLM dramatically outperforms existing models like DciPatho, demonstrating robust zero-shot and few-shot capabilities.
arXiv Detail & Related papers (2024-06-19T00:53:48Z)
- BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions.
BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model.
It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that, for specific embedding methods, some simulation-based approaches are more robust (and accurate) than others against certain adversarial attacks on the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z)
- COVID-Net Biochem: An Explainability-driven Framework to Building Machine Learning Models for Predicting Survival and Kidney Injury of COVID-19 Patients from Clinical and Biochemistry Data [66.43957431843324]
We introduce COVID-Net Biochem, a versatile and explainable framework for constructing machine learning models.
We apply this framework to predict COVID-19 patient survival and the likelihood of developing Acute Kidney Injury during hospitalization.
arXiv Detail & Related papers (2022-04-24T07:38:37Z)
- Metagenome2Vec: Building Contextualized Representations for Scalable Metagenome Analysis [4.807955518532493]
We propose Metagenome2Vec - a contextualized representation that captures the global structural properties inherent in metagenome data.
We show that the learned representations can help detect six (6) related pathogens from clinical samples with fewer than 100 labeled sequences.
arXiv Detail & Related papers (2021-11-09T23:21:10Z)
- MG-NET: Leveraging Pseudo-Imaging for Multi-Modal Metagenome Analysis [5.04905391284093]
We propose MG-Net, a self-supervised representation learning framework.
We show that MG-Net can learn robust representations from unlabeled data.
Experiments show that the learned features outperform current baseline metagenome representations.
arXiv Detail & Related papers (2021-07-21T05:53:01Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Genome Sequence Classification for Animal Diagnostics with Graph Representations and Deep Neural Networks [4.339839287869652]
Bovine Respiratory Disease Complex (BRDC) is a complex respiratory disease in cattle with multiple etiologies, including bacterial and viral.
Current animal disease diagnostics is based on traditional tests such as bacterial culture, serology, and Polymerase Chain Reaction (PCR) tests.
We show that network-based machine learning approaches can detect pathogen signatures with up to 89.7% accuracy.
arXiv Detail & Related papers (2020-07-24T22:30:18Z)