Graph Neural Networks for Microbial Genome Recovery
- URL: http://arxiv.org/abs/2204.12270v1
- Date: Tue, 26 Apr 2022 12:49:51 GMT
- Title: Graph Neural Networks for Microbial Genome Recovery
- Authors: Andre Lamurias, Alessandro Tibo, Katja Hose, Mads Albertsen and Thomas Dyhre Nielsen
- Abstract summary: We propose to use Graph Neural Networks (GNNs) to leverage the assembly graph when learning contig representations for metagenomic binning.
Our method, VaeG-Bin, combines variational autoencoders for learning latent representations of the individual contigs, with GNNs for refining these representations by taking into account the neighborhood structure of the contigs in the assembly graph.
- Score: 64.91162205624848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Microbes have a profound impact on our health and environment, but our
understanding of the diversity and function of microbial communities is
severely limited. Through DNA sequencing of microbial communities
(metagenomics), DNA fragments (reads) of the individual microbes can be
obtained, which through assembly graphs can be combined into long contiguous
DNA sequences (contigs). Given the complexity of microbial communities, single
contig microbial genomes are rarely obtained. Instead, contigs are eventually
clustered into bins, with each bin ideally making up a full genome. This
process is referred to as metagenomic binning.
Current state-of-the-art techniques for metagenomic binning rely only on the
local features for the individual contigs. These techniques therefore fail to
exploit the similarities between contigs as encoded by the assembly graph, in
which the contigs are organized. In this paper, we propose to use Graph Neural
Networks (GNNs) to leverage the assembly graph when learning contig
representations for metagenomic binning. Our method, VaeG-Bin, combines
variational autoencoders for learning latent representations of the individual
contigs, with GNNs for refining these representations by taking into account
the neighborhood structure of the contigs in the assembly graph. We explore
several types of GNNs and demonstrate that VaeG-Bin recovers more high-quality
genomes than other state-of-the-art binners on both simulated and real-world
datasets.
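
The idea of refining per-contig representations with the neighborhood structure of the assembly graph can be illustrated with a toy example. The sketch below is not the VaeG-Bin implementation: it uses plain mean-neighbor smoothing with no learned weights, and the function name `aggregate_neighbors`, the `self_weight` mixing factor, and the contig ids are all hypothetical, chosen only for illustration. The input vectors stand in for latent representations that a variational autoencoder might produce.

```python
from collections import defaultdict


def aggregate_neighbors(embeddings, edges, self_weight=0.5):
    """One round of mean-neighbor smoothing over an undirected graph.

    embeddings:  dict mapping contig id -> list of floats
                 (stand-in for a VAE latent vector)
    edges:       iterable of (contig_a, contig_b) pairs from the assembly graph
    self_weight: hypothetical mixing factor between a contig's own vector
                 and the mean of its neighbors' vectors
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    refined = {}
    for node, vec in embeddings.items():
        nbrs = [n for n in neighbors[node] if n in embeddings]
        if not nbrs:
            # Isolated contigs keep their original representation.
            refined[node] = list(vec)
            continue
        # Element-wise mean of the neighbors' vectors.
        mean = [
            sum(embeddings[n][i] for n in nbrs) / len(nbrs)
            for i in range(len(vec))
        ]
        # Blend the contig's own vector with its neighborhood mean.
        refined[node] = [
            self_weight * v + (1 - self_weight) * m
            for v, m in zip(vec, mean)
        ]
    return refined


# Toy data: c1 and c2 are linked in the assembly graph, c3 is isolated.
embeddings = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [4.0, 4.0]}
edges = [("c1", "c2")]
refined = aggregate_neighbors(embeddings, edges)
```

In this toy run the two linked contigs are pulled toward each other while the isolated contig is unchanged, which is the intuition behind using graph neighbors to help place connected contigs in the same bin. A real GNN layer would add learned weight matrices and nonlinearities on top of this kind of aggregation.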
Related papers
- Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity [3.972930262155919]
We propose a framework taking advantage of existing large models for gene vectorization to predict habitat specificity from entire microbial genome sequences.
We train and validate our approach on a large dataset of high quality microbiome genomes from different habitats.
arXiv Detail & Related papers (2024-05-09T09:34:51Z)
- GraSSRep: Graph-Based Self-Supervised Learning for Repeat Detection in Metagenomic Assembly [24.55141372357102]
Repetitive DNA (repeats) poses significant challenges for accurate and efficient genome assembly and sequence alignment.
GraSSRep is a self-supervised learning framework to classify DNA sequences into repetitive and non-repetitive categories.
GraSSRep combines sequencing features with pre-defined and learned graph features to achieve state-of-the-art performance in repeat detection.
arXiv Detail & Related papers (2024-02-14T18:26:58Z)
- Efficient and Scalable Fine-Tune of Language Models for Genome Understanding [49.606093223945734]
We present Lingo: Language prefix fine-tuning for Genomes.
Unlike DNA foundation models, Lingo strategically leverages natural language foundation models' contextual cues.
Lingo further accommodates numerous downstream fine-tune tasks by an adaptive rank sampling method.
arXiv Detail & Related papers (2024-02-12T21:40:45Z)
- AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
- Simple and Efficient Heterogeneous Graph Neural Network [55.56564522532328]
Heterogeneous graph neural networks (HGNNs) have powerful capability to embed rich structural and semantic information of a heterogeneous graph into node representations.
Existing HGNNs inherit many mechanisms from graph neural networks (GNNs) over homogeneous graphs, especially the attention mechanism and the multi-layer structure.
This paper conducts an in-depth and detailed study of these mechanisms and proposes Simple and Efficient Heterogeneous Graph Neural Network (SeHGNN).
arXiv Detail & Related papers (2022-07-06T10:01:46Z)
- Learning to Untangle Genome Assembly with Graph Convolutional Networks [17.227634756670835]
We introduce a new learning framework to train a graph convolutional network to resolve assembly graphs by finding a correct path through them.
Experimental results show that a model, trained on simulated graphs generated solely from a single chromosome, is able to remarkably resolve all other chromosomes.
arXiv Detail & Related papers (2022-06-01T04:14:25Z)
- RepBin: Constraint-based Graph Representation Learning for Metagenomic Binning [12.561034842067889]
We present a new formulation using a graph where the nodes are subsequences and edges represent homophily information.
We develop new algorithms for (i) graph representation learning that preserves both homophily relations and heterophily constraints.
Our approach, called RepBin, outperforms a wide variety of competing methods.
arXiv Detail & Related papers (2021-12-22T07:01:01Z)
- Deep metric learning improves lab of origin prediction of genetically engineered plasmids [63.05016513788047]
Genetic engineering attribution (GEA) is the ability to make sequence-lab associations.
We propose a method, based on metric learning, that ranks the most likely labs-of-origin.
We are able to extract key signatures in plasmid sequences for particular labs, allowing for an interpretable examination of the model's outputs.
arXiv Detail & Related papers (2021-11-24T16:29:03Z)
- A step towards neural genome assembly [0.0]
We train the MPNN model with max-aggregator to execute several algorithms for graph simplification.
We show that the algorithms were learned successfully and can be scaled to graphs of sizes up to 20 times larger than the ones used in training.
arXiv Detail & Related papers (2020-11-10T10:12:19Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)