RepBin: Constraint-based Graph Representation Learning for Metagenomic
Binning
- URL: http://arxiv.org/abs/2112.11696v1
- Date: Wed, 22 Dec 2021 07:01:01 GMT
- Title: RepBin: Constraint-based Graph Representation Learning for Metagenomic
Binning
- Authors: Hansheng Xue, Vijini Mallawaarachchi, Yujia Zhang, Vaibhav Rajan, Yu
Lin
- Abstract summary: We present a new formulation using a graph where the nodes are subsequences and edges represent homophily information.
We develop new algorithms for graph representation learning that preserves both homophily relations and heterophily constraints.
Our approach, called RepBin, outperforms a wide variety of competing methods.
- Score: 12.561034842067889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixed communities of organisms are found in many environments (from the human
gut to marine ecosystems) and can have profound impact on human health and the
environment. Metagenomics studies the genomic material of such communities
through high-throughput sequencing that yields DNA subsequences for subsequent
analysis. A fundamental problem in the standard workflow, called binning, is to
discover clusters, of genomic subsequences, associated with the unknown
constituent organisms. Inherent noise in the subsequences, various biological
constraints that need to be imposed on them and the skewed cluster size
distribution exacerbate the difficulty of this unsupervised learning problem.
In this paper, we present a new formulation using a graph where the nodes are
subsequences and edges represent homophily information. In addition, we model
biological constraints providing heterophilous signal about nodes that cannot
be clustered together. We solve the binning problem by developing new
algorithms for (i) graph representation learning that preserves both homophily
relations and heterophily constraints and (ii) a constraint-based graph clustering
method that addresses the problem of skewed cluster size distribution.
Extensive experiments, on real and synthetic datasets, demonstrate that our
approach, called RepBin, outperforms a wide variety of competing methods. Our
constraint-based graph representation learning and clustering methods, that may
be useful in other domains as well, advance the state-of-the-art in both
metagenomics binning and graph representation learning.
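The formulation in the abstract (homophily edges that pull nodes together, heterophily constraints that keep nodes apart) can be illustrated with a toy sketch. This is not RepBin's actual algorithm: the function name `learn_embeddings`, the squared-distance and margin losses, the hyperparameters, and the six-node graph are all assumptions chosen only to show the idea.

```python
import numpy as np


def learn_embeddings(n_nodes, edges, constraints, dim=2, margin=2.0,
                     lr=0.05, steps=500, seed=0):
    """Toy pull/push embedding via plain gradient descent (illustrative only)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(scale=0.1, size=(n_nodes, dim))
    for _ in range(steps):
        grad = np.zeros_like(z)
        # Homophily edges: minimise ||z_i - z_j||^2 so linked nodes move together.
        for i, j in edges:
            diff = z[i] - z[j]
            grad[i] += 2 * diff
            grad[j] -= 2 * diff
        # Heterophily constraints: minimise max(0, margin - ||z_i - z_j||)^2,
        # which pushes a constrained pair apart until it is `margin` apart.
        for i, j in constraints:
            diff = z[i] - z[j]
            dist = np.linalg.norm(diff) + 1e-12  # guard against division by zero
            if dist < margin:
                g = -2 * (margin - dist) * diff / dist
                grad[i] += g
                grad[j] -= g
        z -= lr * grad
    return z


# Hypothetical toy data: two chains of subsequences (0-1-2 and 3-4-5) and
# two "cannot be clustered together" pairs across the chains.
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]
constraints = [(0, 3), (2, 5)]

z = learn_embeddings(6, edges, constraints)
linked = np.mean([np.linalg.norm(z[i] - z[j]) for i, j in edges])
apart = np.mean([np.linalg.norm(z[i] - z[j]) for i, j in constraints])
print(f"mean linked distance: {linked:.3f}, mean constrained distance: {apart:.3f}")
```

In this sketch the learned embeddings could then be passed to any clustering routine; the abstract's second component, a clustering method that handles skewed cluster sizes, is not modelled here.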
Related papers
- The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges [101.83124435649358]
The homophily principle holds that nodes with the same labels or similar attributes are more likely to be connected.
Recent work has identified a non-trivial set of datasets where the performance of GNNs compared to NNs is not satisfactory.
arXiv Detail & Related papers (2024-07-12T18:04:32Z)
- HeNCler: Node Clustering in Heterophilous Graphs through Learned Asymmetric Similarity [55.27586970082595]
HeNCler is a novel approach for Heterophilous Node Clustering.
We show that HeNCler significantly enhances performance in node clustering tasks within heterophilous graph contexts.
arXiv Detail & Related papers (2024-05-27T11:04:05Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- HiGPT: Heterogeneous Graph Language Model [27.390123898556805]
Heterogeneous graph learning aims to capture complex relationships and diverse semantics among entities in a heterogeneous graph.
Existing frameworks for heterogeneous graph learning have limitations in generalizing across diverse heterogeneous graph datasets.
We propose HiGPT, a general large graph model with a heterogeneous graph instruction-tuning paradigm.
arXiv Detail & Related papers (2024-02-25T08:07:22Z)
- Product Manifold Representations for Learning on Biological Pathways [13.0916239254532]
We investigate the effects of embedding pathway graphs in non-Euclidean mixed-curvature spaces.
We train a supervised model using the learned node embeddings to predict missing protein-protein interactions in pathway graphs.
We find large reductions in distortion and boosts on in-distribution edge prediction performance as a result of using mixed-curvature embeddings.
arXiv Detail & Related papers (2024-01-27T18:46:19Z)
- A GAN Approach for Node Embedding in Heterogeneous Graphs Using Subgraph Sampling [33.50085646298074]
We propose a novel framework that combines a Graph Neural Network (GNN) and a Generative Adversarial Network (GAN) to enhance classification for underrepresented node classes.
The framework incorporates an advanced edge generation and selection module, enabling the simultaneous creation of synthetic nodes and edges.
arXiv Detail & Related papers (2023-12-11T16:52:20Z)
- Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More [30.919536115917726]
We present a probabilistic model based on non-negative matrix factorization which unifies clustering and simplification.
By relaxing the hard clustering to a soft clustering, our algorithm relaxes potentially hard clustering problems to tractable ones.
arXiv Detail & Related papers (2023-08-12T02:47:57Z)
- Geometry Contrastive Learning on Heterogeneous Graphs [50.58523799455101]
This paper proposes a novel self-supervised learning method, termed Geometry Contrastive Learning (GCL).
GCL views a heterogeneous graph from Euclidean and hyperbolic perspectives simultaneously, aiming to combine the ability to model rich semantics with the ability to model complex structures.
Extensive experiments on four benchmark data sets show that the proposed approach outperforms strong baselines.
arXiv Detail & Related papers (2022-06-25T03:54:53Z)
- Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning [102.9138736545956]
Heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs.
We develop for the first time a novel and robust heterogeneous graph contrastive learning approach, namely HGCL, which introduces two views on respective guidance of node attributes and graph topologies.
In this new approach, we adopt distinct but most suitable attribute and topology fusion mechanisms in the two views, which are conducive to mining relevant information in attributes and topologies separately.
arXiv Detail & Related papers (2022-04-30T12:57:02Z)
- Multilayer Clustered Graph Learning [66.94201299553336]
We use contrastive loss as a data fidelity term, in order to properly aggregate the observed layers into a representative graph.
Experiments show that our method leads to a clustered clusters w.r.t.
We learn a clustering algorithm for solving clustering problems.
arXiv Detail & Related papers (2020-10-29T09:58:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.