Bio-JOIE: Joint Representation Learning of Biological Knowledge Bases
- URL: http://arxiv.org/abs/2103.04283v1
- Date: Sun, 7 Mar 2021 07:06:53 GMT
- Title: Bio-JOIE: Joint Representation Learning of Biological Knowledge Bases
- Authors: Junheng Hao, Chelsea Ju, Muhao Chen, Yizhou Sun, Carlo Zaniolo, Wei Wang
- Abstract summary: We show that Bio-JOIE can accurately identify PPIs between the SARS-CoV-2 proteins and human proteins.
By leveraging only structured knowledge, Bio-JOIE significantly outperforms existing state-of-the-art methods in PPI type prediction on multiple species.
- Score: 38.9571812880758
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The wide spread of Coronavirus has led to a worldwide pandemic with a high
mortality rate. Currently, the knowledge accumulated from different studies
about this virus is very limited. Leveraging a wide range of biological
knowledge, such as gene ontology and protein-protein interaction (PPI) networks
from other closely related species presents a vital approach to infer the
molecular impact of a new species. In this paper, we propose the transferred
multi-relational embedding model Bio-JOIE to capture the knowledge of gene
ontology and PPI networks, which demonstrates superb capability in modeling the
SARS-CoV-2-human protein interactions. Bio-JOIE jointly trains two model
components. The knowledge model encodes the relational facts from the protein
and GO domains into separate embedding spaces, using a hierarchy-aware
encoding technique employed for the GO terms. On top of that, the transfer
model learns a non-linear transformation to transfer the knowledge of PPIs and
gene ontology annotations across their embedding spaces. By leveraging only
structured knowledge, Bio-JOIE significantly outperforms existing
state-of-the-art methods in PPI type prediction on multiple species.
Furthermore, we also demonstrate the potential of leveraging the learned
representations on clustering proteins with enzymatic function into enzyme
commission families. Finally, we show that Bio-JOIE can accurately identify
PPIs between the SARS-CoV-2 proteins and human proteins, providing valuable
insights for advancing research on this new disease.
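The two-component design described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the translation-style triple score, the `tanh` transfer map, the embedding sizes, and the toy identifiers (`P1`, `P2`, `GO:a`, `GO:b`, `binding`, `is_a`) are hypothetical, and the hierarchy-aware encoding of GO terms and negative sampling are omitted.

```python
import math
import random

rng = random.Random(0)
PROTEIN_DIM, GO_DIM = 4, 3  # assumed sizes; the two spaces may differ


def rand_vec(dim):
    """Small random vector used to initialize an embedding."""
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]


# Knowledge model: separate embedding tables for each domain (toy, hypothetical IDs).
protein_emb = {p: rand_vec(PROTEIN_DIM) for p in ("P1", "P2")}
ppi_rel_emb = {"binding": rand_vec(PROTEIN_DIM)}
go_emb = {g: rand_vec(GO_DIM) for g in ("GO:a", "GO:b")}
go_rel_emb = {"is_a": rand_vec(GO_DIM)}

# Transfer model parameters: a GO_DIM x PROTEIN_DIM weight matrix and a bias.
W = [rand_vec(PROTEIN_DIM) for _ in range(GO_DIM)]
b = rand_vec(GO_DIM)


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triple_score(head, rel, tail):
    """Assumed translation-style score: lower means a more plausible triple."""
    return distance([h + r for h, r in zip(head, rel)], tail)


def transfer(protein_vec):
    """Non-linear map from the protein space into the GO space: tanh(W p + b)."""
    return [
        math.tanh(sum(w * p for w, p in zip(row, protein_vec)) + bias)
        for row, bias in zip(W, b)
    ]


def joint_loss(ppi_triples, go_triples, annotations, alpha=1.0):
    """Sum of both knowledge-model scores plus a weighted transfer term."""
    knowledge = sum(
        triple_score(protein_emb[h], ppi_rel_emb[r], protein_emb[t])
        for h, r, t in ppi_triples
    )
    knowledge += sum(
        triple_score(go_emb[h], go_rel_emb[r], go_emb[t])
        for h, r, t in go_triples
    )
    transfer_term = sum(
        distance(transfer(protein_emb[p]), go_emb[g]) for p, g in annotations
    )
    return knowledge + alpha * transfer_term


loss = joint_loss(
    ppi_triples=[("P1", "binding", "P2")],
    go_triples=[("GO:a", "is_a", "GO:b")],
    annotations=[("P1", "GO:a")],
)
print(loss)
```

In a real training loop this combined loss would be minimized with a gradient-based optimizer so that both embedding tables and the transfer parameters are updated jointly; the weight `alpha` is a hypothetical knob for balancing the two terms.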
Related papers
- An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab [67.24684071577211]
The challenge of replicating research results has posed a significant impediment to the field of molecular biology.
We first curate a comprehensive multimodal dataset, named ProBio, as an initial step towards this objective.
Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings.
arXiv Detail & Related papers (2023-11-01T14:44:01Z)
- Bridging Phylogeny and Taxonomy with Protein-protein Interaction Networks [0.0]
The protein-protein interaction (PPI) network provides an overview of the complex biological reactions vital to an organism's metabolism and survival.
We aim to increase our understanding of the tree of life and taxonomy by gleaning information from the PPI networks.
arXiv Detail & Related papers (2023-10-26T05:32:33Z)
- Tertiary Lymphoid Structures Generation through Graph-based Diffusion [54.37503714313661]
In this work, we leverage state-of-the-art graph-based diffusion models to generate biologically meaningful cell-graphs.
We show that the adopted graph diffusion model is able to accurately learn the distribution of cells in terms of their tertiary lymphoid structures (TLS) content.
arXiv Detail & Related papers (2023-10-10T14:37:17Z)
- Two Novel Approaches to Detect Community: A Case Study of Omicron Lineage Variants PPI Network [0.5156484100374058]
We aim to uncover the communities within the variant B.1.1.529 (Omicron virus) using two proposed novel algorithms and four widely recognized algorithms.
We also compare the networks by their global properties, statistical summaries, subgraph counts, and graphlets, and validate them by modularity.
arXiv Detail & Related papers (2023-08-09T03:51:20Z)
- Knowledge Graph Completion based on Tensor Decomposition for Disease Gene Prediction [2.838553480267889]
We construct a biological knowledge graph centered on diseases and genes, and develop an end-to-end Knowledge graph completion model for Disease Gene Prediction.
KDGene introduces an interaction module between the embeddings of entities and relations to tensor decomposition, which can effectively enhance the information interaction in biological knowledge.
arXiv Detail & Related papers (2023-02-18T13:57:44Z)
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)
- Heterogeneous Graph based Deep Learning for Biomedical Network Link Prediction [7.628651624423363]
We propose a Graph Pair based Link Prediction model (GPLP) for predicting biomedical network links.
In GPLP, 1-hop subgraphs extracted from the known network interaction matrix are learnt to predict missing links.
Our method demonstrates the potential applications in other biomedical networks.
arXiv Detail & Related papers (2021-01-28T07:35:29Z)
- A Cross-Level Information Transmission Network for Predicting Phenotype from New Genotype: Application to Cancer Precision Medicine [37.442717660492384]
We propose a novel Cross-LEvel Information Transmission network (CLEIT) framework.
Inspired by domain adaptation, CLEIT first learns the latent representation of the high-level domain and then uses it as a ground-truth embedding.
We demonstrate the effectiveness and performance boost of CLEIT in predicting anti-cancer drug sensitivity from somatic mutations.
arXiv Detail & Related papers (2020-10-09T22:01:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.