Deep Contextual Learners for Protein Networks
- URL: http://arxiv.org/abs/2106.02246v1
- Date: Fri, 4 Jun 2021 04:26:27 GMT
- Title: Deep Contextual Learners for Protein Networks
- Authors: Michelle M. Li, Marinka Zitnik
- Abstract summary: We introduce AWARE, a graph neural message passing approach to inject cellular and tissue context into protein embeddings.
AWARE learns protein, cell type, and tissue embeddings that uphold cell type and tissue hierarchies.
We demonstrate AWARE on the novel task of predicting whether a gene is associated with a disease and where it most likely manifests in the human body.
- Score: 16.599890339599586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spatial context is central to understanding health and disease. Yet reference
protein interaction networks lack such contextualization, thereby limiting the
study of where protein interactions likely occur in the human body.
Contextualized protein interactions could better characterize genes with
disease-specific interactions and elucidate diseases' manifestation in specific
cell types. Here, we introduce AWARE, a graph neural message passing approach
to inject cellular and tissue context into protein embeddings. AWARE optimizes
for a multi-scale embedding space, whose structure reflects the topology of
cell type specific networks. We construct a multi-scale network of the Human
Cell Atlas and apply AWARE to learn protein, cell type, and tissue embeddings
that uphold cell type and tissue hierarchies. We demonstrate AWARE on the novel
task of predicting whether a gene is associated with a disease and where it
most likely manifests in the human body. AWARE embeddings outperform global
embeddings by at least 12.5%, highlighting the importance of contextual
learners for protein networks.
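The context-injection idea described in the abstract can be pictured as one round of graph message passing in which each protein aggregates its neighbors and a cell-type context vector is added to the update. The sketch below is a minimal illustration only, not the actual AWARE model: the adjacency matrix, context vector, weight matrices, and tanh update are all assumptions for demonstration.

```python
import numpy as np

def context_aware_message_passing(H, A, context, W_self, W_nbr, W_ctx):
    """One round of neighborhood aggregation with an added context vector,
    loosely in the spirit of injecting cell-type context into protein
    embeddings (illustrative only; not the AWARE objective)."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)    # avoid division by zero
    nbr = (A @ H) / deg                               # mean over neighbors
    out = H @ W_self + nbr @ W_nbr + context @ W_ctx  # inject context signal
    return np.tanh(out)

# toy example: 4 proteins, 8-dim embeddings, one cell-type context vector
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], float)
ctx = rng.normal(size=(1, 8))                         # broadcast to all proteins
Ws = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(3)]
H_next = context_aware_message_passing(H, A, ctx, *Ws)
```

Stacking such layers per cell-type-specific network, and tying the context vectors to a cell-type/tissue hierarchy, is the kind of multi-scale structure the abstract alludes to.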
Related papers
- Long-context Protein Language Model [76.95505296417866]
Self-supervised training of language models (LMs) has seen great success for protein sequences in learning meaningful representations and for generative drug design.
Most protein LMs are based on the Transformer architecture trained on individual proteins with short context lengths.
We propose LC-PLM, based on an alternative protein LM architecture, BiMamba-S, built on selective structured state-space models.
We also introduce its graph-contextual variant, LC-PLM-G, which contextualizes protein-protein interaction graphs for a second stage of training.
arXiv Detail & Related papers (2024-10-29T16:43:28Z) - ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time.
arXiv Detail & Related papers (2024-03-30T05:32:42Z) - NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks [60.48306899271866]
We propose novel semantic data augmentation methods to incorporate backbone chemical and side-chain biophysical information into protein classification tasks.
Specifically, we leverage molecular biophysical, secondary structure, chemical bond, and ionic features of proteins to facilitate classification tasks.
arXiv Detail & Related papers (2024-03-21T13:27:57Z) - Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrate both sets of information and reconstruct the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z) - Zyxin is all you need: machine learning adherent cell mechanics [0.0]
We develop a data-driven biophysical modeling approach to learn the mechanical behavior of adherent cells.
We first train neural networks to predict forces generated by adherent cells from images of cytoskeletal proteins.
We next develop two approaches - one explicitly constrained by physics, the other more agnostic - that help construct data-driven models of cellular forces.
arXiv Detail & Related papers (2023-03-01T02:08:40Z) - Integration of Pre-trained Protein Language Models into Geometric Deep Learning Networks [68.90692290665648]
We integrate knowledge learned by protein language models into several state-of-the-art geometric networks.
Our findings show an overall improvement of 20% over baselines.
Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin.
arXiv Detail & Related papers (2022-12-07T04:04:04Z) - Subcellular Protein Localisation in the Human Protein Atlas using Ensembles of Diverse Deep Architectures [11.41081495236219]
Automated visual localisation of subcellular proteins can accelerate our understanding of cell function in health and disease.
We show how this gap can be narrowed by addressing three key aspects: (i) automated improvement of cell annotation quality, (ii) new Convolutional Neural Network (CNN) architectures supporting unbalanced and noisy data, and (iii) informed selection and fusion of multiple & diverse machine learning models.
arXiv Detail & Related papers (2022-05-19T20:28:56Z) - Global Mapping of Gene/Protein Interactions in PubMed Abstracts: A Framework and an Experiment with P53 Interactions [7.361249273831739]
The large body of biomedical literature is an important source of gene/protein interaction information.
Recent advances in text mining tools have made it possible to automatically extract such documented interactions from free-text literature.
We propose a comprehensive framework for constructing and analyzing large-scale gene functional networks based on the gene/protein interactions extracted from biomedical literature repositories.
arXiv Detail & Related papers (2022-04-22T03:04:19Z) - A multitask transfer learning framework for the prediction of virus-human protein-protein interactions [0.30586855806896046]
We develop a transfer learning approach that exploits the information of around 24 million protein sequences and the interaction patterns from the human interactome.
We employ an additional objective which aims to maximize the probability of observing human protein-protein interactions.
Experimental results show that our proposed model works effectively for both virus-human and bacteria-human protein-protein interaction prediction tasks.
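The additional objective mentioned above - maximizing the probability of observed human protein-protein interactions - can be pictured as a weighted auxiliary loss added to the main virus-human PPI loss. The sketch below is an assumption-laden illustration, not the paper's actual objective: binary cross-entropy for both tasks and the mixing weight are hypothetical choices.

```python
import numpy as np

def bce(p, y, eps=1e-9):
    """Binary cross-entropy between predicted probabilities p and labels y."""
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def multitask_loss(p_virus, y_virus, p_human, y_human, weight=0.5):
    """Main virus-human PPI loss plus an auxiliary human-human PPI term
    (weight is a hypothetical mixing coefficient, not from the paper)."""
    return bce(p_virus, y_virus) + weight * bce(p_human, y_human)

# toy usage: two predictions per task
p_v = np.array([0.9, 0.2]); y_v = np.array([1.0, 0.0])
p_h = np.array([0.8, 0.1]); y_h = np.array([1.0, 0.0])
loss = multitask_loss(p_v, y_v, p_h, y_h)
```

The auxiliary term lets abundant human-interactome data regularize the scarcer virus-human task, which is the usual rationale for this kind of multitask transfer setup.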
arXiv Detail & Related papers (2021-11-26T07:53:51Z) - Bio-JOIE: Joint Representation Learning of Biological Knowledge Bases [38.9571812880758]
We show that Bio-JOIE can accurately identify PPIs between the SARS-CoV-2 proteins and human proteins.
By leveraging only structured knowledge, Bio-JOIE significantly outperforms existing state-of-the-art methods in PPI type prediction on multiple species.
arXiv Detail & Related papers (2021-03-07T07:06:53Z) - BERTology Meets Biology: Interpreting Attention in Protein Language Models [124.8966298974842]
We demonstrate methods for analyzing protein Transformer models through the lens of attention.
We show that attention captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure.
We also present a three-dimensional visualization of the interaction between attention and protein structure.
arXiv Detail & Related papers (2020-06-26T21:50:17Z)
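One simple way to quantify the attention-structure connection described in the last entry is to check how often the most strongly attended residue pairs coincide with contacts in the 3D structure. The function and toy matrices below are illustrative assumptions, not the paper's analysis method.

```python
import numpy as np

def attention_contact_agreement(attn, contacts, top_k=10):
    """Fraction of the top-k attended residue pairs (i != j) that are also
    contacts in 3D; a crude proxy for attention-vs-structure analysis.
    Both inputs are L x L matrices; `contacts` is binary."""
    L = attn.shape[0]
    a = attn.copy()
    np.fill_diagonal(a, -np.inf)                  # ignore self-attention
    flat = np.argsort(a, axis=None)[::-1][:top_k] # strongest pairs first
    pairs = np.unravel_index(flat, (L, L))
    return contacts[pairs].mean()

# toy data: attention concentrated on a truly contacting pair
L = 6
contacts = np.zeros((L, L)); contacts[0, 3] = contacts[3, 0] = 1
attn = np.full((L, L), 0.01); attn[0, 3] = attn[3, 0] = 0.9
score = attention_contact_agreement(attn, contacts, top_k=2)  # -> 1.0
```

A score well above the density of the contact map would indicate that attention heads preferentially link residues that are distant in sequence but close in space, as the paper reports.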
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.