Multi-Scale Representation Learning on Proteins
- URL: http://arxiv.org/abs/2204.02337v1
- Date: Mon, 4 Apr 2022 08:29:17 GMT
- Title: Multi-Scale Representation Learning on Proteins
- Authors: Vignesh Ram Somnath, Charlotte Bunne, Andreas Krause
- Abstract summary: This paper introduces a multi-scale graph construction of a protein -- HoloProt.
The surface captures coarser details of the protein, while the sequence (as primary component) and structure capture finer details.
Our graph encoder then learns a multi-scale representation by allowing each level to integrate the encoding from level(s) below with the graph at that level.
- Score: 78.31410227443102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Proteins are fundamental biological entities mediating key roles in cellular
function and disease. This paper introduces a multi-scale graph construction of
a protein -- HoloProt -- connecting surface to structure and sequence. The
surface captures coarser details of the protein, while the sequence (as primary
component) and the structure -- comprising secondary and tertiary components --
capture finer details. Our graph encoder then learns a multi-scale
representation by allowing each level to integrate the encoding from level(s)
below with the graph at that level. We test the learned representation on
two tasks: (i) ligand binding affinity (regression) and (ii) protein
function prediction (classification). On the regression task, contrary to
previous methods, our model performs consistently and reliably across different
dataset splits, outperforming all baselines on most splits. On the
classification task, it achieves a performance close to the top-performing
model while using 10x fewer parameters. To improve the memory efficiency of our
construction, we segment the multiplex protein surface manifold into molecular
superpixels and substitute the surface with these superpixels at little to no
performance loss.
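To make the bottom-up integration concrete, below is a minimal NumPy sketch: the finer (sequence) level is encoded first, and its node embeddings are injected into the coarser surface graph before message passing at that level. The mean-aggregation layer, toy sizes, and residue-to-vertex assignment are illustrative assumptions, not HoloProt's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def gnn_layer(H, A, W):
    """One round of mean-aggregation message passing: relu(D^-1 A H W)."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)
    return np.maximum((A / deg) @ H @ W, 0.0)

# Toy sizes: 8 residues (finer sequence level), 5 surface vertices (coarser level).
n_res, n_surf, d = 8, 5, 16
H_seq = rng.normal(size=(n_res, d))                  # per-residue features
A_seq = np.eye(n_res, k=1) + np.eye(n_res, k=-1)     # chain graph
H_surf = rng.normal(size=(n_surf, d))                # per-vertex surface features
A_surf = (rng.random((n_surf, n_surf)) < 0.4).astype(float)
A_surf = np.maximum(A_surf, A_surf.T)
np.fill_diagonal(A_surf, 0.0)

# Hypothetical residue-to-vertex assignment (which residue lies under each vertex).
assign = rng.integers(0, n_res, size=n_surf)

W1 = rng.normal(size=(d, d)) * 0.1
W2 = rng.normal(size=(2 * d, d)) * 0.1

# Lower level first: encode the sequence graph.
Z_seq = gnn_layer(H_seq, A_seq, W1)

# Upper level: each surface vertex integrates the encoding from the level
# below, then message passing runs on the surface graph at that level.
H_surf = np.concatenate([H_surf, Z_seq[assign]], axis=1) @ W2
Z_surf = gnn_layer(H_surf, A_surf, W1)

protein_repr = Z_surf.mean(axis=0)   # pooled multi-scale representation
print(protein_repr.shape)            # (16,)
```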
Related papers
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
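A minimal sketch of the sampling idea, assuming a plain adjacency-list graph: structure enters the model only through random walks, whose visited-node sequences are then fed to a standard Transformer as token contexts. The walk length and restart-free policy are toy choices, not GSPT's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_walk(adj, start, length):
    """Sample one random walk; the visited nodes form a token context."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = adj[walk[-1]]
        if not nbrs:                 # dangling node: stop early
            break
        walk.append(int(rng.choice(nbrs)))
    return walk

# Toy graph as an adjacency list (node id -> neighbors).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}

# One context per node; each walk is later fed to the Transformer as a
# plain token sequence, so structure enters only through the sampler.
contexts = [random_walk(adj, v, length=5) for v in adj]
print(contexts)
```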
arXiv Detail & Related papers (2024-06-19T22:30:08Z) - NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks [60.48306899271866]
We propose novel semantic data augmentation methods to incorporate backbone chemical and side-chain biophysical information into protein classification tasks.
Specifically, we leverage molecular biophysical, secondary-structure, chemical-bond, and ionic features of proteins to facilitate classification tasks.
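In practice this kind of augmentation amounts to widening each residue's feature vector with extra biophysical channels before graph learning. The sketch below is a hedged illustration; the specific features and encodings are stand-ins, not the paper's exact inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res = 6

# Base one-hot amino-acid features for a toy 6-residue protein.
aa_onehot = np.eye(20)[rng.integers(0, 20, n_res)]

# Hypothetical per-residue annotations of the kind the paper leverages:
# secondary-structure class, hydrophobicity, and net charge.
sec_struct = np.eye(3)[rng.integers(0, 3, n_res)]     # helix / sheet / coil
hydrophobicity = rng.normal(size=(n_res, 1))
charge = rng.choice([-1.0, 0.0, 1.0], size=(n_res, 1))

# Augmentation = widening each node's feature vector before the GNN.
X = np.concatenate([aa_onehot, sec_struct, hydrophobicity, charge], axis=1)
print(X.shape)   # (6, 25): 20 + 3 + 1 + 1 channels
```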
arXiv Detail & Related papers (2024-03-21T13:27:57Z) - Deep Manifold Transformation for Protein Representation Learning [42.43017670985785]
We propose a new deep manifold transformation approach for universal protein representation learning (DMTPRL).
It employs manifold learning strategies to improve the quality and adaptability of the learned embeddings.
Our proposed DMTPRL method outperforms state-of-the-art baselines on diverse downstream tasks across popular datasets.
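The abstract does not pin down the manifold learning strategy, so the following is only one generic possibility: a loss that asks embedding-space affinities to match input-space affinities, in the spirit of t-SNE/UMAP-style objectives. It is an assumption, not DMTPRL's actual objective.

```python
import numpy as np

def pairwise_sq_dists(X):
    sq = (X ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

def manifold_alignment_loss(X, Z, sigma=1.0):
    """Generic manifold-preserving objective: embedding-space affinities
    should match input-space affinities (cross-entropy between normalized
    Gaussian-kernel similarities)."""
    P = np.exp(-pairwise_sq_dists(X) / sigma)
    Q = np.exp(-pairwise_sq_dists(Z) / sigma)
    P, Q = P / P.sum(), Q / Q.sum()
    return -(P * np.log(Q + 1e-12)).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 32))   # toy input protein features
Z = rng.normal(size=(10, 8))    # toy learned embeddings
print(manifold_alignment_loss(X, Z))
```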
arXiv Detail & Related papers (2024-01-12T18:38:14Z) - Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers [18.498779242323582]
We propose a novel approach, Prot2Text, which predicts a protein's function in free-text form.
By combining Graph Neural Networks (GNNs) and Large Language Models (LLMs) in an encoder-decoder framework, our model effectively integrates diverse data types.
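One common way to wire such an encoder-decoder is to project the GNN readout into the decoder's token-embedding space and prepend it as a soft prompt ahead of the text tokens. The sketch below shows that fusion step only; whether Prot2Text uses exactly this mechanism is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
d_graph, d_model, vocab = 16, 32, 100

graph_repr = rng.normal(size=(d_graph,))       # toy GNN readout for one protein

# Fusion: project the graph embedding into the decoder's token space and
# prepend it as a "soft prompt" ahead of the text tokens.
W_proj = rng.normal(size=(d_graph, d_model)) * 0.1
token_emb = rng.normal(size=(vocab, d_model)) * 0.1

prompt_vec = graph_repr @ W_proj               # (d_model,)
text_tokens = np.array([4, 17, 42])            # toy partial description
decoder_input = np.vstack([prompt_vec, token_emb[text_tokens]])
print(decoder_input.shape)                     # (4, 32): graph prefix + 3 tokens
```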
arXiv Detail & Related papers (2023-07-25T09:35:43Z) - Generative Pretrained Autoregressive Transformer Graph Neural Network
applied to the Analysis and Discovery of Novel Proteins [0.0]
We report a flexible language-model based deep learning strategy, applied here to solve complex forward and inverse problems in protein modeling.
The model is applied to predict secondary structure content (per-residue level and overall content), protein solubility, and sequencing tasks.
We find that adding additional tasks yields emergent synergies that the model exploits in improving overall performance.
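A minimal sketch of the multi-task setup that can produce such synergies: task-specific heads sit on one shared backbone representation, so every task's loss shapes the shared features. The heads, targets, and weighting here are toy assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32
shared = rng.normal(size=(d,))          # representation from the shared backbone

# Hypothetical task heads on top of one shared representation.
W_ss = rng.normal(size=(d, 3)) * 0.1    # secondary-structure content (3 classes)
W_sol = rng.normal(size=(d, 1)) * 0.1   # solubility (regression)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Each head's loss backpropagates through `shared`, which is one plausible
# mechanism behind the reported cross-task synergies.
ss_target = np.array([0.5, 0.3, 0.2])   # helix/sheet/coil content
sol_target = 0.7
loss_ss = -(ss_target * np.log(softmax(shared @ W_ss) + 1e-12)).sum()
loss_sol = (shared @ W_sol - sol_target).item() ** 2
total_loss = loss_ss + loss_sol         # joint multi-task objective
print(total_loss)
```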
arXiv Detail & Related papers (2023-05-07T12:30:24Z) - EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction [49.674494450107005]
Predicting the binding sites of target proteins plays a fundamental role in drug discovery.
Most existing deep-learning methods consider a protein as a 3D image by spatially clustering its atoms into voxels.
This work proposes EquiPocket, an E(3)-equivariant Graph Neural Network (GNN) for binding site prediction.
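To make the equivariance property concrete, here is a minimal EGNN-style layer (in the spirit of Satorras et al.) where messages depend only on invariant distances and coordinates update along relative-position vectors; a numerical check confirms the outputs rotate with the inputs. This is a generic equivariant layer, not EquiPocket's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def egnn_layer(h, x, w):
    """EGNN-style update: messages depend only on invariant squared
    distances, and coordinates move along relative-position vectors."""
    diff = x[:, None, :] - x[None, :, :]          # (n, n, 3) relative positions
    d2 = (diff ** 2).sum(-1)                      # (n, n) invariant distances
    msg = np.tanh(w * d2)                         # scalar message per pair
    np.fill_diagonal(msg, 0.0)
    h_new = h + msg.sum(1, keepdims=True)         # invariant feature update
    x_new = x + (msg[..., None] * diff).mean(1)   # equivariant coordinate update
    return h_new, x_new

n, d = 6, 4
h = rng.normal(size=(n, d))                       # residue features
x = rng.normal(size=(n, 3))                       # residue coordinates
h1, x1 = egnn_layer(h, x, 0.3)

# Equivariance check: rotating the input rotates the output the same way.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # random orthogonal matrix
h2, x2 = egnn_layer(h, x @ Q.T, 0.3)
print(np.allclose(h1, h2), np.allclose(x1 @ Q.T, x2))  # True True
```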
arXiv Detail & Related papers (2023-02-23T17:18:26Z) - Learning multi-scale functional representations of proteins from
single-cell microscopy data [77.34726150561087]
We show that simple convolutional networks trained on localization classification can learn protein representations that encapsulate diverse functional information.
We also propose a robust evaluation strategy to assess the quality of protein representations across different scales of biological function.
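A hedged sketch of the extraction recipe: run a convolutional stack trained for localization classification and reuse the pooled penultimate activations as the protein representation. The naive convolution and all sizes below are illustrative, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, kernels):
    """Naive valid convolution: img (H, W), kernels (K, kh, kw) -> (K, H', W')."""
    K, kh, kw = kernels.shape
    H, W = img.shape
    out = np.empty((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[k, i, j] = (img[i:i + kh, j:j + kw] * kernels[k]).sum()
    return out

img = rng.random((16, 16))                  # toy single-channel microscopy crop
kernels = rng.normal(size=(8, 3, 3)) * 0.1  # stand-in for trained filters

feat = np.maximum(conv2d_valid(img, kernels), 0.0)  # ReLU feature maps
embedding = feat.mean(axis=(1, 2))                  # global average pooling
# The pooled activations, not the localization logits, are reused as the
# protein representation.
print(embedding.shape)  # (8,)
```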
arXiv Detail & Related papers (2022-05-24T00:00:07Z) - Protein Representation Learning by Geometric Structure Pretraining [27.723095456631906]
Existing approaches usually pretrain protein language models on a large number of unlabeled amino acid sequences.
We first present a simple yet effective encoder to learn protein geometry features.
Experimental results on both function prediction and fold classification tasks show that our proposed pretraining methods outperform or are on par with the state-of-the-art sequence-based methods using much less data.
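A typical first step for such a structure encoder, and an assumption here rather than the paper's exact recipe, is to turn 3D residue coordinates into a k-nearest-neighbor graph:

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_graph(coords, k):
    """Connect each residue to its k nearest neighbors in 3D, a standard
    way to turn a protein structure into a graph for a geometric encoder."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # no self-edges
    nbrs = np.argsort(d, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(coords)) for j in nbrs[i]]

coords = rng.normal(size=(10, 3)) * 10           # toy C-alpha coordinates
edges = knn_graph(coords, k=3)
print(len(edges), edges[:3])                     # 30 directed edges
```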
arXiv Detail & Related papers (2022-03-11T17:52:13Z) - PersGNN: Applying Topological Data Analysis and Geometric Deep Learning
to Structure-Based Protein Function Prediction [0.07340017786387766]
In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank.
We present PersGNN - an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis.
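As a sketch of the topological ingredient, the snippet below computes 0-dimensional persistent homology of a point cloud via single-linkage merges and concatenates the resulting death times with a stand-in graph embedding. PersGNN's actual persistence features and fusion are richer; this is illustrative only.

```python
import numpy as np

def zero_dim_persistence(coords):
    """0-dimensional persistent homology of a distance filtration:
    components all appear at scale 0 and die at single-linkage merge
    distances, which we collect as a topological summary."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    pairs = sorted((d[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    deaths = []
    for dist, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(dist)            # a component dies at this scale
    return np.array(deaths)                # n - 1 finite bars

rng = np.random.default_rng(0)
coords = rng.normal(size=(12, 3))          # toy residue coordinates
topo_vec = zero_dim_persistence(coords)    # 11 death times
gnn_vec = rng.normal(size=(16,))           # stand-in for a GNN readout
combined = np.concatenate([gnn_vec, topo_vec])  # fused input for a task head
print(combined.shape)                      # (27,)
```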
arXiv Detail & Related papers (2020-10-30T02:24:35Z) - BERTology Meets Biology: Interpreting Attention in Protein Language
Models [124.8966298974842]
We demonstrate methods for analyzing protein Transformer models through the lens of attention.
We show that attention captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure.
We also present a three-dimensional visualization of the interaction between attention and protein structure.
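That finding reduces to a simple, checkable statistic: how much attention mass lands on residue pairs that are distant in sequence but close in space. The sketch below computes it on toy data; the 8 angstrom contact cutoff and sequence-separation threshold are common conventions, not necessarily the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20

# Toy C-alpha coordinates and a toy (row-normalized) attention map.
coords = rng.normal(size=(n, 3)) * 8
attn = rng.random((n, n))
attn /= attn.sum(1, keepdims=True)

# Contact map: residue pairs within 8 angstroms but > 5 apart in sequence.
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
seq_sep = np.abs(np.arange(n)[:, None] - np.arange(n))
contact = (dist < 8.0) & (seq_sep > 5)

# Fraction of total attention mass falling on such long-range contacts.
print(attn[contact].sum() / attn.sum())
```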
arXiv Detail & Related papers (2020-06-26T21:50:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.