Advanced atom-level representations for protein flexibility prediction utilizing graph neural networks
- URL: http://arxiv.org/abs/2408.12519v1
- Date: Thu, 22 Aug 2024 16:15:13 GMT
- Title: Advanced atom-level representations for protein flexibility prediction utilizing graph neural networks
- Authors: Sina Sarparast, Aldo Zaimi, Maximilian Ebert, Michael-Rock Goldsmith,
- Abstract summary: We propose graph neural networks (GNNs) to learn protein representations at the atomic level and predict B-factors from protein 3D structures.
The Meta-GNN model achieves a correlation coefficient of 0.71 on a large and diverse test set of over 4k proteins.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Protein dynamics play a crucial role in many biological processes and drug interactions. However, measuring, and simulating protein dynamics is challenging and time-consuming. While machine learning holds promise in deciphering the determinants of protein dynamics from structural information, most existing methods for protein representation learning operate at the residue level, ignoring the finer details of atomic interactions. In this work, we propose for the first time to use graph neural networks (GNNs) to learn protein representations at the atomic level and predict B-factors from protein 3D structures. The B-factor reflects the atomic displacement of atoms in proteins, and can serve as a surrogate for protein flexibility. We compared different GNN architectures to assess their performance. The Meta-GNN model achieves a correlation coefficient of 0.71 on a large and diverse test set of over 4k proteins (17M atoms) from the Protein Data Bank (PDB), outperforming previous methods by a large margin. Our work demonstrates the potential of representations learned by GNNs for protein flexibility prediction and other related tasks.
Related papers
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - Long-context Protein Language Model [76.95505296417866]
Self-supervised training of language models (LMs) has seen great success for protein sequences in learning meaningful representations and for generative drug design.
Most protein LMs are based on the Transformer architecture trained on individual proteins with short context lengths.
We propose LC-PLM based on an alternative protein LM architecture, BiMamba-S, built off selective structured state-space models.
We also introduce its graph-contextual variant, LC-PLM-G, which contextualizes protein-protein interaction graphs for a second stage of training.
arXiv Detail & Related papers (2024-10-29T16:43:28Z) - GOProteinGNN: Leveraging Protein Knowledge Graphs for Protein Representation Learning [27.192150057715835]
GOProteinGNN is a novel architecture that enhances protein language models by integrating protein knowledge graph information.
Our approach allows for the integration of information at both the individual amino acid level and the entire protein level, enabling a comprehensive and effective learning process.
arXiv Detail & Related papers (2024-07-31T17:54:22Z) - NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks [60.48306899271866]
We propose novel semantic data augmentation methods to incorporate backbone chemical and side-chain biophysical information into protein classification tasks.
Specifically, we leverage molecular biophysical, secondary structure, chemical bonds, andionic features of proteins to facilitate classification tasks.
arXiv Detail & Related papers (2024-03-21T13:27:57Z) - Learning the shape of protein micro-environments with a holographic
convolutional neural network [0.0]
We introduce Holographic Convolutional Neural Network (H-CNN) for proteins.
H-CNN is a physically motivated machine learning approach to model amino acid preferences in protein structures.
It accurately predicts the impact of mutations on protein function, including stability and binding of protein complexes.
arXiv Detail & Related papers (2022-11-05T16:29:15Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - Structure-aware Protein Self-supervised Learning [50.04673179816619]
We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
arXiv Detail & Related papers (2022-04-06T02:18:41Z) - OntoProtein: Protein Pretraining With Gene Ontology Embedding [36.92674447484136]
We propose OntoProtein, the first general framework that makes use of structure in GO (Gene Ontology) into protein pre-training models.
We construct a novel large-scale knowledge graph that consists of GO and its related proteins, and gene annotation texts or protein sequences describe all nodes in the graph.
arXiv Detail & Related papers (2022-01-23T14:49:49Z) - PersGNN: Applying Topological Data Analysis and Geometric Deep Learning
to Structure-Based Protein Function Prediction [0.07340017786387766]
In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank.
We present PersGNN - an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis.
arXiv Detail & Related papers (2020-10-30T02:24:35Z) - BERTology Meets Biology: Interpreting Attention in Protein Language
Models [124.8966298974842]
We demonstrate methods for analyzing protein Transformer models through the lens of attention.
We show that attention captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure.
We also present a three-dimensional visualization of the interaction between attention and protein structure.
arXiv Detail & Related papers (2020-06-26T21:50:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.