Neural Embeddings for Protein Graphs
- URL: http://arxiv.org/abs/2306.04667v1
- Date: Wed, 7 Jun 2023 14:50:34 GMT
- Title: Neural Embeddings for Protein Graphs
- Authors: Francesco Ceccarelli, Lorenzo Giusti, Sean B. Holden, Pietro Liò
- Abstract summary: We propose a novel framework for embedding protein graphs in geometric vector spaces.
We learn an encoder function that preserves the structural distance between protein graphs.
Our framework achieves remarkable results in the task of protein structure classification.
- Score: 0.8258451067861933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Proteins perform much of the work in living organisms, and consequently the
development of efficient computational methods for protein representation is
essential for advancing large-scale biological research. Most current
approaches struggle to efficiently integrate the wealth of information
contained in the protein sequence and structure. In this paper, we propose a
novel framework for embedding protein graphs in geometric vector spaces, by
learning an encoder function that preserves the structural distance between
protein graphs. Utilizing Graph Neural Networks (GNNs) and Large Language
Models (LLMs), the proposed framework generates structure- and sequence-aware
protein representations. We demonstrate that our embeddings are successful in
the task of comparing protein structures, while providing a significant
speed-up compared to traditional approaches based on structural alignment. Our
framework achieves remarkable results in the task of protein structure
classification; in particular, when compared to other work, the proposed method
shows an average F1-Score improvement of 26% on out-of-distribution (OOD)
samples and of 32% when tested on samples coming from the same distribution as
the training data. Our approach finds applications in areas such as drug
prioritization, drug re-purposing, disease sub-type analysis and elsewhere.
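The abstract's central idea, an encoder whose embedding distances preserve the structural distance between protein graphs, can be illustrated with a small objective function. The sketch below is a minimal illustration, not the authors' implementation: the function names, the Euclidean embedding distance, the mean-squared-error form of the loss, and the randomly generated toy data are all assumptions. In the framework described above, the embeddings would come from a GNN fed with language-model features, and the target distances from a structural comparison method.

```python
import numpy as np


def pairwise_euclidean(z):
    """Return the matrix of Euclidean distances between the rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def distance_preservation_loss(z, target):
    """Mean squared gap between embedding distances and target distances.

    z      : (n, d) array of embeddings, one row per protein graph.
    target : (n, n) symmetric array of structural distances (hypothetical
             here; in practice computed by a structural comparison tool).
    Only the upper triangle is used, so each pair is counted once.
    """
    d = pairwise_euclidean(z)
    iu = np.triu_indices(len(z), k=1)
    return float(np.mean((d[iu] - target[iu]) ** 2))


# Toy data: treat the distances of a reference embedding as the "structural"
# distances, then compare a perfect and a perturbed embedding.
rng = np.random.default_rng(0)
z_true = rng.normal(size=(5, 3))
target = pairwise_euclidean(z_true)
z_noisy = z_true + 0.1 * rng.normal(size=z_true.shape)

loss_exact = distance_preservation_loss(z_true, target)
loss_noisy = distance_preservation_loss(z_noisy, target)
print(loss_exact, loss_noisy)
```

An encoder trained to minimise such a loss lets two proteins be compared with a single distance computation in the embedding space rather than a full structural alignment, which is where the speed-up mentioned in the abstract would come from.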
Related papers
- Protein Representation Learning with Sequence Information Embedding: Does it Always Lead to a Better Performance? [4.7077642423577775]
We propose ProtLOCA, a local geometry alignment method based solely on amino acid structure representation.
Our method outperforms existing sequence- and structure-based representation learning methods by more quickly and accurately matching structurally consistent protein domains.
arXiv Detail & Related papers (2024-06-28T08:54:37Z) - Geometric Self-Supervised Pretraining on 3D Protein Structures using Subgraphs [25.93347924265175]
We propose a novel self-supervised method to pretrain 3D graph neural networks on 3D protein structures.
By considering subgraphs and their relationships to the global protein structure, the model can learn to reason about these hierarchical levels of organization.
arXiv Detail & Related papers (2024-06-20T09:34:31Z) - NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks [60.48306899271866]
We propose novel semantic data augmentation methods to incorporate backbone chemical and side-chain biophysical information into protein classification tasks.
Specifically, we leverage molecular biophysical, secondary structure, chemical bonds, and ionic features of proteins to facilitate classification tasks.
arXiv Detail & Related papers (2024-03-21T13:27:57Z) - A Systematic Study of Joint Representation Learning on Protein Sequences and Structures [38.94729758958265]
Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein functions.
Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge.
Our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM with distinct structure encoders.
arXiv Detail & Related papers (2023-03-11T01:24:10Z) - Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z) - Integration of Pre-trained Protein Language Models into Geometric Deep Learning Networks [68.90692290665648]
We integrate knowledge learned by protein language models into several state-of-the-art geometric networks.
Our findings show an overall improvement of 20% over baselines.
Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin.
arXiv Detail & Related papers (2022-12-07T04:04:04Z) - Contrastive Representation Learning for 3D Protein Structures [13.581113136149469]
We introduce a new representation learning framework for 3D protein structures.
Our framework uses unsupervised contrastive learning to learn meaningful representations of protein structures.
We show how these representations can be used to solve a large variety of tasks, such as protein function prediction, protein fold classification, structural similarity prediction, and protein-ligand binding affinity prediction.
arXiv Detail & Related papers (2022-05-31T10:33:06Z) - Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - Structure-aware Protein Self-supervised Learning [50.04673179816619]
We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
arXiv Detail & Related papers (2022-04-06T02:18:41Z) - PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction [0.07340017786387766]
In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank.
We present PersGNN - an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis.
arXiv Detail & Related papers (2020-10-30T02:24:35Z) - Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate ($\geq$80%) predictions of protein class and architecture from structures determined at low ($\leq$3Å) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.