Contrastive Representation Learning for 3D Protein Structures
- URL: http://arxiv.org/abs/2205.15675v1
- Date: Tue, 31 May 2022 10:33:06 GMT
- Title: Contrastive Representation Learning for 3D Protein Structures
- Authors: Pedro Hermosilla and Timo Ropinski
- Abstract summary: We introduce a new representation learning framework for 3D protein structures.
Our framework uses unsupervised contrastive learning to learn meaningful representations of protein structures.
We show how these representations can be used to solve a wide variety of tasks, such as protein function prediction, protein fold classification, structural similarity prediction, and protein-ligand binding affinity prediction.
- Score: 13.581113136149469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from 3D protein structures has gained wide interest in protein
modeling and structural bioinformatics. Unfortunately, the number of available
structures is orders of magnitude lower than the training data sizes commonly
used in computer vision and machine learning. Moreover, this number is reduced
even further when only annotated protein structures can be considered, making
the training of existing models difficult and prone to over-fitting. To address
this challenge, we introduce a new representation learning framework for 3D
protein structures. Our framework uses unsupervised contrastive learning to
learn meaningful representations of protein structures, making use of proteins
from the Protein Data Bank. We show how these representations can be used to
solve a wide variety of tasks, such as protein function prediction, protein
fold classification, structural similarity prediction, and protein-ligand
binding affinity prediction. Moreover, we show how fine-tuned networks,
pre-trained with our algorithm, lead to significantly improved task
performance, achieving new state-of-the-art results in many tasks.
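To make the pretraining objective concrete, below is a minimal sketch of the kind of unsupervised contrastive loss the abstract describes: two augmented views of the same protein are embedded and pulled together, while the other proteins in the batch act as negatives. This is not the authors' implementation; the NT-Xent formulation, the cropping augmentation, and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only: an NT-Xent contrastive objective over protein
# structure embeddings. Augmentation and hyperparameters are assumptions,
# not taken from the paper.
import torch
import torch.nn.functional as F


def random_crop(coords: torch.Tensor, keep: float = 0.8) -> torch.Tensor:
    """Hypothetical augmentation: keep a random contiguous stretch of residues.

    coords: (N, 3) residue coordinates of one protein.
    """
    n = coords.shape[0]
    k = max(1, int(n * keep))
    start = torch.randint(0, n - k + 1, (1,)).item()
    return coords[start:start + k]


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor,
                 temperature: float = 0.1) -> torch.Tensor:
    """NT-Xent: view pairs (z1[i], z2[i]) are positives; the rest of the
    batch serves as negatives. z1, z2: (B, D) embeddings of two views."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D), unit norm
    sim = z @ z.t() / temperature                       # cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    b = z1.shape[0]
    # the positive of row i is row i + B, and vice versa
    targets = torch.cat([torch.arange(b) + b, torch.arange(b)])
    return F.cross_entropy(sim, targets)


# Usage with stand-in embeddings; in practice z1 and z2 would come from a
# 3D structure encoder (e.g., a GNN over residue graphs) applied to two
# augmented views of each protein.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent_loss(z1, z2)
```

After pretraining in this fashion on unannotated structures from the Protein Data Bank, the encoder would be fine-tuned separately on each downstream task (function prediction, fold classification, structural similarity, binding affinity), which is where the reported gains arise.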
Related papers
- Geometric Self-Supervised Pretraining on 3D Protein Structures using Subgraphs [26.727436310732692]
We propose a novel self-supervised method to pretrain 3D graph neural networks on 3D protein structures.
We experimentally show that our proposed pretraining strategy leads to significant improvements of up to 6%.
arXiv Detail & Related papers (2024-06-20T09:34:31Z) - Protein 3D Graph Structure Learning for Robust Structure-based Protein
Property Prediction [43.46012602267272]
Protein structure-based property prediction has emerged as a promising approach for various biological tasks.
Current practices, which simply substitute predicted structures for experimental ones at inference time, suffer from notable degradation in prediction accuracy.
Our framework is model-agnostic and effective in improving the property prediction of both predicted structures and experimental structures.
arXiv Detail & Related papers (2023-10-14T08:43:42Z) - CCPL: Cross-modal Contrastive Protein Learning [47.095862120116976]
We introduce a novel unsupervised protein structure representation pretraining method, cross-modal contrastive protein learning (CCPL).
CCPL leverages a robust protein language model and uses unsupervised contrastive alignment to enhance structure learning.
We evaluated our model across various benchmarks, demonstrating the framework's superiority.
arXiv Detail & Related papers (2023-03-19T08:19:10Z) - A Systematic Study of Joint Representation Learning on Protein Sequences
and Structures [38.94729758958265]
Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein functions.
Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge.
Our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM with distinct structure encoders.
arXiv Detail & Related papers (2023-03-11T01:24:10Z) - Data-Efficient Protein 3D Geometric Pretraining via Refinement of
Diffused Protein Structure Decoy [42.49977473599661]
Learning meaningful protein representation is important for a variety of biological downstream tasks such as structure-based drug design.
In this paper, we propose a unified framework for protein pretraining and a 3D geometric-based, data-efficient, and protein-specific pretext task: RefineDiff.
arXiv Detail & Related papers (2023-02-05T14:13:32Z) - Integration of Pre-trained Protein Language Models into Geometric Deep
Learning Networks [68.90692290665648]
We integrate knowledge learned by protein language models into several state-of-the-art geometric networks.
Our findings show an overall improvement of 20% over baselines.
Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin.
arXiv Detail & Related papers (2022-12-07T04:04:04Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - Structure-aware Protein Self-supervised Learning [50.04673179816619]
We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
arXiv Detail & Related papers (2022-04-06T02:18:41Z) - Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate (≥80%) predictions of protein class and architecture from structures determined at low (≤3Å) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z) - BERTology Meets Biology: Interpreting Attention in Protein Language
Models [124.8966298974842]
We demonstrate methods for analyzing protein Transformer models through the lens of attention.
We show that attention captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence but spatially close in the three-dimensional structure (see the sketch after this list).
We also present a three-dimensional visualization of the interaction between attention and protein structure.
arXiv Detail & Related papers (2020-06-26T21:50:17Z)
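As a rough illustration of the attention analysis summarized in the last entry above, the sketch below checks how often strongly attended long-range residue pairs are also spatial contacts. The attention matrix, contact cutoff, and coordinates are stand-ins, not the method or data of the cited paper.

```python
# Illustrative sketch: do high-attention residue pairs coincide with 3D
# contacts? The attention map and coordinates below are random stand-ins.
import numpy as np


def contact_map(ca_coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Boolean (L, L) map: True where the Cα-Cα distance is below cutoff (Å)."""
    dist = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    return dist < cutoff


def attention_contact_precision(attn: np.ndarray, contacts: np.ndarray,
                                min_seq_sep: int = 6, top_k: int = 50) -> float:
    """Fraction of the top-k attended long-range pairs that are true contacts.

    Pairs closer than min_seq_sep in sequence are ignored, since attention to
    sequence neighbors is trivially "close in space". Symmetric pairs are
    counted in both directions for simplicity."""
    L = attn.shape[0]
    i, j = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    scores = np.where(np.abs(i - j) >= min_seq_sep, attn, -np.inf).ravel()
    top = np.argsort(scores)[::-1][:top_k]
    return float(contacts.ravel()[top].mean())


# Stand-in data: a random attention map and a random-walk "backbone".
L = 120
attn = np.random.rand(L, L)
coords = np.cumsum(np.random.randn(L, 3), axis=0) * 1.9
print(attention_contact_precision(attn, contact_map(coords)))
```

A real analysis would average this precision over heads and layers of an actual protein Transformer and compare it against the random baseline given by the overall contact density.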
This list is automatically generated from the titles and abstracts of the papers on this site.