Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction
- URL: http://arxiv.org/abs/2310.11466v2
- Date: Thu, 19 Oct 2023 06:21:10 GMT
- Title: Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction
- Authors: Yufei Huang, Siyuan Li, Jin Su, Lirong Wu, Odin Zhang, Haitao Lin, Jingqi Qi, Zihan Liu, Zhangyang Gao, Yuyang Liu, Jiangbin Zheng, Stan.ZQ.Li
- Abstract summary: Protein structure-based property prediction has emerged as a promising approach for various biological tasks.
Current practices, which simply employ accurately predicted structures during inference, suffer from notable degradation in prediction accuracy.
Our framework is model-agnostic and effective in improving the property prediction of both predicted structures and experimental structures.
- Score: 43.46012602267272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Protein structure-based property prediction has emerged as a promising
approach for various biological tasks, such as protein function prediction and
sub-cellular location estimation. Existing methods rely heavily on
experimental protein structure data and fail in scenarios where these data are
unavailable. Predicted protein structures from AI tools (e.g., AlphaFold2) have
been used as alternatives. However, we observed that current practices, which
simply employ accurately predicted structures during inference, suffer from
notable degradation in prediction accuracy. While similar phenomena have been
extensively studied as model robustness in other fields (e.g., computer vision),
their impact on protein property prediction remains unexplored. In
this paper, we first investigate the reason behind the performance decrease
when utilizing predicted structures, attributing it to the structure embedding
bias from the perspective of structure representation learning. To study this
problem, we identify a Protein 3D Graph Structure Learning Problem for Robust
Protein Property Prediction (PGSL-RP3), collect benchmark datasets, and present
a protein Structure embedding Alignment Optimization framework (SAO) to
mitigate the problem of structure embedding bias between the predicted and
experimental protein structures. Extensive experiments have shown that our
framework is model-agnostic and effective in improving the property prediction
of both predicted structures and experimental structures. The benchmark
datasets and code will be released to benefit the community.
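The abstract does not spell out SAO's encoder or training objective. As a rough illustration only, assuming that "structure embedding bias" can be read as the gap between the embeddings an encoder assigns to the experimental and the predicted structure of the same protein, the Python sketch below builds a residue contact graph from C-alpha coordinates (an assumed 8 Å cutoff), summarizes it with a hypothetical hand-crafted encoder standing in for a learned graph neural network, and scores the gap with a mean-squared alignment penalty. The names `contact_graph`, `toy_graph_embedding` and `alignment_loss` are hypothetical and the coordinates are synthetic; this is not the authors' SAO implementation.

```python
import numpy as np


def contact_graph(ca_coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Adjacency matrix of a residue contact graph: residues i != j are
    linked when their C-alpha atoms are closer than `cutoff` angstroms
    (the 8 A cutoff is an assumption, not taken from the paper)."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < cutoff).astype(np.float64)
    np.fill_diagonal(adj, 0.0)  # drop self-loops
    return adj


def toy_graph_embedding(ca_coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Hypothetical stand-in for a learned structure encoder such as a GNN:
    summary statistics of the contact graph. They depend only on pairwise
    distances, so they are invariant to rotating or translating the input."""
    adj = contact_graph(ca_coords, cutoff)
    n = adj.shape[0]
    degree = adj.sum(axis=1)
    density = adj.sum() / (n * (n - 1)) if n > 1 else 0.0
    return np.array([degree.mean(), degree.std(), density])


def alignment_loss(emb_pred: np.ndarray, emb_exp: np.ndarray) -> float:
    """Mean squared Euclidean distance between paired embeddings
    (one row per protein) of predicted and experimental structures."""
    return float(np.mean(np.sum((emb_pred - emb_exp) ** 2, axis=-1)))


# Toy "experimental" C-alpha trace: a random walk with 3.8 A steps.
rng = np.random.default_rng(0)
steps = rng.normal(size=(50, 3))
steps *= 3.8 / np.linalg.norm(steps, axis=1, keepdims=True)
exp_coords = np.cumsum(steps, axis=0)

# Toy "predicted" structure: the same trace with Gaussian coordinate noise.
pred_coords = exp_coords + rng.normal(scale=1.5, size=exp_coords.shape)

e_exp = toy_graph_embedding(exp_coords)[None, :]
e_pred = toy_graph_embedding(pred_coords)[None, :]
print("embedding gap:", alignment_loss(e_pred, e_exp))
print("identical structures:", alignment_loss(e_exp, e_exp))
```

In a trainable setup, a penalty of this kind would presumably be minimized jointly with the property-prediction loss so that the encoder maps both structure sources to nearby embeddings; how SAO actually combines its objectives is not stated in the abstract.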
Related papers
- CPE-Pro: A Structure-Sensitive Deep Learning Method for Protein Representation and Origin Evaluation [7.161099050722313]
We develop a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro).
CPE-Pro learns the structural information of proteins and captures inter-structural differences to achieve accurate traceability on four data classes.
We utilize Foldseek to encode protein structures into "structure-sequences" and train a protein Structural Sequence Language Model, SSLM.
(2024-10-21T02:21:56Z)
- A Protein Structure Prediction Approach Leveraging Transformer and CNN Integration [4.909112037834705]
This paper adopts a two-dimensional fusion deep neural network model, DstruCCN, which uses Convolutional Neural Networks (CNN) and a supervised Transformer protein language model for single-sequence protein structure prediction.
The training features of the two are combined to predict the protein Transformer binding site matrix, and then the three-dimensional structure is reconstructed using energy minimization.
(2024-02-29T12:24:20Z)
- Structure-Informed Protein Language Model [38.019425619750265]
We introduce the integration of remote homology detection to distill structural information into protein language models.
We evaluate the impact of this structure-informed training on downstream protein function prediction tasks.
(2024-02-07T09:32:35Z)
- CCPL: Cross-modal Contrastive Protein Learning [47.095862120116976]
We introduce a novel unsupervised protein structure representation pretraining method, cross-modal contrastive protein learning (CCPL).
CCPL leverages a robust protein language model and uses unsupervised contrastive alignment to enhance structure learning.
We evaluated our model across various benchmarks, demonstrating the framework's superiority.
(2023-03-19T08:19:10Z)
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
(2023-02-03T10:49:52Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
(2022-09-30T01:46:38Z)
- Contrastive Representation Learning for 3D Protein Structures [13.581113136149469]
We introduce a new representation learning framework for 3D protein structures.
Our framework uses unsupervised contrastive learning to learn meaningful representations of protein structures.
We show how these representations can be used to solve a large variety of tasks, such as protein function prediction, protein fold classification, structural similarity prediction, and protein-ligand binding affinity prediction.
(2022-05-31T10:33:06Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys compared with traditional Rosetta-based structure optimization routines.
(2021-05-11T03:40:29Z)
- Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate (≥80%) predictions of protein class and architecture from structures determined at low (≤3 Å) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
(2020-08-11T15:01:32Z)
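Two of the related papers listed here (CCPL, and Contrastive Representation Learning for 3D Protein Structures) are described as using contrastive learning. Their exact objectives are not given in these summaries; as a generic sketch, assuming an InfoNCE-style loss in which row i of two embedding matrices holds two views of the same protein (for example a structure-encoder embedding and a protein-language-model embedding), the objective can be written in NumPy as follows. The function name `info_nce` and the temperature of 0.1 are illustrative choices, not taken from either paper.

```python
import numpy as np


def info_nce(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss: row i of z_a should match row i of z_b (the positive)
    and not any other row of z_b (the in-batch negatives)."""
    # L2-normalize so dot products become cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; average their negative log-probability.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
z_aligned = z + 0.01 * rng.normal(size=z.shape)  # matched views of each protein
z_shifted = np.roll(z, 1, axis=0)  # every positive pair deliberately mismatched

print("aligned views:", info_nce(z, z_aligned))
print("mismatched views:", info_nce(z, z_shifted))
```

Using the full matrix of in-batch similarities makes every other protein in the batch a negative, which is the usual design choice for this kind of loss; a symmetric variant averages the loss over both directions (z_a to z_b and z_b to z_a).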