pLDDT-Predictor: High-speed Protein Screening Using Transformer and ESM2
- URL: http://arxiv.org/abs/2410.21283v1
- Date: Fri, 11 Oct 2024 03:19:44 GMT
- Title: pLDDT-Predictor: High-speed Protein Screening Using Transformer and ESM2
- Authors: Joongwon Chae, Zhenyu Wang, Peiwu Qin,
- Abstract summary: We introduce pLDDT-Predictor, a high-speed protein screening tool that bridges the gap by leveraging pre-trained ESM2 protein embeddings and a Transformer architecture.
Our experimental results, conducted on a diverse dataset of 1.5 million protein sequences, demonstrate that pLDDT-Predictor can classify more than 90 percent of proteins with a pLDDT score above 70.
- Score: 4.930667479611019
- License:
- Abstract: Recent advancements in protein structure prediction, particularly AlphaFold2, have revolutionized structural biology by achieving near-experimental accuracy. However, the computational intensity of these models limits their application in high-throughput protein screening. Concurrently, large language models like ESM (Evolutionary Scale Modeling) have demonstrated the potential to extract rich structural information directly from protein sequences. Despite these advances, a significant gap remains in rapidly assessing protein structure quality for large-scale analyses. We introduce pLDDT-Predictor, a high-speed protein screening tool that bridges this gap by leveraging pre-trained ESM2 protein embeddings and a Transformer architecture to accurately predict AlphaFold2`s pLDDT (predicted Local Distance Difference Test) scores. Our model addresses the critical need for fast, accurate protein structure quality assessment without the computational burden of full structure prediction. By combining the evolutionary information captured in ESM2 embeddings with the sequence-wide context modeling of Transformers, pLDDT-Predictor achieves a balance between structural insight and computational efficiency. Our experimental results, conducted on a diverse dataset of 1.5 million protein sequences, demonstrate that pLDDT-Predictor can classify more than 90 percent of proteins with a pLDDT score above 70, closely matching AlphaFold2`s confidence level.
Related papers
- PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for
Efficient and Generalizable Compound-Protein Interaction Prediction [63.50967073653953]
Compound-Protein Interaction prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods utilize only the single modality of protein sequences or structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
arXiv Detail & Related papers (2024-02-13T03:51:10Z) - Endowing Protein Language Models with Structural Knowledge [5.587293092389789]
We introduce a novel framework that enhances protein language models by integrating protein structural data.
The refined model, termed Protein Structure Transformer (PST), is further pretrained on a small protein structure database.
PST consistently outperforms the state-of-the-art foundation model for protein sequences, ESM-2, setting a new benchmark in protein function prediction.
arXiv Detail & Related papers (2024-01-26T12:47:54Z) - Efficiently Predicting Protein Stability Changes Upon Single-point
Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z) - Enhancing the Protein Tertiary Structure Prediction by Multiple Sequence
Alignment Generation [30.2874172276931]
We introduce MSA-Augmenter, which generates useful, novel protein sequences not currently found in databases.
Our experiments on CASP14 demonstrate that MSA-Augmenter can generate de novo sequences that retain co-evolutionary information from inferior MSAs.
arXiv Detail & Related papers (2023-06-02T14:13:50Z) - Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs)
We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows it with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z) - On the Robustness of AlphaFold: A COVID-19 Case Study [16.564151738086434]
We demonstrate that AlphaFold does not exhibit such robustness despite its high accuracy.
This raises the challenge of detecting and quantifying the extent to which these predicted protein structures can be trusted.
arXiv Detail & Related papers (2023-01-10T17:31:39Z) - State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate ($geq$80%) predictions of protein class and architecture from structures determined at low ($leq$3A) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.