Enhancing the Protein Tertiary Structure Prediction by Multiple Sequence
Alignment Generation
- URL: http://arxiv.org/abs/2306.01824v1
- Date: Fri, 2 Jun 2023 14:13:50 GMT
- Title: Enhancing the Protein Tertiary Structure Prediction by Multiple Sequence
Alignment Generation
- Authors: Le Zhang, Jiayang Chen, Tao Shen, Yu Li, Siqi Sun
- Abstract summary: We introduce MSA-Augmenter, which generates useful, novel protein sequences not currently found in databases.
Our experiments on CASP14 demonstrate that MSA-Augmenter can generate de novo sequences that retain co-evolutionary information from inferior MSAs.
- Score: 30.2874172276931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The field of protein folding research has been greatly advanced by deep
learning methods, with AlphaFold2 (AF2) demonstrating exceptional performance
and atomic-level precision. As co-evolution is integral to protein structure
prediction, AF2's accuracy is significantly influenced by the depth of multiple
sequence alignment (MSA), which requires extensive exploration of a large
protein database for similar sequences. However, not all protein sequences
possess abundant homologous families, and consequently, AF2's performance can
degrade on such queries, at times failing to produce meaningful results. To
address this, we introduce a novel generative language model, MSA-Augmenter,
which leverages protein-specific attention mechanisms and large-scale MSAs to
generate useful, novel protein sequences not currently found in databases.
These sequences supplement shallow MSAs, enhancing the accuracy of structural
property predictions. Our experiments on CASP14 demonstrate that MSA-Augmenter
can generate de novo sequences that retain co-evolutionary information from
inferior MSAs, thereby improving protein structure prediction quality on top of
strong AF2.
Related papers
- MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training [48.398329286769304]
Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families.
MSAGPT is a novel approach to prompt protein structure predictions via MSA generative pretraining in the low MSA regime.
arXiv Detail & Related papers (2024-06-08T04:23:57Z) - PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for
Efficient and Generalizable Compound-Protein Interaction Prediction [63.50967073653953]
Compound-Protein Interaction prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods utilize only the single modality of protein sequences or structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
arXiv Detail & Related papers (2024-02-13T03:51:10Z) - Efficiently Predicting Protein Stability Changes Upon Single-point
Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z) - OpenProteinSet: Training data for structural biology at scale [0.0]
Multiple sequence alignments (MSAs) of proteins encode rich biological information.
Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance.
OpenProteinSet is an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions.
arXiv Detail & Related papers (2023-08-10T04:01:04Z) - Retrieved Sequence Augmentation for Protein Representation Learning [40.13920287967866]
We introduce Retrieved Sequence Augmentation for protein representation learning without additional alignment or pre-processing.
We show that our model can transfer to new protein domains better and outperforms MSA Transformer on de novo protein prediction.
Our study fills a much-encountered gap in protein prediction and brings us a step closer to demystifying the domain knowledge needed to understand protein sequences.
arXiv Detail & Related papers (2023-02-24T10:31:45Z) - Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs)
We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows it with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z) - On the Robustness of AlphaFold: A COVID-19 Case Study [16.564151738086434]
We demonstrate that AlphaFold does not exhibit such robustness despite its high accuracy.
This raises the challenge of detecting and quantifying the extent to which these predicted protein structures can be trusted.
arXiv Detail & Related papers (2023-01-10T17:31:39Z) - State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - Unsupervisedly Prompting AlphaFold2 for Few-Shot Learning of Accurate
Folding Landscape and Protein Structure Prediction [28.630603355510324]
We present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 for poor MSA targets.
By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in low-data regime.
arXiv Detail & Related papers (2022-08-20T10:23:17Z) - Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate ($geq$80%) predictions of protein class and architecture from structures determined at low ($leq$3A) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.