Beating the Best: Improving on AlphaFold2 at Protein Structure Prediction
- URL: http://arxiv.org/abs/2301.07568v1
- Date: Wed, 18 Jan 2023 14:39:34 GMT
- Title: Beating the Best: Improving on AlphaFold2 at Protein Structure Prediction
- Authors: Abbi Abdel-Rehim, Oghenejokpeme Orhobor, Hang Lou, Hao Ni and Ross D. King
- Abstract summary: ARStack significantly outperforms AlphaFold2 and RoseTTAFold.
We rigorously demonstrate this using two sets of non-homologous proteins, and a test set of protein structures released after the publication of AlphaFold2 and RoseTTAFold.
- Score: 1.3124513975412255
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The goal of the Protein Structure Prediction (PSP) problem is to predict a protein's 3D structure (conformation) from its amino acid sequence. The problem has been a 'holy grail' of science since the Nobel prize-winning work of Anfinsen demonstrated that protein conformation is determined by sequence. A recent and important step towards this goal was the development of AlphaFold2, currently the best PSP method. AlphaFold2 is probably the highest profile application of AI to science. Both AlphaFold2 and RoseTTAFold (another impressive PSP method) have been published and placed in the public domain (code & models). Stacking is a form of ensemble machine learning (ML) in which multiple base models are first learnt, then a meta-model is learnt from the outputs of the base-level models to form a combined model that outperforms the individual base models. Stacking has been successful in many applications. We developed the ARStack PSP method by stacking AlphaFold2 and RoseTTAFold. ARStack significantly outperforms AlphaFold2. We rigorously demonstrate this using two sets of non-homologous proteins, and a test set of protein structures released after the publication of AlphaFold2 and RoseTTAFold. As more high quality prediction methods are published it is likely that ensemble methods will increasingly outperform any single method.
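The stacking scheme described in the abstract can be illustrated with a short, generic sketch. The snippet below uses scikit-learn's StackingRegressor on synthetic regression data; the base models, features, and data here are illustrative placeholders and not the authors' ARStack pipeline, which stacks AlphaFold2 and RoseTTAFold structure predictions per protein.

```python
# Minimal stacking sketch (illustrative only, not the ARStack implementation).
# Level 0: several base regressors are fit on the training data.
# Level 1: a meta-model is fit on the base models' cross-validated predictions,
# so it learns how to combine them without seeing their training-set fits.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data; in ARStack the inputs would instead be features
# derived from AlphaFold2 and RoseTTAFold predictions for each protein.
X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

base_models = [
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
]

# The meta-model (final_estimator) is trained on out-of-fold predictions of
# the base models, the standard guard against information leakage in stacking.
stack = StackingRegressor(estimators=base_models, final_estimator=Ridge(), cv=5)

for name, model in base_models + [("stack", stack)]:
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {score:.3f}")
```

The essential point is the two-level training: base models are fit first, and the meta-model is fit on their held-out outputs, so the ensemble can outperform any single base model, which is the same argument the abstract makes for combining PSP methods.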
Related papers
- Improving AlphaFlow for Efficient Protein Ensembles Generation [64.10918970280603]
We propose a feature-conditioned generative model called AlphaFlow-Lit to enable efficient protein ensemble generation.
AlphaFlow-Lit performs on par with AlphaFlow and surpasses its distilled version without pretraining, all while achieving a significant sampling acceleration of around 47 times.
arXiv Detail & Related papers (2024-07-08T13:36:43Z)
- Diffusion Language Models Are Versatile Protein Learners [75.98083311705182]
This paper introduces the diffusion protein language model (DPLM), a versatile protein language model that demonstrates strong generative and predictive capabilities for protein sequences.
We first pre-train scalable DPLMs from evolutionary-scale protein sequences within a generative self-supervised discrete diffusion probabilistic framework.
After pre-training, DPLM exhibits the ability to generate structurally plausible, novel, and diverse protein sequences for unconditional generation.
arXiv Detail & Related papers (2024-02-28T18:57:56Z)
- xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [76.18058946124111]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously.
xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories.
It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z)
- DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models [47.73386438748902]
DiffDock-PP is a diffusion generative model that learns to translate and rotate unbound protein structures into their bound conformations.
We achieve state-of-the-art performance on DIPS with a median C-RMSD of 4.85, outperforming all considered baselines.
arXiv Detail & Related papers (2023-04-08T02:10:44Z)
- Retrieved Sequence Augmentation for Protein Representation Learning [40.13920287967866]
We introduce Retrieved Sequence Augmentation for protein representation learning without additional alignment or pre-processing.
We show that our model can transfer to new protein domains better and outperforms MSA Transformer on de novo protein prediction.
Our study fills a much-encountered gap in protein prediction and brings us a step closer to demystifying the domain knowledge needed to understand protein sequences.
arXiv Detail & Related papers (2023-02-24T10:31:45Z)
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct structural surgery on pLMs, implanting a lightweight structural adapter that endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
- AlphaFold Distillation for Protein Design [25.190210443632825]
Inverse protein folding is crucial in bio-engineering and drug discovery.
Forward folding models like AlphaFold offer a potential solution by accurately predicting structures from sequences.
We propose using knowledge distillation on folding model confidence metrics to create a faster and end-to-end differentiable distilled model.
arXiv Detail & Related papers (2022-10-05T19:43:06Z)
- Unsupervisedly Prompting AlphaFold2 for Few-Shot Learning of Accurate Folding Landscape and Protein Structure Prediction [28.630603355510324]
We present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 for poor MSA targets.
By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in the low-data regime.
arXiv Detail & Related papers (2022-08-20T10:23:17Z)
- HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative [61.984700682903096]
HelixFold-Single is proposed to combine a large-scale protein language model with the superior geometric learning capability of AlphaFold2.
Our proposed method pre-trains a large-scale protein language model with thousands of millions of primary sequences.
We obtain an end-to-end differentiable model to predict the 3D coordinates of atoms from only the primary sequence.
arXiv Detail & Related papers (2022-07-28T07:30:33Z)
- Exploring evolution-based & -free protein language models as protein function predictors [12.381080613343306]
Large-scale Protein Language Models (PLMs) have improved performance in protein prediction tasks.
We investigate the representation ability of three popular PLMs: ESM-1b (single sequence), MSA-Transformer (multiple sequence alignment) and Evoformer (structural).
Specifically, we aim to answer the following key questions: (i) Does the Evoformer trained as part of AlphaFold produce representations amenable to predicting protein function?
We compare these models in an empirical study and present new insights and conclusions.
arXiv Detail & Related papers (2022-06-14T03:56:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.