AlphaFold Meets Flow Matching for Generating Protein Ensembles
- URL: http://arxiv.org/abs/2402.04845v2
- Date: Mon, 2 Sep 2024 22:43:33 GMT
- Title: AlphaFold Meets Flow Matching for Generating Protein Ensembles
- Authors: Bowen Jing, Bonnie Berger, Tommi Jaakkola
- Abstract summary: We develop a flow-based generative modeling approach for learning and sampling the conformational landscapes of proteins.
Our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling.
Our method can diversify a static PDB structure with faster wall-clock convergence to certain equilibrium properties than replicate MD trajectories.
- Score: 11.1639408863378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The biological functions of proteins often depend on dynamic structural ensembles. In this work, we develop a flow-based generative modeling approach for learning and sampling the conformational landscapes of proteins. We repurpose highly accurate single-state predictors such as AlphaFold and ESMFold and fine-tune them under a custom flow matching framework to obtain sequence-conditioned generative models of protein structure called AlphaFlow and ESMFlow. When trained and evaluated on the PDB, our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling. When further trained on ensembles from all-atom MD, our method accurately captures conformational flexibility, positional distributions, and higher-order ensemble observables for unseen proteins. Moreover, our method can diversify a static PDB structure with faster wall-clock convergence to certain equilibrium properties than replicate MD trajectories, demonstrating its potential as a proxy for expensive physics-based simulations. Code is available at https://github.com/bjing2016/alphaflow.
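The abstract does not spell out the flow matching framework, so the following is only a generic toy sketch of the idea, not the authors' implementation. It assumes a linear interpolation path between a noise sample and a target structure, and a hypothetical `denoiser(x, t)` callable that stands in for a fine-tuned structure predictor returning an estimate of the clean coordinates. The names `interpolate`, `sample`, and `denoiser`, the Gaussian noise, and the three-point "structure" are all illustrative assumptions.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Linear path x_t = (1 - t) * x0 + t * x1 between noise x0 and data x1."""
    return (1.0 - t) * x0 + t * x1


def sample(denoiser, x0, n_steps=10):
    """Euler-integrate dx/dt = (denoiser(x, t) - x) / (1 - t) from t=0 to t=1.

    `denoiser` is a hypothetical stand-in for a network that predicts the
    clean structure from a noisy one; here it can be any callable.
    """
    x = np.asarray(x0, dtype=float).copy()
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x1_hat = denoiser(x, t)               # predicted clean coordinates
        velocity = (x1_hat - x) / (1.0 - t)   # implied flow velocity (t < 1 here)
        x = x + (t_next - t) * velocity       # Euler step along the flow
    return x


# Toy check: with an oracle denoiser that always returns the target,
# the sampler should carry any starting noise onto the target.
rng = np.random.default_rng(0)
target = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
noise = rng.normal(size=target.shape)
result = sample(lambda x, t: target, noise, n_steps=10)
```

In this sketch the final Euler step lands on the denoiser's last prediction, so sample diversity would come entirely from the starting noise and from how the predictions change along the path.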
Related papers
- P2DFlow: A Protein Ensemble Generative Model with SE(3) Flow Matching [8.620021796568087]
P2DFlow is a generative model based on SE(3) flow matching to predict the structural ensembles of proteins.
When trained and evaluated on MD datasets of ATLAS, P2DFlow outperforms other baseline models.
As a potential proxy agent for protein molecular simulation, the high-quality ensembles generated by P2DFlow could significantly aid in understanding protein functions across various scenarios.
arXiv Detail & Related papers (2024-11-26T08:10:12Z)
- Structure Language Models for Protein Conformation Generation [66.42864253026053]
Traditional physics-based simulation methods often struggle with sampling equilibrium conformations.
Deep generative models have shown promise in generating protein conformations as a more efficient alternative.
We introduce Structure Language Modeling as a novel framework for efficient protein conformation generation.
arXiv Detail & Related papers (2024-10-24T03:38:51Z)
- Improving AlphaFlow for Efficient Protein Ensembles Generation [64.10918970280603]
We propose a feature-conditioned generative model called AlphaFlow-Lit to realize efficient protein ensembles generation.
AlphaFlow-Lit performs on par with AlphaFlow and surpasses its distilled version without pretraining, all while achieving a significant sampling acceleration of around 47 times.
arXiv Detail & Related papers (2024-07-08T13:36:43Z)
- Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation [55.93511121486321]
We introduce FoldFlow-2, a novel sequence-conditioned flow matching model for protein structure generation.
We train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than PDB datasets of prior works.
We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models.
arXiv Detail & Related papers (2024-05-30T17:53:50Z)
- SE(3)-Stochastic Flow Matching for Protein Backbone Generation [54.951832422425454]
We introduce FoldFlow, a series of novel generative models of increasing modeling power based on the flow-matching paradigm over $3\mathrm{D}$ rigid motions.
Our family of FoldFlow generative models offers several advantages over previous approaches to the generative modeling of proteins.
arXiv Detail & Related papers (2023-10-03T19:24:24Z)
- EigenFold: Generative Protein Structure Prediction with Diffusion Models [10.24107243529341]
EigenFold is a diffusion generative modeling framework for sampling a distribution of structures from a given protein sequence.
On recent CAMEO targets, EigenFold achieves a median TMScore of 0.84, while providing a more comprehensive picture of model uncertainty.
arXiv Detail & Related papers (2023-04-05T02:46:13Z)
- AlphaFold Distillation for Protein Design [25.190210443632825]
Inverse protein folding is crucial in bio-engineering and drug discovery.
Forward folding models like AlphaFold offer a potential solution by accurately predicting structures from sequences.
We propose using knowledge distillation on folding model confidence metrics to create a faster and end-to-end differentiable distilled model.
arXiv Detail & Related papers (2022-10-05T19:43:06Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.