SE(3)-Stochastic Flow Matching for Protein Backbone Generation
- URL: http://arxiv.org/abs/2310.02391v4
- Date: Thu, 11 Apr 2024 16:29:12 GMT
- Title: SE(3)-Stochastic Flow Matching for Protein Backbone Generation
- Authors: Avishek Joey Bose, Tara Akhound-Sadegh, Guillaume Huguet, Kilian Fatras, Jarrid Rector-Brooks, Cheng-Hao Liu, Andrei Cristian Nica, Maksym Korablyov, Michael Bronstein, Alexander Tong,
- Abstract summary: We introduce FoldFlow, a series of novel generative models of increasing modeling power based on the flow-matching paradigm over $3mathrmD$ rigid motions.
Our family of FoldFlowgenerative models offers several advantages over previous approaches to the generative modeling of proteins.
- Score: 54.951832422425454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The computational design of novel protein structures has the potential to impact numerous scientific disciplines greatly. Toward this goal, we introduce FoldFlow, a series of novel generative models of increasing modeling power based on the flow-matching paradigm over $3\mathrm{D}$ rigid motions -- i.e. the group $\text{SE}(3)$ -- enabling accurate modeling of protein backbones. We first introduce FoldFlow-Base, a simulation-free approach to learning deterministic continuous-time dynamics and matching invariant target distributions on $\text{SE}(3)$. We next accelerate training by incorporating Riemannian optimal transport to create FoldFlow-OT, leading to the construction of both more simple and stable flows. Finally, we design FoldFlow-SFM, coupling both Riemannian OT and simulation-free training to learn stochastic continuous-time dynamics over $\text{SE}(3)$. Our family of FoldFlow, generative models offers several key advantages over previous approaches to the generative modeling of proteins: they are more stable and faster to train than diffusion-based approaches, and our models enjoy the ability to map any invariant source distribution to any invariant target distribution over $\text{SE}(3)$. Empirically, we validate FoldFlow, on protein backbone generation of up to $300$ amino acids leading to high-quality designable, diverse, and novel samples.
Related papers
- P2DFlow: A Protein Ensemble Generative Model with SE(3) Flow Matching [8.620021796568087]
P2DFlow is a generative model based on SE(3) flow matching to predict the structural ensembles of proteins.
When trained and evaluated on MD datasets of ATLAS, P2DFlow outperforms other baseline models.
As a potential proxy agent for protein molecular simulation, the high-quality ensembles generated by P2DFlow could significantly aid in understanding protein functions across various scenarios.
arXiv Detail & Related papers (2024-11-26T08:10:12Z) - Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z) - FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models [35.40065954148091]
FINE is a method based on the Learngene framework to initializing downstream networks leveraging pre-trained models.
It decomposes pre-trained knowledge into the product of matrices (i.e., $U$, $Sigma$, and $V$), where $U$ and $V$ are shared across network blocks as learngenes''
It consistently outperforms direct pre-training, particularly for smaller models, achieving state-of-the-art results across variable model sizes.
arXiv Detail & Related papers (2024-09-28T08:57:17Z) - Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which looks ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z) - Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation [55.93511121486321]
We introduce FoldFlow-2, a novel sequence-conditioned flow matching model for protein structure generation.
We train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than PDB datasets of prior works.
We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models.
arXiv Detail & Related papers (2024-05-30T17:53:50Z) - Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs)
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models Lattice Boltzmann collision operators.
Our work opens towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE(3) Guided Flow Matching [43.607506885746155]
We propose a generative model with guided underlineFlow-matching (F$3$low) for enhanced sampling.
The ability to rapidly generate diverse conformations via force-free generative paradigm on SE(3) paves the way toward efficient enhanced sampling methods.
arXiv Detail & Related papers (2024-05-01T04:53:14Z) - AlphaFold Meets Flow Matching for Generating Protein Ensembles [11.1639408863378]
We develop a flow-based generative modeling approach for learning and sampling the conformational landscapes of proteins.
Our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling.
Our method can diversify a static PDB structure with faster wall-clock convergence to certain equilibrium properties than replicate MD trajectories.
arXiv Detail & Related papers (2024-02-07T13:44:47Z) - Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text synthesis-to-speech.
Notably, we are first to apply flow models for plan generation in the offline reinforcement learning setting ax speedup in compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.