Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design
- URL: http://arxiv.org/abs/2402.04997v2
- Date: Wed, 5 Jun 2024 20:31:17 GMT
- Title: Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design
- Authors: Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, Tommi Jaakkola,
- Abstract summary: We present a new flow-based model of discrete data that provides the missing link in enabling flow-based generative models.
Our key insight is that the discrete equivalent of continuous space flow matching can be realized using Continuous Time Markov Chains.
We apply this capability to the task of protein co-design, wherein we learn a model for jointly generating protein structure and sequence.
- Score: 37.634098563033795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Combining discrete and continuous data is an important capability for generative models. We present Discrete Flow Models (DFMs), a new flow-based model of discrete data that provides the missing link in enabling flow-based generative models to be applied to multimodal continuous and discrete data problems. Our key insight is that the discrete equivalent of continuous space flow matching can be realized using Continuous Time Markov Chains. DFMs benefit from a simple derivation that includes discrete diffusion models as a specific instance while allowing improved performance over existing diffusion-based approaches. We utilize our DFMs method to build a multimodal flow-based modeling framework. We apply this capability to the task of protein co-design, wherein we learn a model for jointly generating protein structure and sequence. Our approach achieves state-of-the-art co-design performance while allowing the same multimodal model to be used for flexible generation of the sequence or structure.
Related papers
- $α$-Flow: A Unified Framework for Continuous-State Discrete Flow Matching Models [8.705749038874137]
This work presents a unified framework for Continuous-State Discrete Flow Matching models.
We introduce $alpha$-Flow, a family of CS-DFM models that adheres to the canonical $alpha$-geometry of the statistical manifold.
We show that the flow matching loss for $alpha$-flow establishes a unified variational bound for the discrete negative log-likelihood.
arXiv Detail & Related papers (2025-04-14T14:51:45Z) - Continuous Diffusion Model for Language Modeling [57.396578974401734]
Existing continuous diffusion models for discrete data have limited performance compared to discrete approaches.
We propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution.
arXiv Detail & Related papers (2025-02-17T08:54:29Z) - TFG-Flow: Training-free Guidance in Multimodal Generative Flow [73.93071065307782]
We introduce TFG-Flow, a training-free guidance method for multimodal generative flow.
TFG-Flow addresses the curse-of-dimensionality while maintaining the property of unbiased sampling in guiding discrete variables.
We show that TFG-Flow has great potential in drug design by generating molecules with desired properties.
arXiv Detail & Related papers (2025-01-24T03:44:16Z) - Distillation of Discrete Diffusion through Dimensional Correlations [21.078500510691747]
"Mixture" models in discrete diffusion are capable of treating dimensional correlations while remaining scalable.
We empirically demonstrate that our proposed method for discrete diffusions work in practice, by distilling a continuous-time discrete diffusion model pretrained on the CIFAR-10 dataset.
arXiv Detail & Related papers (2024-10-11T10:53:03Z) - Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z) - Local Flow Matching Generative Models [19.859984725284896]
Flow Matching (FM) is a simulation-free method for learning a continuous and invertible flow to interpolate between two distributions.
We introduce Local Flow Matching (LFM), which learns a sequence of FM sub-models and each matches a diffusion process up to the time of the step size in the data-to-noise direction.
In experiments, we demonstrate the improved training efficiency and competitive generative performance of LFM compared to FM.
arXiv Detail & Related papers (2024-10-03T14:53:10Z) - On the Trajectory Regularity of ODE-based Diffusion Sampling [79.17334230868693]
Diffusion-based generative models use differential equations to establish a smooth connection between a complex data distribution and a tractable prior distribution.
In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models.
arXiv Detail & Related papers (2024-05-18T15:59:41Z) - Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling [2.1779479916071067]
We introduce a novel framework that enhances diffusion models by supporting a broader range of forward processes.
We also propose a novel parameterization technique for learning the forward process.
Results underscore NFDM's versatility and its potential for a wide range of applications.
arXiv Detail & Related papers (2024-04-19T15:10:54Z) - Convergence Analysis of Discrete Diffusion Model: Exact Implementation
through Uniformization [17.535229185525353]
We introduce an algorithm leveraging the uniformization of continuous Markov chains, implementing transitions on random time points.
Our results align with state-of-the-art achievements for diffusion models in $mathbbRd$ and further underscore the advantages of discrete diffusion models in comparison to the $mathbbRd$ setting.
arXiv Detail & Related papers (2024-02-12T22:26:52Z) - Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text synthesis-to-speech.
Notably, we are first to apply flow models for plan generation in the offline reinforcement learning setting ax speedup in compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - Normalizing Flows with Multi-Scale Autoregressive Priors [131.895570212956]
We introduce channel-wise dependencies in their latent space through multi-scale autoregressive priors (mAR)
Our mAR prior for models with split coupling flow layers (mAR-SCF) can better capture dependencies in complex multimodal data.
We show that mAR-SCF allows for improved image generation quality, with gains in FID and Inception scores compared to state-of-the-art flow-based models.
arXiv Detail & Related papers (2020-04-08T09:07:11Z) - Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows [40.9137348900942]
We propose a novel type of flow driven by a differential deformation of the Wiener process.
As a result, we obtain a rich time series model whose observable process inherits many of the appealing properties of its base process.
arXiv Detail & Related papers (2020-02-24T20:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.