Dirichlet Flow Matching with Applications to DNA Sequence Design
- URL: http://arxiv.org/abs/2402.05841v2
- Date: Thu, 30 May 2024 19:09:41 GMT
- Title: Dirichlet Flow Matching with Applications to DNA Sequence Design
- Authors: Hannes Stark, Bowen Jing, Chenyu Wang, Gabriele Corso, Bonnie Berger, Regina Barzilay, Tommi Jaakkola
- Abstract summary: We develop Dirichlet flow matching on the simplex based on mixtures of Dirichlet distributions as probability paths.
We provide distilled Dirichlet flow matching, which enables one-step sequence generation with minimal performance hits.
- Score: 37.12809686044779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discrete diffusion or flow models could enable faster and more controllable sequence generation than autoregressive models. We show that naïve linear flow matching on the simplex is insufficient toward this goal since it suffers from discontinuities in the training target and further pathologies. To overcome this, we develop Dirichlet flow matching on the simplex based on mixtures of Dirichlet distributions as probability paths. In this framework, we derive a connection between the mixtures' scores and the flow's vector field that allows for classifier and classifier-free guidance. Further, we provide distilled Dirichlet flow matching, which enables one-step sequence generation with minimal performance hits, resulting in $O(L)$ speedups compared to autoregressive models. On complex DNA sequence generation tasks, we demonstrate superior performance compared to all baselines in distributional metrics and in achieving desired design targets for generated sequences. Finally, we show that our classifier-free guidance approach improves unconditional generation and is effective for generating DNA that satisfies design targets. Code is available at https://github.com/HannesStark/dirichlet-flow-matching.
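To make the probability-path idea concrete, here is a minimal numpy sketch of a Dirichlet conditional path on the simplex: noise is the uniform Dirichlet Dir(1,...,1), and conditioning on a target class grows that coordinate's concentration parameter over time. The linear schedule alpha_i = 1 + t is an illustrative choice, not necessarily the paper's exact parameterization.

```python
import numpy as np

def dirichlet_path_sample(target_idx: int, t: float, K: int, rng):
    """Sample x_t ~ Dir(alpha(t)) on the K-simplex, conditioned on the
    target vertex e_{target_idx}. At t = 0 this is the uniform prior
    Dir(1,...,1); as t grows, mass concentrates on the target vertex.
    The schedule alpha_i = 1 + t is illustrative, not the paper's exact one."""
    alpha = np.ones(K)
    alpha[target_idx] += t  # only the target coordinate's parameter grows
    return rng.dirichlet(alpha)

# Toy usage: a length-8 DNA sequence over {A, C, G, T} (K = 4).
rng = np.random.default_rng(0)
seq = rng.integers(0, 4, size=8)  # clean class labels per position
x_t = np.stack([dirichlet_path_sample(s, t=5.0, K=4, rng=rng) for s in seq])
print(x_t.shape)  # (8, 4): one point on the simplex per sequence position
```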
Related papers
- Learning Straight Flows by Learning Curved Interpolants [19.42604535211923]
Flow matching models typically use linear interpolants to define the forward/noise addition process.
This, together with the independent coupling between noise and target distributions, yields a vector field which is often non-straight.
We propose to learn flexible (potentially curved) interpolants in order to learn straight vector fields to enable faster generation.
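For reference, a sketch of the standard linear interpolant this entry contrasts against; with independent (noise, data) coupling, the conditional targets are straight, but the marginal field they average into is generally curved.

```python
import numpy as np

def linear_interpolant_pair(x0, x1, t):
    """Standard flow-matching setup: x_t = (1 - t) x0 + t x1 with the
    conditional target velocity x1 - x0. Averaging these straight
    conditional targets over an independent coupling yields a marginal
    field whose trajectories are generally curved."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

rng = np.random.default_rng(0)
x0 = rng.normal(size=(16, 2))        # noise batch
x1 = rng.normal(size=(16, 2)) + 5.0  # data batch, independently coupled
t = rng.uniform(size=(16, 1))
x_t, v = linear_interpolant_pair(x0, x1, t)
```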
arXiv Detail & Related papers (2025-03-26T16:54:56Z)
- Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation [45.105452288011726]
Flow matching in the continuous simplex has emerged as a promising strategy for DNA sequence design, but struggles to scale to higher simplex dimensions required for protein generation.
We introduce Gumbel-Softmax Flow and Score Matching, a generative framework on the simplex based on a novel Gumbel-Softmax interpolant with a time-dependent temperature.
Our framework enables high-quality, diverse generation and scales efficiently to higher-dimensional simplices.
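A rough sketch of a Gumbel-Softmax point on the simplex with an annealed temperature; the concrete schedule and the conditioning below are illustrative assumptions, not the paper's exact interpolant.

```python
import numpy as np

def gumbel_softmax_point(logits, tau, rng):
    """Gumbel-Softmax sample on the simplex: softmax((logits + g) / tau)
    with g ~ Gumbel(0, 1). Small tau pushes samples toward vertices."""
    z = (logits + rng.gumbel(size=logits.shape)) / tau
    z -= z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

def tau_schedule(t, tau_max=2.0, tau_min=0.05):
    """Hypothetical time-dependent temperature: anneal from near-uniform
    simplex points at t = 0 toward near-one-hot points at t = 1."""
    return tau_max * (tau_min / tau_max) ** t

rng = np.random.default_rng(0)
logits = np.full(20, -2.0)  # a 20-class simplex, e.g. amino acids
logits[7] = 2.0             # favor a target class
for t in (0.0, 0.5, 1.0):
    x_t = gumbel_softmax_point(logits, tau_schedule(t), rng)
```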
arXiv Detail & Related papers (2025-03-21T17:59:43Z)
- Block Flow: Learning Straight Flow on Data Blocks [0.0]
Flow-matching models benefit from low-curvature flows along their learned generative trajectories.
We propose block matching to further reduce curvature.
We demonstrate that the variance of the prior distribution can control the curvature upper bound of forward trajectories.
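The prior-variance claim can be checked on a 1-D Gaussian toy where the marginal flow-matching field has a closed form; the snippet below (all formulas derived for this toy, not taken from the paper) integrates the marginal ODE and uses mean absolute second differences as a crude curvature proxy.

```python
import numpy as np

def marginal_velocity(x, t, s, m=4.0):
    """Closed-form marginal field E[x1 - x0 | x_t = x] for the toy
    x0 ~ N(0, s^2), x1 ~ N(m, 1), independent coupling,
    x_t = (1 - t) x0 + t x1."""
    var_t = (1 - t) ** 2 * s ** 2 + t ** 2
    return m + (t - (1 - t) * s ** 2) / var_t * (x - t * m)

def path_curvature(s, steps=400):
    """Euler-integrate the marginal ODE from a prior draw and report the
    mean absolute second difference of the path, a crude curvature proxy."""
    dt = 1.0 / steps
    x = np.random.default_rng(0).normal(0.0, s)
    xs = [x]
    for k in range(steps):
        x = x + dt * marginal_velocity(x, k * dt, s)
        xs.append(x)
    return np.abs(np.diff(xs, 2)).mean()

for s in (0.1, 1.0, 3.0):
    print(s, path_curvature(s))  # curvature shrinks as the prior narrows
```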
arXiv Detail & Related papers (2025-01-20T09:46:12Z)
- Integrating Geodesic Interpolation and Flow Matching for Non-Autoregressive Text Generation in Logit Space [4.347494885647007]
Non-autoregressive language models are emerging as effective alternatives to autoregressive models in the field of natural language processing.
This study introduces a novel flow matching approach that employs Kullback-Leibler divergence geodesics to interpolate between initial and target distributions for discrete sequences.
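One way to realize such an interpolant, sketched under the assumption that the KL geodesic is the exponential-family (e-)geodesic, is to interpolate logits linearly and re-normalize; the paper's exact construction may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_geodesic(p0, p1, t, eps=1e-8):
    """Interpolate two categorical distributions along the exponential-
    family (e-)geodesic: linear interpolation of logits, the geodesic of
    the KL divergence's dually flat structure. A sketch of the kind of
    interpolant the entry describes."""
    z0, z1 = np.log(p0 + eps), np.log(p1 + eps)
    return softmax((1 - t) * z0 + t * z1)

p0 = np.full(4, 0.25)                   # uniform over {A, C, G, T}
p1 = np.array([0.9, 0.05, 0.03, 0.02])  # target distribution
for t in (0.0, 0.5, 1.0):
    print(t, logit_geodesic(p0, p1, t))
```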
arXiv Detail & Related papers (2024-11-25T17:15:41Z)
- Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow [65.51671121528858]
Diffusion models have greatly improved visual generation but are hindered by slow generation speed due to the computationally intensive nature of solving generative ODEs.
Rectified flow, a widely recognized solution, improves generation speed by straightening the ODE path.
We propose Rectified Diffusion, which generalizes the design space and application scope of rectification to encompass the broader category of diffusion models.
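For context, a sketch of the rectification ("reflow") step that this paper generalizes: replace the independent noise-data coupling with pairs produced by the current model's own ODE. The `fake_ode` below is a placeholder for a pretrained sampler.

```python
import numpy as np

def reflow_pairs(model_ode, n, dim, rng):
    """One reflow round as in rectified flow: draw noise x0, push it
    through the current model's ODE to get x1 = ODE(x0), and return the
    coupled pairs. Retraining flow matching on these pairs (instead of
    independent coupling) straightens the learned trajectories."""
    x0 = rng.normal(size=(n, dim))
    x1 = model_ode(x0)  # deterministic noise -> data map
    return x0, x1

# Toy stand-in: an affine map playing the role of a trained ODE solve.
rng = np.random.default_rng(0)
fake_ode = lambda x0: 0.5 * x0 + 3.0
x0, x1 = reflow_pairs(fake_ode, n=128, dim=2, rng=rng)
v_target = x1 - x0  # straight-line training target on the coupled pairs
```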
arXiv Detail & Related papers (2024-10-09T17:43:38Z)
- Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future.
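A hedged sketch of the look-ahead idea: propose next states with the base model and importance-resample them by exponentiated soft values. The proposal, value function, and temperature below are toy stand-ins, not the paper's algorithm.

```python
import numpy as np

def soft_value_step(particles, denoise_step, value_fn, lam, rng):
    """One soft-value-guided sampling step (a sketch): propose next states
    with the base model, then importance-resample particles with weights
    exp(v(x) / lam), where v estimates the eventual reward reachable from
    state x."""
    proposals = denoise_step(particles)
    w = np.exp(value_fn(proposals) / lam)
    w = w / w.sum()
    idx = rng.choice(len(proposals), size=len(proposals), p=w)
    return proposals[idx]

rng = np.random.default_rng(0)
xs = rng.normal(size=(64, 2))
step = lambda x: x + 0.1 * rng.normal(size=x.shape)  # stand-in denoiser
value = lambda x: -np.linalg.norm(x - np.array([2.0, 2.0]), axis=1)  # toy reward
xs = soft_value_step(xs, step, value, lam=0.5, rng=rng)
```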
arXiv Detail & Related papers (2024-08-15T16:47:59Z)
- Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves sample quality in conditional image generation and zero-shot text-to-speech synthesis.
Notably, we are the first to apply flow models for plan generation in the offline reinforcement learning setting, with a substantial speedup in computation compared to diffusion models.
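Classifier-free guidance for flows reduces to blending two velocity predictions at every ODE step; a minimal sketch (the guidance-weight convention below is one common choice):

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, w):
    """Classifier-free guidance for flow models: extrapolate from the
    unconditional toward the conditional velocity prediction.
    w = 1 recovers the conditional field; w > 1 amplifies the condition."""
    return v_uncond + w * (v_cond - v_uncond)

# Toy usage with stand-in network outputs at one ODE step.
v_c, v_u = np.array([1.0, 0.0]), np.array([0.2, 0.1])
v = guided_velocity(v_c, v_u, w=2.0)
```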
arXiv Detail & Related papers (2023-11-22T15:07:59Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
The maximum-likelihood (MLE) objective does not match the downstream use case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
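A toy sketch of the backtracking mechanic: extend the action space with a backspace token so the generator can delete a token instead of compounding an error. The uniform `policy` is a stand-in for a trained SequenceMatch model.

```python
import numpy as np

BKSP = "<bksp>"
VOCAB = ["A", "C", "G", "T", BKSP]

def generate_with_backtracking(policy, max_steps, rng):
    """SequenceMatch-style decoding sketch: the action space includes a
    backspace, so the model can undo a token it regrets. `policy` maps the
    current prefix to a distribution over VOCAB."""
    seq = []
    for _ in range(max_steps):
        a = rng.choice(VOCAB, p=policy(seq))
        if a == BKSP:
            if seq:
                seq.pop()  # undo the previous token
        else:
            seq.append(a)
    return "".join(seq)

rng = np.random.default_rng(0)
uniform = lambda seq: np.full(len(VOCAB), 1 / len(VOCAB))  # toy policy
print(generate_with_backtracking(uniform, max_steps=20, rng=rng))
```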
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Gaussianization Flows [113.79542218282282]
We propose a new type of normalizing flow model that enables both efficient computation of likelihoods and efficient inversion for sample generation.
Because of this guaranteed expressivity, they can capture multimodal target distributions without compromising the efficiency of sample generation.
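The core building block, sketched under the standard iterative-Gaussianization recipe: map each marginal through its empirical CDF and then the inverse Gaussian CDF; full models alternate such steps with learned rotations.

```python
import numpy as np
from scipy import stats

def marginal_gaussianize(x):
    """One step of iterative Gaussianization: push each dimension through
    its empirical CDF, then through the inverse standard-normal CDF,
    making every marginal approximately N(0, 1)."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    u = ranks / (n + 1)       # empirical CDF values in (0, 1)
    return stats.norm.ppf(u)  # inverse Gaussian CDF

rng = np.random.default_rng(0)
x = rng.exponential(size=(1000, 3))  # decidedly non-Gaussian marginals
z = marginal_gaussianize(x)
```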
arXiv Detail & Related papers (2020-03-04T08:15:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.