Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
- URL: http://arxiv.org/abs/2503.17361v1
- Date: Fri, 21 Mar 2025 17:59:43 GMT
- Title: Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
- Authors: Sophia Tang, Yinuo Zhang, Alexander Tong, Pranam Chatterjee
- Abstract summary: Flow matching in the continuous simplex has emerged as a promising strategy for DNA sequence design, but struggles to scale to higher simplex dimensions required for protein generation. We introduce Gumbel-Softmax Flow and Score Matching, a generative framework on the simplex based on a novel Gumbel-Softmax interpolant with a time-dependent temperature. Our framework enables high-quality, diverse generation and scales efficiently to higher-dimensional simplices.
- Score: 45.105452288011726
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Flow matching in the continuous simplex has emerged as a promising strategy for DNA sequence design, but struggles to scale to higher simplex dimensions required for peptide and protein generation. We introduce Gumbel-Softmax Flow and Score Matching, a generative framework on the simplex based on a novel Gumbel-Softmax interpolant with a time-dependent temperature. Using this interpolant, we introduce Gumbel-Softmax Flow Matching by deriving a parameterized velocity field that transports from smooth categorical distributions to distributions concentrated at a single vertex of the simplex. We alternatively present Gumbel-Softmax Score Matching which learns to regress the gradient of the probability density. Our framework enables high-quality, diverse generation and scales efficiently to higher-dimensional simplices. To enable training-free guidance, we propose Straight-Through Guided Flows (STGFlow), a classifier-based guidance method that leverages straight-through estimators to steer the unconditional velocity field toward optimal vertices of the simplex. STGFlow enables efficient inference-time guidance using classifiers pre-trained on clean sequences, and can be used with any discrete flow method. Together, these components form a robust framework for controllable de novo sequence generation. We demonstrate state-of-the-art performance in conditional DNA promoter design, sequence-only protein generation, and target-binding peptide design for rare disease treatment.
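The abstract describes an interpolant that moves from smooth categorical distributions toward a single vertex of the simplex as a temperature changes over time. The sketch below illustrates only that general idea with the standard Gumbel-Softmax relaxation in NumPy. The exponential schedule, the constants `tau_max` and `decay`, and the function names are hypothetical choices made for illustration; they are not taken from the paper and should not be read as its exact interpolant.

```python
import numpy as np

def temperature(t, tau_max=10.0, decay=5.0):
    # Hypothetical schedule: the temperature shrinks as t goes from 0 to 1.
    return tau_max * np.exp(-decay * t)

def sample_gumbel(shape, rng):
    # Gumbel(0, 1) noise via the inverse-CDF trick; u is kept away from 0.
    u = rng.uniform(low=1e-12, high=1.0, size=shape)
    return -np.log(-np.log(u))

def gumbel_softmax(logits, gumbel, tau):
    # Standard Gumbel-Softmax: perturb the logits, then apply a tempered softmax.
    z = (logits + gumbel) / tau
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponentials
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x1 = np.eye(4)[[0, 2, 3]]          # three clean tokens over a 4-letter alphabet
g = sample_gumbel(x1.shape, rng)   # reuse the same noise at both time points

x_early = gumbel_softmax(x1, g, temperature(0.0))  # high temperature: smooth
x_late = gumbel_softmax(x1, g, temperature(1.0))   # low temperature: peaked

print(np.round(x_early, 3))
print(np.round(x_late, 3))
```

Each row of both outputs is a point on the simplex. With the noise held fixed, lowering the temperature can only sharpen the softmax, so the rows of `x_late` sit closer to a vertex than the rows of `x_early`.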
Related papers
- Learning Straight Flows by Learning Curved Interpolants [19.42604535211923]
Flow matching models typically use linear interpolants to define the forward/noise addition process.
This, together with the independent coupling between noise and target distributions, yields a vector field which is often non-straight.
We propose to learn flexible (potentially curved) interpolants in order to learn straight vector fields to enable faster generation.
arXiv Detail & Related papers (2025-03-26T16:54:56Z) - Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts [64.34482582690927]
We provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality.
arXiv Detail & Related papers (2025-03-04T17:46:51Z) - KL-geodesics flow matching with a novel sampling scheme [4.347494885647007]
Non-autoregressive language models generate all tokens simultaneously, offering potential speed advantages over traditional autoregressive models. We investigate a conditional flow matching approach for text generation.
arXiv Detail & Related papers (2024-11-25T17:15:41Z) - Consistency Flow Matching: Defining Straight Flows with Velocity Consistency [97.28511135503176]
We introduce Consistency Flow Matching (Consistency-FM), a novel FM method that explicitly enforces self-consistency in the velocity field.
Preliminary experiments demonstrate that our Consistency-FM significantly improves training efficiency by converging 4.4x faster than consistency models.
arXiv Detail & Related papers (2024-07-02T16:15:37Z) - Dirichlet Flow Matching with Applications to DNA Sequence Design [37.12809686044779]
We develop Dirichlet flow matching on the simplex based on mixtures of Dirichlet distributions as probability paths.
We provide distilled Dirichlet flow matching, which enables one-step sequence generation with minimal performance hits.
arXiv Detail & Related papers (2024-02-08T17:18:01Z) - Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text-to-speech synthesis.
Notably, we are the first to apply flow models for plan generation in the offline reinforcement learning setting, with a speedup compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z) - Free-form Flows: Make Any Architecture a Normalizing Flow [8.163244519983298]
We develop a training procedure that uses an efficient estimator for the gradient of the change of variables formula.
This enables any dimension-preserving neural network to serve as a generative model through maximum likelihood training.
We achieve excellent results in molecule generation benchmarks utilizing $E(n)$-equivariant networks.
arXiv Detail & Related papers (2023-10-25T13:23:08Z) - Gaussianization Flows [113.79542218282282]
We propose a new type of normalizing flow model that enables both efficient computation of likelihoods and efficient inversion for sample generation.
Because of this guaranteed expressivity, they can capture multimodal target distributions without compromising the efficiency of sample generation.
arXiv Detail & Related papers (2020-03-04T08:15:06Z) - Semi-Supervised Learning with Normalizing Flows [54.376602201489995]
FlowGMM is an end-to-end approach to generative semi-supervised learning with normalizing flows.
We show promising results on a wide range of applications, including AG-News and Yahoo Answers text data.
arXiv Detail & Related papers (2019-12-30T17:36:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.