Multi-Objective-Guided Discrete Flow Matching for Controllable Biological Sequence Design
- URL: http://arxiv.org/abs/2505.07086v2
- Date: Wed, 14 May 2025 16:19:40 GMT
- Title: Multi-Objective-Guided Discrete Flow Matching for Controllable Biological Sequence Design
- Authors: Tong Chen, Yinuo Zhang, Sophia Tang, Pranam Chatterjee
- Abstract summary: We present Multi-Objective-Guided Discrete Flow Matching (MOG-DFM), a general framework to steer any pretrained discrete flow matching generator. At each sampling step, MOG-DFM computes a hybrid rank-directional score for candidate transitions. We also trained two unconditional discrete flow matching models, PepDFM for diverse peptide generation and EnhancerDFM for functional enhancer DNA generation.
- Score: 5.64033072458324
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Designing biological sequences that satisfy multiple, often conflicting, functional and biophysical criteria remains a central challenge in biomolecule engineering. While discrete flow matching models have recently shown promise for efficient sampling in high-dimensional sequence spaces, existing approaches address only single objectives or require continuous embeddings that can distort discrete distributions. We present Multi-Objective-Guided Discrete Flow Matching (MOG-DFM), a general framework to steer any pretrained discrete flow matching generator toward Pareto-efficient trade-offs across multiple scalar objectives. At each sampling step, MOG-DFM computes a hybrid rank-directional score for candidate transitions and applies an adaptive hypercone filter to enforce consistent multi-objective progression. We also trained two unconditional discrete flow matching models, PepDFM for diverse peptide generation and EnhancerDFM for functional enhancer DNA generation, as base generation models for MOG-DFM. We demonstrate MOG-DFM's effectiveness in generating peptide binders optimized across five properties (hemolysis, non-fouling, solubility, half-life, and binding affinity), and in designing DNA sequences with specific enhancer classes and DNA shapes. In total, MOG-DFM proves to be a powerful tool for multi-property-guided biomolecule sequence design.
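The abstract names two ideas, Pareto-efficient trade-offs and a hypercone filter on candidate transitions, without giving formulas. The sketch below is an illustration of those general concepts only and is not the authors' implementation: the function names (`pareto_front`, `hypercone_filter`), the fixed `direction` vector, and the fixed `max_angle_deg` threshold are all assumptions made for this example, and the adaptive cone and rank-directional scoring that MOG-DFM uses are not reproduced. All objectives are assumed to be maximized.

```python
import math

def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores):
    """Return indices of score vectors that no other vector dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]

def angle_deg(u, v):
    """Angle between two vectors in degrees; 180 if either vector is zero."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 180.0
    cos = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp against rounding error
    return math.degrees(math.acos(cos))

def hypercone_filter(current, candidates, direction, max_angle_deg):
    """Keep candidates whose improvement vector lies inside a cone around `direction`.

    Hypothetical helper: the cone axis and half-angle are fixed inputs here,
    whereas the paper describes an adaptive filter.
    """
    kept = []
    for i, cand in enumerate(candidates):
        delta = [c - x for c, x in zip(cand, current)]
        if angle_deg(delta, direction) <= max_angle_deg:
            kept.append(i)
    return kept

# Toy example with two objectives (values are made up for illustration).
current = (0.5, 0.5)
candidates = [(0.6, 0.6), (0.9, 0.1), (0.4, 0.4), (0.7, 0.55)]
print(pareto_front(candidates))
print(hypercone_filter(current, candidates, direction=(1.0, 1.0), max_angle_deg=45.0))
```

In this toy setup the cone rejects the candidate that trades one objective sharply against the other, even though that candidate is non-dominated, which shows how a directional filter can enforce balanced progress across objectives.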
Related papers
- Rethinking Multimodality: Optimizing Multimodal Deep Learning for Biomedical Signal Classification [5.811275732167591]
This study proposes a novel perspective on multimodal deep learning for biomedical signal classification. We systematically analyze how complementary feature domains impact model performance. We demonstrate that optimal domain fusion isn't about the number of modalities, but the quality of their inherent complementarity.
arXiv Detail & Related papers (2025-08-01T14:12:10Z) - Evolutionary training-free guidance in diffusion model for 3D multi-objective molecular generation [13.140891054725962]
EGD is a training-free framework that embeds evolutionary operators directly into the diffusion sampling process. On both single- and multi-target 3D conditional generation tasks EGD outperforms state-of-the-art conditional diffusion methods in accuracy and runs up to five times faster per generation. EGD can embed arbitrary 3D fragments into the generated molecules while optimizing multiple conflicting properties in one unified process.
arXiv Detail & Related papers (2025-05-16T09:32:40Z) - TFG-Flow: Training-free Guidance in Multimodal Generative Flow [73.93071065307782]
We introduce TFG-Flow, a training-free guidance method for multimodal generative flow. TFG-Flow addresses the curse-of-dimensionality while maintaining the property of unbiased sampling in guiding discrete variables. We show that TFG-Flow has great potential in drug design by generating molecules with desired properties.
arXiv Detail & Related papers (2025-01-24T03:44:16Z) - GenMol: A Drug Discovery Generalist with Discrete Diffusion [43.29814519270451]
Generalist Molecular generative model (GenMol) is a versatile framework that uses only a single discrete diffusion model to handle diverse drug discovery scenarios. GenMol generates Sequential Attachment-based Fragment Embedding sequences through non-autoregressive bidirectional parallel decoding. GenMol significantly outperforms the previous GPT-based model in de novo generation and fragment-constrained generation.
arXiv Detail & Related papers (2025-01-10T18:30:05Z) - PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion [2.6668932659159905]
We present PepTune, a multi-objective discrete diffusion model for the simultaneous generation and optimization of therapeutic peptide SMILES. We generate diverse, chemically-modified peptides optimized for multiple therapeutic properties, including target binding affinity, membrane permeability, solubility, hemolysis, and non-fouling characteristics. In total, our results demonstrate that PepTune is a powerful and modular approach for multi-objective sequence design in discrete state spaces.
arXiv Detail & Related papers (2024-12-23T18:38:49Z) - Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design [56.957070405026194]
We propose an algorithm that enables direct backpropagation of rewards through entire trajectories generated by diffusion models. DRAKES can generate sequences that are both natural-like and yield high rewards.
arXiv Detail & Related papers (2024-10-17T15:10:13Z) - Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z) - Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data. CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z) - MoFormer: Multi-objective Antimicrobial Peptide Generation Based on Conditional Transformer Joint Multi-modal Fusion Descriptor [15.98003148948758]
We establish a multi-objective AMP synthesis pipeline (MoFormer) for the simultaneous optimization of multi-attributes of AMPs.
MoFormer improves the desired attributes of AMP sequences in a highly structured latent space, guided by conditional constraints and fine-grained multi-descriptor.
We show that MoFormer outperforms existing methods in the generation task of enhanced antimicrobial activity and minimal hemolysis.
arXiv Detail & Related papers (2024-06-03T07:17:18Z) - Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design [37.634098563033795]
We present a new flow-based model of discrete data that provides the missing link in enabling flow-based generative models.
Our key insight is that the discrete equivalent of continuous space flow matching can be realized using Continuous Time Markov Chains.
We apply this capability to the task of protein co-design, wherein we learn a model for jointly generating protein structure and sequence.
arXiv Detail & Related papers (2024-02-07T16:15:36Z) - BOtied: Multi-objective Bayesian optimization with tied multivariate ranks [33.414682601242006]
In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function.
Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied.
Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions.
arXiv Detail & Related papers (2023-06-01T04:50:06Z) - Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z) - Diversifying Design of Nucleic Acid Aptamers Using Unsupervised Machine Learning [54.247560894146105]
Inverse design of short single-stranded RNA and DNA sequences (aptamers) is the task of finding sequences that satisfy a set of desired criteria.
We propose to use an unsupervised machine learning model known as the Potts model to discover new, useful sequences with controllable sequence diversity.
arXiv Detail & Related papers (2022-08-10T13:30:58Z) - A Novel Unified Conditional Score-based Generative Framework for Multi-modal Medical Image Completion [54.512440195060584]
We propose the Unified Multi-Modal Conditional Score-based Generative Model (UMM-CSGM) to take advantage of the Score-based Generative Model (SGM).
UMM-CSGM employs a novel multi-in multi-out Conditional Score Network (mm-CSN) to learn a comprehensive set of cross-modal conditional distributions.
Experiments on BraTS19 dataset show that the UMM-CSGM can more reliably synthesize the heterogeneous enhancement and irregular area in tumor-induced lesions.
arXiv Detail & Related papers (2022-07-07T16:57:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.