Dirichlet Diffusion Score Model for Biological Sequence Generation
- URL: http://arxiv.org/abs/2305.10699v2
- Date: Fri, 16 Jun 2023 04:52:45 GMT
- Title: Dirichlet Diffusion Score Model for Biological Sequence Generation
- Authors: Pavel Avdeyev, Chenlai Shi, Yuhao Tan, Kseniia Dudnyk, Jian Zhou
- Abstract summary: Diffusion generative models have achieved considerable success in many applications.
We introduce a diffusion process defined in the probability simplex space with stationary distribution being the Dirichlet distribution.
This makes diffusion in continuous space natural for modeling discrete data.
- Score: 2.0910267321492926
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing biological sequences is an important challenge that requires
satisfying complex constraints and thus is a natural problem to address with
deep generative modeling. Diffusion generative models have achieved
considerable success in many applications. Score-based generative stochastic
differential equations (SDE) model is a continuous-time diffusion model
framework that enjoys many benefits, but the originally proposed SDEs are not
naturally designed for modeling discrete data. To develop generative SDE models
for discrete data such as biological sequences, here we introduce a diffusion
process defined in the probability simplex space with stationary distribution
being the Dirichlet distribution. This makes diffusion in continuous space
natural for modeling discrete data. We refer to this approach as Dirichlet
diffusion score model. We demonstrate that this technique can generate samples
that satisfy hard constraints using a Sudoku generation task. This generative
model can also solve Sudoku, including hard puzzles, without additional
training. Finally, we applied this approach to develop the first human promoter
DNA sequence design model and showed that designed sequences share similar
properties with natural promoter sequences.
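
As a rough illustration of why the probability simplex suits discrete data, the sketch below treats each DNA base as a vertex of the 4-category simplex and draws a starting point for every position from a flat Dirichlet distribution. It is not the authors' implementation: the helper names (`one_hot`, `sample_dirichlet`, `decode`), the flat Dirichlet(1, 1, 1, 1) parameters, and the argmax decoding rule are assumptions made for this example, and the diffusion process and score model themselves are not shown.

```python
import random

ALPHABET = "ACGT"  # assumed alphabet for this illustration

def one_hot(seq):
    """Map each base to a vertex of the 4-category probability simplex."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def decode(points):
    """Read a discrete sequence off simplex points by taking the largest coordinate."""
    return "".join(ALPHABET[max(range(len(p)), key=p.__getitem__)] for p in points)

rng = random.Random(0)
clean = one_hot("ACGT")  # discrete data sits at simplex vertices
# Hypothetical prior: one flat Dirichlet draw per sequence position.
prior = [sample_dirichlet([1.0] * len(ALPHABET), rng) for _ in clean]
```

Every entry of `prior` is a non-negative vector summing to one, so it lives in the same space as the one-hot vertices in `clean`; a generative process of the kind the abstract describes would move points from such Dirichlet draws toward the vertices.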
Related papers
- Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design [56.957070405026194]
We propose an algorithm that enables direct backpropagation of rewards through entire trajectories generated by diffusion models.
DRAKES can generate sequences that are both natural-like and yield high rewards.
arXiv Detail & Related papers (2024-10-17T15:10:13Z)
- Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z)
- Convergence Analysis of Discrete Diffusion Model: Exact Implementation through Uniformization [17.535229185525353]
We introduce an algorithm leveraging the uniformization of continuous Markov chains, implementing transitions on random time points.
Our results align with state-of-the-art achievements for diffusion models in $\mathbb{R}^d$ and further underscore the advantages of discrete diffusion models in comparison to the $\mathbb{R}^d$ setting.
arXiv Detail & Related papers (2024-02-12T22:26:52Z)
- Latent Diffusion Model for DNA Sequence Generation [5.194506374366898]
We propose a novel latent diffusion model, DiscDiff, tailored for discrete DNA sequence generation.
By simply embedding discrete DNA sequences into a continuous latent space using an autoencoder, we are able to leverage the powerful generative abilities of continuous diffusion models for the generation of discrete data.
We contribute a comprehensive cross-species dataset of 150K unique promoter-gene sequences from 15 species, enriching resources for future generative modelling in genomics.
arXiv Detail & Related papers (2023-10-09T20:58:52Z)
- Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling.
We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z)
- Exploring the Optimal Choice for Generative Processes in Diffusion Models: Ordinary vs Stochastic Differential Equations [6.2284442126065525]
We study the problem mathematically for two limiting scenarios: the zero diffusion (ODE) case and the large diffusion case.
Our findings indicate that when the perturbation occurs at the end of the generative process, the ODE model outperforms the SDE model with a large diffusion coefficient.
arXiv Detail & Related papers (2023-06-03T09:27:15Z)
- Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
- A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z)
- Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations [57.15855198512551]
We propose a novel score-based generative model for graphs with a continuous-time framework.
We show that our method is able to generate molecules that lie close to the training distribution yet do not violate the chemical valency rule.
arXiv Detail & Related papers (2022-02-05T08:21:04Z)