Mixed Continuous and Categorical Flow Matching for 3D De Novo Molecule Generation
- URL: http://arxiv.org/abs/2404.19739v1
- Date: Tue, 30 Apr 2024 17:37:21 GMT
- Title: Mixed Continuous and Categorical Flow Matching for 3D De Novo Molecule Generation
- Authors: Ian Dunn, David Ryan Koes
- Abstract summary: Flow matching is a recently proposed generative modeling framework that generalizes diffusion models.
We extend the flow matching framework to categorical data by constructing flows that are constrained to exist on a continuous representation of categorical data known as the probability simplex.
We find that, in practice, a simpler approach that makes no accommodations for the categorical nature of the data yields equivalent or superior performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep generative models that produce novel molecular structures have the potential to facilitate chemical discovery. Diffusion models currently achieve state-of-the-art performance for 3D molecule generation. In this work, we explore the use of flow matching, a recently proposed generative modeling framework that generalizes diffusion models, for the task of de novo molecule generation. Flow matching provides flexibility in model design; however, the framework is predicated on the assumption of continuously-valued data. 3D de novo molecule generation requires jointly sampling continuous and categorical variables such as atom position and atom type. We extend the flow matching framework to categorical data by constructing flows that are constrained to exist on a continuous representation of categorical data known as the probability simplex. We call this extension SimplexFlow. We explore the use of SimplexFlow for de novo molecule generation. However, we find that, in practice, a simpler approach that makes no accommodations for the categorical nature of the data yields equivalent or superior performance. As a result of these experiments, we present FlowMol, a flow matching model for 3D de novo molecule generation that achieves improved performance over prior flow matching methods, and we raise important questions about the design of prior distributions for achieving strong performance in flow matching models. Code and trained models for reproducing this work are available at https://github.com/dunni3/FlowMol
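The abstract contrasts two ways of treating atom types during flow matching: keeping them on the probability simplex (SimplexFlow) or treating one-hot vectors as ordinary continuous data. The sketch below is a minimal illustration under stated assumptions, not the FlowMol code: it assumes straight-line conditional paths, a flat Dirichlet prior as one possible simplex prior, and a Gaussian prior for the unconstrained variant. All names (`interpolate`, `target_velocity`, the toy sizes) are hypothetical. It shows why the simplex-constrained path never leaves the simplex while the Gaussian-prior path generally does.

```python
import numpy as np

rng = np.random.default_rng(0)

def interpolate(x0, x1, t):
    """Straight-line conditional path: x_t = (1 - t) * x0 + t * x1."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Velocity of the straight-line path; constant in t and equal to x1 - x0."""
    return x1 - x0

n_atoms, n_types = 5, 4
# Toy "data" sample: random 3D coordinates and one-hot atom types (not a real molecule).
pos1 = rng.normal(size=(n_atoms, 3))
types1 = np.eye(n_types)[rng.integers(0, n_types, size=n_atoms)]

# Continuous prior for positions: isotropic Gaussian noise.
pos0 = rng.normal(size=(n_atoms, 3))

# Option A, simplex-constrained (assumed flat Dirichlet prior): x0 already lies on the simplex.
types0_simplex = rng.dirichlet(np.ones(n_types), size=n_atoms)

# Option B, no categorical accommodation (assumed Gaussian prior on one-hot vectors).
types0_gauss = rng.normal(size=(n_atoms, n_types))

t = 0.3
pos_t = interpolate(pos0, pos1, t)
types_t_simplex = interpolate(types0_simplex, types1, t)
types_t_gauss = interpolate(types0_gauss, types1, t)

# Regression target for a hypothetical network that predicts velocities at (x_t, t).
v_pred = np.zeros_like(pos_t)  # placeholder for a model's output
loss = np.mean((v_pred - target_velocity(pos0, pos1)) ** 2)

# A convex combination of two simplex points stays on the simplex ...
print(np.allclose(types_t_simplex.sum(axis=1), 1.0), bool((types_t_simplex >= 0).all()))
# ... whereas the Gaussian-prior path generally leaves it at intermediate t.
print(np.allclose(types_t_gauss.sum(axis=1), 1.0))
```

Because the simplex is convex, any straight line between two of its points stays inside it; that is the property a simplex-constrained design relies on. Reading off final atom types with an argmax in the unconstrained variant is an assumption here, not a detail given in the abstract.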
Related papers
- Exploring Discrete Flow Matching for 3D De Novo Molecule Generation [0.0]
Flow matching is a recently proposed generative modeling framework that has achieved impressive performance on a variety of tasks.
We present FlowMol-CTMC, an open-source model that achieves state-of-the-art performance for 3D de novo design with fewer learnable parameters than existing methods.
arXiv (2024-11-25T18:27:39Z)
- Conformation Generation using Transformer Flows [55.2480439325792]
We present ConfFlow, a flow-based model for conformation generation based on transformer networks.
ConfFlow directly samples in the coordinate space without enforcing any explicit physical constraints.
ConfFlow improves accuracy by up to 40% relative to state-of-the-art learning-based methods.
arXiv (2024-11-16T14:42:05Z)
- A survey of probabilistic generative frameworks for molecular simulations [0.0]
Generative artificial intelligence is now a widely used tool in molecular science.
We introduce and explain several classes of generative models, broadly sorted into two categories: flow-based models and diffusion models.
We examine their accuracy, computational cost, and generation speed across datasets with tunable dimensionality, complexity, and modal asymmetry.
arXiv (2024-11-14T12:05:08Z)
- Conditional Synthesis of 3D Molecules with Time Correction Sampler [58.0834973489875]
Time-Aware Conditional Synthesis (TACS) is a novel approach to conditional generation on diffusion models.
It integrates adaptively controlled plug-and-play "online" guidance into a diffusion model, driving samples toward the desired properties.
arXiv (2024-11-01T12:59:25Z)
- Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future.
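The soft value-based decoding idea can be sketched as follows, in the spirit of the abstract and under simplifying assumptions: at each step, several candidate next states are drawn from the pretrained sampler, each is scored by a value estimate, and one is resampled with probability proportional to exp(value / alpha). The toy 1D sampler, the reward peaked at 2, and the use of the reward itself as the look-ahead value are hypothetical stand-ins, not the paper's setup. The value is only evaluated, never differentiated, which is what makes this kind of guidance derivative-free.

```python
import numpy as np

def soft_value_weights(values, alpha):
    """Softmax of values / alpha; subtracting the max keeps exp() numerically stable."""
    z = np.asarray(values, dtype=float) / alpha
    z -= z.max()
    w = np.exp(z)
    return w / w.sum()

def guided_step(x, pretrained_step, value_fn, n_candidates, alpha, rng):
    """Draw candidates from the pretrained sampler, then resample one by soft value."""
    candidates = np.array([pretrained_step(x, rng) for _ in range(n_candidates)])
    weights = soft_value_weights([value_fn(c) for c in candidates], alpha)
    return candidates[rng.choice(n_candidates, p=weights)]

# Hypothetical toy setup: a 1D random-walk "sampler" and a reward peaked at x = 2.
rng = np.random.default_rng(0)
pretrained_step = lambda x, rng: x + 0.3 * rng.normal()
value_fn = lambda x: -(x - 2.0) ** 2  # stand-in for a learned look-ahead value

x = 0.0
for _ in range(50):
    x = guided_step(x, pretrained_step, value_fn, n_candidates=8, alpha=0.1, rng=rng)
print(round(float(x), 2))
```

Smaller `alpha` makes the resampling greedier; larger `alpha` stays closer to the pretrained sampler's own distribution.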
arXiv (2024-08-15T16:47:59Z)
- Efficient 3D Molecular Generation with Flow Matching and Scale Optimal Transport [43.56824843205882]
Semla is a scalable E(3)-equivariant message passing architecture.
SemlaFlow is trained using flow matching along with scale optimal transport.
Our model produces state-of-the-art results on benchmark datasets with just 100 sampling steps.
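The summary names scale optimal transport without defining it, so the sketch below shows only the generic idea such methods build on, assumed here for illustration: within a minibatch, prior samples are re-paired with data samples so that the total squared transport cost is minimal, which is generally reported to straighten the learned flow. This is not SemlaFlow's "scale" variant. Brute-force search over permutations keeps the example dependency-free; a practical version would use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`). The function name `ot_pairing` is hypothetical.

```python
import itertools
import numpy as np

def ot_pairing(x0, x1):
    """Exact minibatch OT under squared Euclidean cost, by brute force (tiny batches only).

    Returns p such that prior sample x0[i] is paired with data sample x1[p[i]].
    """
    n = len(x0)
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)  # (n, n) pairwise costs
    best = min(
        itertools.permutations(range(n)),
        key=lambda p: cost[np.arange(n), list(p)].sum(),
    )
    return np.array(best)

rng = np.random.default_rng(1)
x0 = rng.normal(size=(5, 2))  # prior samples
x1 = rng.normal(size=(5, 2))  # data samples (toy)
pairing = ot_pairing(x0, x1)
x1_paired = x1[pairing]       # train on (x0[i], x1_paired[i]) pairs instead of random ones
```

The search is O(n!), so it only serves to make the objective explicit.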
arXiv (2024-06-11T13:51:51Z)
- Fisher Flow Matching for Generative Modeling over Discrete Data [12.69975914345141]
We introduce Fisher-Flow, a novel flow-matching model for discrete data.
Fisher-Flow takes a manifestly geometric perspective by considering categorical distributions over discrete data.
We prove that the gradient flow induced by Fisher-Flow is optimal in reducing the forward KL divergence.
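Fisher-Flow's geometric view can be illustrated with the square-root map, which sends a categorical distribution on the simplex to the positive orthant of the unit sphere. Under this map the Fisher-Rao geometry of the simplex corresponds to the sphere's geometry, so geodesics become great-circle arcs. The sketch below assumes this map plus spherical linear interpolation (slerp); the helper names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def to_sphere(p):
    """Map a categorical distribution p (on the simplex) to the positive orthant of the unit sphere."""
    return np.sqrt(p)

def to_simplex(s):
    """Inverse map: square the coordinates to recover a distribution."""
    return s ** 2

def slerp(a, b, t):
    """Great-circle (geodesic) interpolation between unit vectors a and b."""
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return a
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

p0 = np.array([0.25, 0.25, 0.25, 0.25])  # uniform distribution
p1 = np.array([0.0, 1.0, 0.0, 0.0])      # one-hot target
for t in (0.0, 0.5, 1.0):
    p_t = to_simplex(slerp(to_sphere(p0), to_sphere(p1), t))
    print(t, p_t.round(3), round(float(p_t.sum()), 3))
```

Because both endpoints lie in the positive orthant and the arc between them stays there, squaring every intermediate point yields a valid distribution, but the path differs from the straight line used on the simplex itself.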
arXiv (2024-05-23T15:02:11Z)
- Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves sample quality in conditional image generation and zero-shot text-to-speech synthesis.
Notably, we are the first to apply flow models to plan generation in the offline reinforcement learning setting, with a speedup in computation compared to diffusion models.
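Guidance for flow models is commonly done in the classifier-free style: the sampler follows a weighted combination of a conditional and an unconditional velocity field. The sketch below assumes that form, (1 - w) * v_uncond + w * v_cond, and integrates it with forward Euler. The two toy fields are hypothetical: each is the exact straight-line flow toward a single point (0 unconditionally, 3 conditionally), so the guided sampler lands at 3w and a weight above 1 extrapolates past the conditional target.

```python
import numpy as np

def guided_velocity(v_uncond, v_cond, w):
    """Classifier-free-style guidance: (1 - w) * unconditional + w * conditional."""
    return (1.0 - w) * v_uncond + w * v_cond

def euler_sample(x0, velocity_fn, n_steps=100):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with forward Euler."""
    x, dt = np.array(x0, dtype=float), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

# Hypothetical toy fields: straight-line flows toward a point mu, v = (mu - x) / (1 - t).
# Euler never evaluates t = 1, so the division is safe.
v_cond = lambda x, t: (3.0 - x) / (1.0 - t)    # "class" target at +3
v_uncond = lambda x, t: (0.0 - x) / (1.0 - t)  # overall target at 0

w = 2.0
x_guided = euler_sample(0.0, lambda x, t: guided_velocity(v_uncond(x, t), v_cond(x, t), w))
print(float(x_guided))
```

With w = 1 the sampler reduces to the conditional flow; w = 0 recovers the unconditional one.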
arXiv (2023-11-22T15:07:59Z)
- SE(3)-Stochastic Flow Matching for Protein Backbone Generation [54.951832422425454]
We introduce FoldFlow, a series of novel generative models of increasing modeling power based on the flow-matching paradigm over 3D rigid motions.
Our family of FoldFlow generative models offers several advantages over previous approaches to the generative modeling of proteins.
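FoldFlow works with 3D rigid motions, i.e. the group SE(3) of rotations plus translations. A common way to define a conditional path between two rigid frames, assumed here for illustration, is to follow the geodesic on the rotation group SO(3) while interpolating translations linearly. The sketch below does this with Rodrigues' formula in NumPy. The helper names are hypothetical, it assumes relative rotations strictly below pi (where the log map is well defined), and it does not attempt to reproduce FoldFlow's more expressive stochastic variants.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector, so hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])

def so3_exp(w):
    """Rodrigues' formula: rotation matrix for rotation vector w (axis * angle)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

def so3_log(R):
    """Rotation vector of R; assumes the rotation angle is strictly below pi."""
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]]) / (2 * np.sin(theta))
    return theta * axis

def se3_interpolate(R0, p0, R1, p1, t):
    """Geodesic on SO(3) for the rotation, straight line for the translation."""
    R_t = R0 @ so3_exp(t * so3_log(R0.T @ R1))
    p_t = (1 - t) * p0 + t * p1
    return R_t, p_t

R0 = so3_exp(np.array([0.3, -0.2, 0.5]))
R1 = so3_exp(np.array([-0.4, 0.9, 0.1]))
R_mid, p_mid = se3_interpolate(R0, np.zeros(3), R1, np.array([1.0, 2.0, 3.0]), 0.5)
```

Every intermediate `R_mid` is an exact rotation matrix, which would not hold if the two matrices were interpolated entrywise.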
arXiv (2023-10-03T19:24:24Z)
- Score-Based Generative Models for Molecule Generation [0.8808021343665321]
We train a Transformer-based score function on representations of 1.5 million samples from the ZINC dataset.
We use the Moses benchmarking framework to evaluate the generated samples on a suite of metrics.
arXiv (2022-03-07T13:46:02Z)
- Generative Flows with Invertible Attentions [135.23766216657745]
We introduce two types of invertible attention mechanisms for generative flow models.
We exploit split-based attention mechanisms to learn the attention weights and input representations on every two splits of flow feature maps.
Our method provides invertible attention modules with tractable Jacobian determinants, enabling seamless integration at any position in a flow-based model.
arXiv (2021-06-07T20:43:04Z)