Multi-domain Distribution Learning for De Novo Drug Design
- URL: http://arxiv.org/abs/2508.17815v1
- Date: Mon, 25 Aug 2025 09:12:01 GMT
- Title: Multi-domain Distribution Learning for De Novo Drug Design
- Authors: Arne Schneuing, Ilia Igashov, Adrian W. Dobbelstein, Thomas Castiglione, Michael Bronstein, Bruno Correia
- Abstract summary: DrugFlow is a generative model for structure-based drug design that integrates continuous flow matching with discrete Markov bridges. We endow DrugFlow with an uncertainty estimate that is able to detect out-of-distribution samples. We extend our model to also explore the conformational landscape of the protein by jointly sampling side chain angles and molecules.
- Score: 5.947157646283629
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce DrugFlow, a generative model for structure-based drug design that integrates continuous flow matching with discrete Markov bridges, demonstrating state-of-the-art performance in learning chemical, geometric, and physical aspects of three-dimensional protein-ligand data. We endow DrugFlow with an uncertainty estimate that is able to detect out-of-distribution samples. To further enhance the sampling process towards distribution regions with desirable metric values, we propose a joint preference alignment scheme applicable to both flow matching and Markov bridge frameworks. Furthermore, we extend our model to also explore the conformational landscape of the protein by jointly sampling side chain angles and molecules.
Related papers
- GeodesicNVS: Probability Density Geodesic Flow Matching for Novel View Synthesis [54.39598154430305]
We propose a Data-to-Data Flow Matching framework that learns deterministic transformations directly between paired views. PDG-FM constrains flow trajectories using geodesic interpolants derived from probability density metrics of pretrained diffusion models. These results highlight the advantages of incorporating data-dependent geometric regularization into deterministic flow matching for consistent novel view generation.
arXiv Detail & Related papers (2026-03-01T09:30:11Z)
- EvoEGF-Mol: Evolving Exponential Geodesic Flow for Structure-based Drug Design [5.680996830009093]
We propose an information-geometric approach to structure-based drug design. EvoEGF-Mol replaces static Dirac targets with dynamically concentrating distributions. Our model approaches a reference-level PoseBusters passing rate (93.4%) on CrossDock, demonstrating remarkable precision and interaction fidelity.
arXiv Detail & Related papers (2026-01-30T02:26:13Z)
- Surface-based Molecular Design with Multi-modal Flow Matching [64.00572241268597]
SurfFlow is a novel surface-based generative algorithm that enables comprehensive co-design of sequence, structure, and surface for peptides. Evaluated on the comprehensive PepMerge benchmark, SurfFlow consistently outperforms full-atom baselines across all metrics.
arXiv Detail & Related papers (2026-01-08T02:19:29Z)
- FlexiFlow: decomposable flow matching for generation of flexible molecular ensemble [43.16267651837197]
FlexiFlow is a novel architecture that extends flow-matching models. We demonstrate the effectiveness of our approach on the QM9 and GEOM Drugs datasets.
arXiv Detail & Related papers (2025-11-21T13:50:54Z)
- Exploring Discrete Flow Matching for 3D De Novo Molecule Generation [0.0]
Flow matching is a recently proposed generative modeling framework that has achieved impressive performance on a variety of tasks.
We present FlowMol-CTMC, an open-source model that achieves state-of-the-art performance for 3D de novo design with fewer learnable parameters than existing methods.
arXiv Detail & Related papers (2024-11-25T18:27:39Z)
- Geometric-informed GFlowNets for Structure-Based Drug Design [4.8722087770556906]
We employ Generative Flow Networks (GFlowNets) to explore the vast space of drug-like molecules.
We introduce a novel modification to the GFlowNet framework by incorporating trigonometrically consistent embeddings.
Experiments conducted using CrossDocked 2020 demonstrated an improvement in the binding affinity between generated molecules and protein pockets.
arXiv Detail & Related papers (2024-06-16T09:32:19Z)
- SemlaFlow -- Efficient 3D Molecular Generation with Latent Attention and Equivariant Flow Matching [43.56824843205882]
Semla is a scalable E(3)-equivariant message passing architecture. SemlaFlow is trained to generate a joint distribution over atom types, coordinates, bond types and formal charges. Our model produces state-of-the-art results on benchmark datasets with as few as 20 sampling steps.
arXiv Detail & Related papers (2024-06-11T13:51:51Z)
- Full-Atom Peptide Design based on Multi-modal Flow Matching [32.58558711545861]
We present PepFlow, the first multi-modal deep generative model grounded in the flow-matching framework for the design of full-atom peptides.
We characterize the peptide structure using rigid backbone frames within the $\mathrm{SE}(3)$ manifold and side-chain angles on high-dimensional tori.
Our approach adeptly tackles various tasks such as fix-backbone sequence design and side-chain packing through partial sampling.
arXiv Detail & Related papers (2024-06-02T12:59:54Z)
- Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation [55.93511121486321]
We introduce FoldFlow-2, a novel sequence-conditioned flow matching model for protein structure generation. We train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than PDB datasets of prior works. We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models.
arXiv Detail & Related papers (2024-05-30T17:53:50Z)
- Mixed Continuous and Categorical Flow Matching for 3D De Novo Molecule Generation [0.0]
Flow matching is a recently proposed generative modeling framework that generalizes diffusion models.
We extend the flow matching framework to categorical data by constructing flows that are constrained to exist on a continuous representation of categorical data known as the probability simplex.
We find that, in practice, a simpler approach that makes no accommodations for the categorical nature of the data yields equivalent or superior performance.
arXiv Detail & Related papers (2024-04-30T17:37:21Z)
- DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization [49.85944390503957]
DecompOpt is a structure-based molecular optimization method based on a controllable and decomposed diffusion model.
We show that DecompOpt can efficiently generate molecules with better properties than strong de novo baselines.
arXiv Detail & Related papers (2024-03-07T02:53:40Z)
- SE(3)-Stochastic Flow Matching for Protein Backbone Generation [54.951832422425454]
We introduce FoldFlow, a series of novel generative models of increasing modeling power based on the flow-matching paradigm over $3\mathrm{D}$ rigid motions.
Our family of FoldFlow generative models offers several advantages over previous approaches to the generative modeling of proteins.
arXiv Detail & Related papers (2023-10-03T19:24:24Z)
- Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.