Generative molecule evolution using 3D pharmacophore for efficient Structure-Based Drug Design
- URL: http://arxiv.org/abs/2507.20130v1
- Date: Sun, 27 Jul 2025 04:58:11 GMT
- Title: Generative molecule evolution using 3D pharmacophore for efficient Structure-Based Drug Design
- Authors: Yi He, Ailun Wang, Zhi Wang, Yu Liu, Xingyuan Xu, Wen Yan,
- Abstract summary: We propose an evolutionary framework named MEVO, which bridges the gap between billion-scale small molecule dataset and the scarce protein-ligand complex dataset.<n>MEVO is composed of three key components: a high-fidelity VQ-VAE for molecule representation in latent space, a diffusion model for pharmacophore-guided molecule generation, and a pocket-aware evolutionary strategy for molecule optimization.
- Score: 8.652951659846838
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advances in generative models, particularly diffusion and auto-regressive models, have revolutionized fields like computer vision and natural language processing. However, their application to structure-based drug design (SBDD) remains limited due to critical data constraints. To address the limitation of training data for models targeting SBDD tasks, we propose an evolutionary framework named MEVO, which bridges the gap between billion-scale small molecule dataset and the scarce protein-ligand complex dataset, and effectively increase the abundance of training data for generative SBDD models. MEVO is composed of three key components: a high-fidelity VQ-VAE for molecule representation in latent space, a diffusion model for pharmacophore-guided molecule generation, and a pocket-aware evolutionary strategy for molecule optimization with physics-based scoring function. This framework efficiently generate high-affinity binders for various protein targets, validated with predicted binding affinities using free energy perturbation (FEP) methods. In addition, we showcase the capability of MEVO in designing potent inhibitors to KRAS$^{\textrm{G12D}}$, a challenging target in cancer therapeutics, with similar affinity to the known highly active inhibitor evaluated by FEP calculations. With high versatility and generalizability, MEVO offers an effective and data-efficient model for various tasks in structure-based ligand design.
Related papers
- Evolutionary training-free guidance in diffusion model for 3D multi-objective molecular generation [13.140891054725962]
EGD is a training-free framework that embeds evolutionary operators directly into the diffusion sampling process.<n>On both single- and multi-target 3D conditional generation tasks EGD outperforms state-of-the-art conditional diffusion methods in accuracy and runs up to five times faster per generation.<n> EGD can embed arbitrary 3D fragments into the generated molecules while optimizing multiple conflicting properties in one unified process.
arXiv Detail & Related papers (2025-05-16T09:32:40Z) - Enhanced Sampling, Public Dataset and Generative Model for Drug-Protein Dissociation Dynamics [10.80659641278556]
Drug-protein binding and dissociation dynamics are fundamental to understanding molecular interactions in biological systems.<n>We propose a novel research paradigm that combines molecular dynamics simulations, enhanced sampling, and AI generative models to address this issue.<n>Our ongoing efforts focus on expanding this methodology to encompass a broader spectrum of drug-protein complexes and exploring novel applications in pathway prediction.
arXiv Detail & Related papers (2025-04-25T14:10:06Z) - UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion [61.690978792873196]
Existing approaches rely on either autoregressive sequence models or diffusion models.<n>We propose UniGenX, a unified framework that combines autoregressive next-token prediction with conditional diffusion models.<n>We validate the effectiveness of UniGenX on material and small molecule generation tasks.
arXiv Detail & Related papers (2025-03-09T16:43:07Z) - Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z) - Dumpling GNN: Hybrid GNN Enables Better ADC Payload Activity Prediction Based on Chemical Structure [53.76752789814785]
DumplingGNN is a hybrid Graph Neural Network architecture specifically designed for predicting ADC payload activity based on chemical structure.
We evaluate it on a comprehensive ADC payload dataset focusing on DNA Topoisomerase I inhibitors.
It demonstrates exceptional accuracy (91.48%), sensitivity (95.08%), and specificity (97.54%) on our specialized ADC payload dataset.
arXiv Detail & Related papers (2024-09-23T17:11:04Z) - Decomposed Direct Preference Optimization for Structure-Based Drug Design [47.561983733291804]
We propose DecompDPO, a structure-based optimization method to align diffusion models with pharmaceutical needs.
DecompDPO can be effectively used for two main purposes: fine-tuning pretrained diffusion models for molecule generation across various protein families, and molecular optimization given a specific protein subpocket after generation.
arXiv Detail & Related papers (2024-07-19T02:12:25Z) - Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization [147.7899503829411]
AliDiff is a novel framework to align pretrained target diffusion models with preferred functional properties.
It can generate molecules with state-of-the-art binding energies with up to -7.07 Avg. Vina Score.
arXiv Detail & Related papers (2024-07-01T06:10:29Z) - PILOT: Equivariant diffusion for pocket conditioned de novo ligand generation with multi-objective guidance via importance sampling [8.619610909783441]
We propose an in-silico approach for the $textitde novo$ generation of 3D ligand structures using the equivariant diffusion model PILOT.
Its multi-objective-based importance sampling strategy is designed to direct the model towards molecules that exhibit desired characteristics.
We employ PILOT to generate novel metrics for unseen protein pockets from the Kinodata-3D dataset.
arXiv Detail & Related papers (2024-05-23T17:58:28Z) - DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization [49.85944390503957]
DecompOpt is a structure-based molecular optimization method based on a controllable and diffusion model.
We show that DecompOpt can efficiently generate molecules with improved properties than strong de novo baselines.
arXiv Detail & Related papers (2024-03-07T02:53:40Z) - Navigating the Design Space of Equivariant Diffusion-Based Generative
Models for De Novo 3D Molecule Generation [1.3124513975412255]
Deep generative diffusion models are a promising avenue for 3D de novo molecular design in materials science and drug discovery.
We explore the design space of E(3)-equivariant diffusion models, focusing on previously unexplored areas.
We present the EQGAT-diff model, which consistently outperforms established models for the QM9 and GEOM-Drugs datasets.
arXiv Detail & Related papers (2023-09-29T14:53:05Z) - Structure-based Drug Design with Equivariant Diffusion Models [40.73626627266543]
We present DiffSBDD, an SE(3)-equivariant diffusion model that generates novel conditioned on protein pockets.
Our in silico experiments demonstrate that DiffSBDD captures the statistics of the ground truth data effectively.
These results support the assumption that diffusion models represent the complex distribution of structural data more accurately than previous methods.
arXiv Detail & Related papers (2022-10-24T15:51:21Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z) - De novo design of protein target specific scaffold-based Inhibitors via
Reinforcement Learning [8.210294479991118]
Current approaches to develop molecules for a target protein are intuition-driven, hampered by slow iterative design-test cycles.
We propose a novel framework, called 3D-MolGNN$_RL$, coupling reinforcement learning to a deep generative model based on 3D-Scaffold.
Our approach can serve as an interpretable artificial intelligence (AI) tool for lead optimization with optimized activity, potency, and biophysical properties.
arXiv Detail & Related papers (2022-05-21T00:47:35Z) - Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.