Reimagining Target-Aware Molecular Generation through Retrieval-Enhanced Aligned Diffusion
- URL: http://arxiv.org/abs/2506.14488v1
- Date: Tue, 17 Jun 2025 13:09:11 GMT
- Title: Reimagining Target-Aware Molecular Generation through Retrieval-Enhanced Aligned Diffusion
- Authors: Dong Xu, Zhangfan Yang, Ka-chun Wong, Zexuan Zhu, Jiangqiang Li, Junkai Ji,
- Abstract summary: READ is introduced, which is the first to merge molecular Retrieval-Augmented Generation with an SE(3)-equivariant diffusion model.<n>It can achieve very competitive performance in CBGBench, surpassing state-of-the-art generative models and even native scaffolds.
- Score: 22.204642926984526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Breakthroughs in high-accuracy protein structure prediction, such as AlphaFold, have established receptor-based molecule design as a critical driver for rapid early-phase drug discovery. However, most approaches still struggle to balance pocket-specific geometric fit with strict valence and synthetic constraints. To resolve this trade-off, a Retrieval-Enhanced Aligned Diffusion termed READ is introduced, which is the first to merge molecular Retrieval-Augmented Generation with an SE(3)-equivariant diffusion model. Specifically, a contrastively pre-trained encoder aligns atom-level representations during training, then retrieves graph embeddings of pocket-matched scaffolds to guide each reverse-diffusion step at inference. This single mechanism can inject real-world chemical priors exactly where needed, producing valid, diverse, and shape-complementary ligands. Experimental results demonstrate that READ can achieve very competitive performance in CBGBench, surpassing state-of-the-art generative models and even native ligands. That suggests retrieval and diffusion can be co-optimized for faster, more reliable structure-based drug design.
Related papers
- Pretrained Joint Predictions for Scalable Batch Bayesian Optimization of Molecular Designs [1.3505106522886807]
We show how to obtain scalable probabilistic surrogates of binding affinity for use in Batch Bayesian Optimization.<n>This demands parallel acquisition functions that hedge between designs and the ability to rapidly sample from a joint predictive density to approximate them.<n>Key to this work is an investigation into the importance of prior networks in ENNs and how to pretrain them on synthetic data to improve downstream performance.
arXiv Detail & Related papers (2025-11-13T18:26:58Z) - Breaking the Modality Barrier: Generative Modeling for Accurate Molecule Retrieval from Mass Spectra [60.08608779794957]
We propose GLMR, a Generative Language Model-based Retrieval framework.<n>In the pre-retrieval stage, a contrastive learning-based model identifies top candidate molecules as contextual priors for the input mass spectrum.<n>In the generative retrieval stage, these candidate molecules are integrated with the input mass spectrum to guide a generative model in producing refined molecular structures.
arXiv Detail & Related papers (2025-11-09T07:25:53Z) - Diffusion Models at the Drug Discovery Frontier: A Review on Generating Small Molecules versus Therapeutic Peptides [6.436002724512122]
Diffusion models have emerged as a leading framework in generative modeling.<n>This review provides a systematic comparison of their application in designing two principal therapeutic modalities: small molecules and therapeutic peptides.<n>We conclude that the full potential of diffusion models will be unlocked by bridging these modality-specific gaps and integrating them into automated, closed-loop Design-Build-Test-Learn platforms.
arXiv Detail & Related papers (2025-10-31T19:11:41Z) - Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities [85.83359661628575]
We propose Progressive Inference-Time Annealing (PITA) to learn diffusion-based samplers.<n>PITA combines two complementary techniques: Annealing of the Boltzmann distribution and Diffusion smoothing.<n>It enables equilibrium sampling of N-body particle systems, Alanine Dipeptide, and tripeptides in Cartesian coordinates.
arXiv Detail & Related papers (2025-06-19T17:14:22Z) - Rao-Blackwell Gradient Estimators for Equivariant Denoising Diffusion [55.95767828747407]
In domains such as molecular and protein generation, physical systems exhibit inherent symmetries that are critical to model.<n>We present a framework that reduces training variance and provides a provably lower-variance gradient estimator.<n>We also present a practical implementation of this estimator incorporating the loss and sampling procedure through a method we call Orbit Diffusion.
arXiv Detail & Related papers (2025-02-14T03:26:57Z) - Chimera: Accurate retrosynthesis prediction by ensembling models with diverse inductive biases [3.885174353072695]
Planning and conducting chemical syntheses remains a major bottleneck in the discovery of functional small molecules.<n>Inspired by how chemists use different strategies to ideate reactions, we propose Chimera: a framework for building highly accurate reaction models.
arXiv Detail & Related papers (2024-12-06T18:55:19Z) - Conditional Synthesis of 3D Molecules with Time Correction Sampler [58.0834973489875]
Time-Aware Conditional Synthesis (TACS) is a novel approach to conditional generation on diffusion models.
It integrates adaptively controlled plug-and-play "online" guidance into a diffusion model, driving samples toward the desired properties.
arXiv Detail & Related papers (2024-11-01T12:59:25Z) - Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization [147.7899503829411]
AliDiff is a novel framework to align pretrained target diffusion models with preferred functional properties.
It can generate molecules with state-of-the-art binding energies with up to -7.07 Avg. Vina Score.
arXiv Detail & Related papers (2024-07-01T06:10:29Z) - DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design [62.68420322996345]
Existing structured-based drug design methods treat all ligand atoms equally.
We propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold.
Our approach achieves state-of-the-art performance in generating high-affinity molecules.
arXiv Detail & Related papers (2024-02-26T05:21:21Z) - AbDiffuser: Full-Atom Generation of in vitro Functioning Antibodies [44.149969082612486]
AbDiffuser is an equivariant and physics-informed diffusion model for antibody 3D structures and sequences.
Our approach improves protein diffusion by taking advantage of domain knowledge and physics-based constraints.
Numerical experiments showcase the ability of AbDiffuser to generate antibodies that closely track the sequence and structural properties of a reference set.
arXiv Detail & Related papers (2023-07-28T11:57:44Z) - LIMO: Latent Inceptionism for Targeted Molecule Generation [14.391216237573369]
We present Latent Inceptionism on Molecules (LIMO), which significantly accelerates molecule generation with an inceptionism-like technique.
Comprehensive experiments show that LIMO performs competitively on benchmark tasks.
One of our generated drug-like compounds has a predicted $K_D$ of $6 cdot 10-14$ M against the human estrogen receptor.
arXiv Detail & Related papers (2022-06-17T21:05:58Z) - Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme that incorporates out-of-distribution control in the generative differential equation (SDE)
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z) - Torsional Diffusion for Molecular Conformer Generation [28.225704750892795]
torsional diffusion is a novel diffusion framework that operates on the space of torsion angles.
On a standard benchmark of drug-like molecules, torsional diffusion generates superior conformer ensembles.
Our model provides exact likelihoods, which we employ to build the first generalizable Boltzmann generator.
arXiv Detail & Related papers (2022-06-01T04:30:41Z) - GeoDiff: a Geometric Diffusion Model for Molecular Conformation
Generation [102.85440102147267]
We propose a novel generative model named GeoDiff for molecular conformation prediction.
We show that GeoDiff is superior or comparable to existing state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-06T09:47:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.