Pharmacophore-based design by learning on voxel grids
- URL: http://arxiv.org/abs/2512.02031v1
- Date: Wed, 19 Nov 2025 17:10:04 GMT
- Title: Pharmacophore-based design by learning on voxel grids
- Authors: Omar Mahmood, Pedro O. Pinheiro, Richard Bonneau, Saeed Saremi, Vishnu Sresht
- Abstract summary: Ligand-based drug discovery relies on making use of known binders to a protein target to find structurally diverse molecules likely to bind. One popular approach overlays the pharmacophore-shape profile of the known binder to 3D conformations enumerated for each of the library molecules, computes overlaps, and picks a set of diverse library molecules with high overlaps. We propose a pharmacophore-based generative model and library-based virtual screening that address the scaling and fecundity issues of conventional pharmacophore-based screening.
- Score: 9.55984245236999
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ligand-based drug discovery (LBDD) relies on making use of known binders to a protein target to find structurally diverse molecules similarly likely to bind. This process typically involves a brute force search of the known binder (query) against a molecular library using some metric of molecular similarity. One popular approach overlays the pharmacophore-shape profile of the known binder to 3D conformations enumerated for each of the library molecules, computes overlaps, and picks a set of diverse library molecules with high overlaps. While this virtual screening workflow has had considerable success in hit diversification, scaffold hopping, and patent busting, it scales poorly with library sizes and restricts candidate generation to existing library compounds. Leveraging recent advances in voxel-based generative modelling, we propose a pharmacophore-based generative model and workflows that address the scaling and fecundity issues of conventional pharmacophore-based virtual screening. We introduce VoxCap, a voxel captioning method for generating SMILES strings from voxelised molecular representations. We propose two workflows as practical use cases as well as benchmarks for pharmacophore-based generation: de-novo design, in which we aim to generate new molecules with high pharmacophore-shape similarities to query molecules, and fast search, which aims to combine generative design with a cheap 2D substructure similarity search for efficient hit identification. Our results show that VoxCap significantly outperforms previous methods in generating diverse de-novo hits. When combined with our fast search workflow, VoxCap reduces computational time by orders of magnitude while returning hits for all query molecules, enabling the search of large libraries that are intractable to search by brute force.
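The fast search workflow in the abstract pairs generated molecules with a cheap 2D similarity search over a library. The following is a minimal sketch of that second step only, under stated assumptions: the abstract does not name the similarity metric or fingerprint, so Tanimoto similarity over sets of fingerprint bit indices is an illustrative choice, and the names `tanimoto`, `fast_search`, `generated`, and `library` are hypothetical, not part of VoxCap.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a 2D fingerprint is represented as the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def fast_search(
    generated: List[Fingerprint],
    library: Dict[str, Fingerprint],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank library molecules by their best similarity to any generated molecule."""
    scored = []
    for name, fingerprint in library.items():
        best = max((tanimoto(fingerprint, g) for g in generated), default=0.0)
        scored.append((name, best))
    # Highest similarity first; ties broken by name for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Toy data standing in for fingerprints of generated and library molecules.
generated = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
library = {
    "lib_a": frozenset({1, 2, 3, 4}),
    "lib_b": frozenset({2, 3, 5, 6}),
    "lib_c": frozenset({7, 8, 9}),
}
hits = fast_search(generated, library, top_k=2)
print(hits)
```

In this sketch each library molecule is compared only against the handful of generated molecules using set operations, with no conformer enumeration or 3D overlay, which illustrates why a 2D lookup is cheap relative to a brute-force pharmacophore-shape search of the whole library.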
Related papers
- From In Silico to In Vitro: Evaluating Molecule Generative Models for Hit Generation [1.7744342894757368]
We investigate whether generative models can replace one step of the pipeline: hit-like molecule generation. Specifically, we investigate if such models can be trained to generate hit-like molecules, enabling direct incorporation into, or even substitution of, traditional hit identification. Our results show that these models can generate valid, diverse, and biologically relevant compounds across multiple targets.
arXiv Detail & Related papers (2025-12-26T14:02:59Z)
- Bioptic B1: A Target-Agnostic Potency-Based Small Molecules Search Engine [0.0]
We develop a target-agnostic, efficacy-based molecule search model. We screen the ultra-large 40B Enamine REAL library with 100% recall rate. We benchmarked our model and several state-of-the-art models for both speed performance and retrieval quality of novel molecules.
arXiv Detail & Related papers (2024-06-13T17:53:29Z)
- RGFN: Synthesizable Molecular Generation Using GFlowNets [51.33672611338754]
We propose Reaction-GFlowNet, an extension of the GFlowNet framework that operates directly in the space of chemical reactions.
RGFN allows out-of-the-box synthesizability while maintaining comparable quality of generated candidates.
We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
arXiv Detail & Related papers (2024-06-01T13:11:11Z)
- DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design [62.68420322996345]
Existing structure-based drug design methods treat all ligand atoms equally.
We propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold.
Our approach achieves state-of-the-art performance in generating high-affinity molecules.
arXiv Detail & Related papers (2024-02-26T05:21:21Z)
- PharmacoNet: Accelerating Large-Scale Virtual Screening by Deep Pharmacophore Modeling [0.0]
We describe for the first time a deep-learning framework for structure-based pharmacophore modeling to address this challenge.
PharmacoNet is significantly faster than state-of-the-art structure-based approaches, yet reasonably accurate with a simple scoring function.
arXiv Detail & Related papers (2023-10-01T14:13:09Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- An efficient graph generative model for navigating ultra-large combinatorial synthesis libraries [1.5495593104596397]
Virtual, make-on-demand chemical libraries have transformed early-stage drug discovery by unlocking vast, synthetically accessible regions of chemical space.
Recent years have witnessed rapid growth in these libraries from millions to trillions of compounds, hiding undiscovered, potent hits for a variety of therapeutic targets.
We propose the Combinatorial Synthesis Library Variational Auto-Encoder (CSLVAE) to overcome these challenges.
arXiv Detail & Related papers (2022-10-19T15:43:13Z)
- Tailoring Molecules for Protein Pockets: a Transformer-based Generative Solution for Structured-based Drug Design [133.1268990638971]
De novo drug design based on the structure of a target protein can provide novel drug candidates.
We present a generative solution named TamGent that can directly generate candidate drugs from scratch for a given target.
arXiv Detail & Related papers (2022-08-30T09:32:39Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have had initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
- ChemoVerse: Manifold traversal of latent spaces for novel molecule discovery [0.7742297876120561]
It is essential to identify molecular structures with the desired chemical properties.
Recent advances in generative models using neural networks and machine learning are being widely used to design virtual libraries of drug-like compounds.
arXiv Detail & Related papers (2020-09-29T12:11:40Z)