Related papers: An efficient graph generative model for navigating ultra-large combinatorial synthesis libraries

An efficient graph generative model for navigating ultra-large combinatorial synthesis libraries

URL: http://arxiv.org/abs/2211.04468v1
Date: Wed, 19 Oct 2022 15:43:13 GMT
Title: An efficient graph generative model for navigating ultra-large combinatorial synthesis libraries
Authors: Aryan Pedawi, Pawel Gniewek, Chaoyi Chang, Brandon M. Anderson, Henry van den Bedem
Abstract summary: Virtual, make-on-demand chemical libraries have transformed early-stage drug discovery by unlocking vast, synthetically accessible regions of chemical space. Recent years have witnessed rapid growth in these libraries from millions to trillions of compounds, hiding undiscovered, potent hits for a variety of therapeutic targets. We propose the Combinatorial Synthesis Library Variational Auto-Encoder (CSLVAE) to overcome these challenges.
Score: 1.5495593104596397
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Virtual, make-on-demand chemical libraries have transformed early-stage drug discovery by unlocking vast, synthetically accessible regions of chemical space. Recent years have witnessed rapid growth in these libraries from millions to trillions of compounds, hiding undiscovered, potent hits for a variety of therapeutic targets. However, they are quickly approaching a size beyond that which permits explicit enumeration, presenting new challenges for virtual screening. To overcome these challenges, we propose the Combinatorial Synthesis Library Variational Auto-Encoder (CSLVAE). The proposed generative model represents such libraries as a differentiable, hierarchically-organized database. Given a compound from the library, the molecular encoder constructs a query for retrieval, which is utilized by the molecular decoder to reconstruct the compound by first decoding its chemical reaction and subsequently decoding its reactants. Our design minimizes autoregression in the decoder, facilitating the generation of large, valid molecular graphs. Our method performs fast and parallel batch inference for ultra-large synthesis libraries, enabling a number of important applications in early-stage drug discovery. Compounds proposed by our method are guaranteed to be in the library, and thus synthetically and cost-effectively accessible. Importantly, CSLVAE can encode out-of-library compounds and search for in-library analogues. In experiments, we demonstrate the capabilities of the proposed method in the navigation of massive combinatorial synthesis libraries.

Related papers

ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences.<n>This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input.<n>Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z)
EpiCoder: Encompassing Diversity and Complexity in Code Generation [49.170195362149386]
We introduce a novel feature tree-based synthesis framework inspired by Abstract Syntax Trees (AST) Unlike AST, which captures syntactic structure of code, our framework models semantic relationships between code elements. We fine-tuned widely-used base models to create the EpiCoder series, achieving state-of-the-art performance at both the function and file levels.
arXiv Detail & Related papers (2025-01-08T18:58:15Z)
Automated Materials Discovery Platform Realized: Scanning Probe Microscopy of Combinatorial Libraries [14.028387934700222]
Combinatorial libraries are a powerful approach for exploring the evolution of physical properties across binary and ternary cross-sections. Scanning Probe Microscopies (SPM) offer significant potential for quantitative, functionally relevant combi-library readouts.
arXiv Detail & Related papers (2024-12-24T00:39:51Z)
Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries [51.72836644350993]
Multimodal Pretraining DEL-Fusion model (MPDF) We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions. We propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels.
arXiv Detail & Related papers (2024-09-07T17:32:21Z)
Bioptic -- A Target-Agnostic Potency-Based Small Molecules Search Engine [0.0]
We develop a target-agnostic, efficacy-based molecule search model. We screen the ultra-large 40B Enamine REAL library with 100% recall rate. We benchmarked our model and several state-of-the-art models for both speed performance and retrieval quality of novel molecules.
arXiv Detail & Related papers (2024-06-13T17:53:29Z)
RGFN: Synthesizable Molecular Generation Using GFlowNets [51.33672611338754]
We propose Reaction-GFlowNet, an extension of the GFlowNet framework that operates directly in the space of chemical reactions. RGFN allows out-of-the-box synthesizability while maintaining comparable quality of generated candidates. We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
arXiv Detail & Related papers (2024-06-01T13:11:11Z)
Towards DNA-Encoded Library Generation with GFlowNets [35.09890349911668]
One of the key challenges in using DELs is library design. In this paper we consider the task of protein-protein interaction (PPI) biased DEL. We evaluate several machine learning algorithms on the modulation task and use them as a reward for the proposed GFlowNet-based generative approach.
arXiv Detail & Related papers (2024-04-15T19:01:20Z)
LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated from Stitch. We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z)
Compositional Deep Probabilistic Models of DNA Encoded Libraries [6.206196935093064]
We introduce a compositional deep probabilistic model of DEL data, DEL-Compose, which decomposes molecular representations into their mono-synthon, di-synthon, and tri-synthon building blocks. Our model demonstrates strong performance compared to count baselines, enriches the correct pharmacophores, and offers valuable insights via its intrinsic interpretable structure.
arXiv Detail & Related papers (2023-10-20T19:04:28Z)
ChemVise: Maximizing Out-of-Distribution Chemical Detection with the Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones. We demonstrate this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z)
CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks [62.22920673080208]
Single-step generative model can dramatically simplify the search process and be optimized in end-to-end manner. We name the pre-trained generative retrieval model as CorpusBrain as all information about the corpus is encoded in its parameters without the need of constructing additional index.
arXiv Detail & Related papers (2022-08-16T10:22:49Z)
Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning [75.95376096628135]
We propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design. In this setup, the agent learns to navigate through the immense synthetically accessible chemical space. We describe how the end-to-end training in this study represents an important paradigm in radically expanding the synthesizable chemical space.
arXiv Detail & Related papers (2020-04-26T21:40:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.