onepot CORE -- an enumerated chemical space to streamline drug discovery, enabled by automated small molecule synthesis and AI
- URL: http://arxiv.org/abs/2601.12603v1
- Date: Sun, 18 Jan 2026 22:23:20 GMT
- Title: onepot CORE -- an enumerated chemical space to streamline drug discovery, enabled by automated small molecule synthesis and AI
- Authors: Andrei S. Tyrin, Brandon Wang, Manuel Muñoz, Samuel H. Foxman, Daniil A. Boiko,
- Abstract summary: Onepot CORE is an enumerated chemical space containing 3.4B molecules and corresponding on-demand product synthesis.<n>Onepot CORE is constructed by (i) selecting a reaction set commonly used in medicinal chemistry, (ii) sourcing building blocks from supplier catalogs, (iii) enumerating candidate products, and (iv) applying ML-based feasibility assessment to prioritize compounds.
- Score: 0.49478969093606673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The design-make-test-analyze cycle in early-stage drug discovery remains constrained primarily by the "make" step: small-molecule synthesis is slow, costly, and difficult to scale or automate across diverse chemotypes. Enumerated chemical spaces aim to reduce this bottleneck by predefining synthesizable regions of chemical space from available building blocks and reliable reactions, yet existing commercial spaces are still limited by long turnaround times, narrow reaction scope, and substantial manual decision-making in route selection and execution. Here we present the first version of onepot CORE, an enumerated chemical space containing 3.4B molecules and corresponding on-demand synthesis product enabled by an automated synthesis platform and an AI chemist, Phil, that designs, executes, and analyzes experiments. onepot CORE is constructed by (i) selecting a reaction set commonly used in medicinal chemistry, (ii) sourcing and curating building blocks from supplier catalogs, (iii) enumerating candidate products, and (iv) applying ML-based feasibility assessment to prioritize compounds for robust execution. In the current release, the space is supported by seven reactions. We describe an end-to-end workflow - from route selection and automated liquid handling through workup and purification. We further report validation across operational metrics (success rate, timelines, purity, and identity), including NMR confirmation for a representative set of synthesized compounds and assay suitability demonstrated using a series of DPP4 inhibitors. Collectively, onepot CORE illustrates a path toward faster, more reliable access to diverse small molecules, supporting accelerated discovery in pharmaceuticals and beyond.
Related papers
- ChemBART: A Pre-trained BART Model Assisting Organic Chemistry Analysis [9.010003142738338]
ChemBART is a SMILES-based large language model pre-trained on chemical reactions.<n>ChemBART effectively solves a variety of chemical problems, including precursor/reagent generation, temperature-yield regression, molecular property classification, and optimizing the policy and value functions.<n>Our work validates the power of reaction-focused pre-training and showcases the broad utility of ChemBART in advancing the complete synthesis planning cycle.
arXiv Detail & Related papers (2026-01-06T10:55:38Z) - ReACT-Drug: Reaction-Template Guided Reinforcement Learning for de novo Drug Design [0.34155322317700576]
We introduce bfReACT-Drug, a fully integrated, target-agnostic molecular design framework based on Reinforcement Learning.<n>This architecture highlights the potential of integrating structural biology, deep representation learning, and chemical rules to automate and accelerate rational drug design.
arXiv Detail & Related papers (2025-12-24T05:29:35Z) - ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences.<n>This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input.<n>Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z) - BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z) - PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes [33.293741487835824]
Multimodal Large Language Models (MLLMs) have seen growing adoption across various scientific disciplines.
Current approaches, however, often neglect the critical role of multiple molecule graph interaction in understanding chemical reactions.
This study introduces PRESTO, a new framework that bridges the molecule-text modality gap by integrating a comprehensive benchmark of pretraining strategies and dataset configurations.
arXiv Detail & Related papers (2024-06-19T03:59:46Z) - RGFN: Synthesizable Molecular Generation Using GFlowNets [51.33672611338754]
We propose Reaction-GFlowNet, an extension of the GFlowNet framework that operates directly in the space of chemical reactions.
RGFN allows out-of-the-box synthesizability while maintaining comparable quality of generated candidates.
We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
arXiv Detail & Related papers (2024-06-01T13:11:11Z) - Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [55.30328162764292]
Chemist-X is a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis.<n>The agent uses retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment executions.<n>Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the lope, prove Chemist-X's ability in self-driving laboratories.
arXiv Detail & Related papers (2023-11-16T01:21:33Z) - Tailoring Molecules for Protein Pockets: a Transformer-based Generative
Solution for Structured-based Drug Design [133.1268990638971]
De novo drug design based on the structure of a target protein can provide novel drug candidates.
We present a generative solution named TamGent that can directly generate candidate drugs from scratch for a given target.
arXiv Detail & Related papers (2022-08-30T09:32:39Z) - Learning To Navigate The Synthetically Accessible Chemical Space Using
Reinforcement Learning [75.95376096628135]
We propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design.
In this setup, the agent learns to navigate through the immense synthetically accessible chemical space.
We describe how the end-to-end training in this study represents an important paradigm in radically expanding the synthesizable chemical space.
arXiv Detail & Related papers (2020-04-26T21:40:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.