Synthelite: Chemist-aligned and feasibility-aware synthesis planning with LLMs
- URL: http://arxiv.org/abs/2512.16424v1
- Date: Thu, 18 Dec 2025 11:24:30 GMT
- Title: Synthelite: Chemist-aligned and feasibility-aware synthesis planning with LLMs
- Authors: Nguyen Xuan-Vu, Daniel Armstrong, Milena Wehrbach, Andres M Bran, Zlatko Jončev, Philippe Schwaller
- Abstract summary: We introduce Synthelite, a synthesis planning framework that uses large language models to propose retrosynthetic transformations. Synthelite can generate end-to-end synthesis routes by harnessing the intrinsic chemical knowledge and reasoning capabilities of LLMs. Our experiments demonstrate that Synthelite can flexibly adapt its planning trajectory to diverse user-specified constraints, achieving up to 95% success rates.
- Score: 3.7129661557601854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computer-aided synthesis planning (CASP) has long been envisioned as a complementary tool for synthetic chemists. However, existing frameworks often lack mechanisms to allow interaction with human experts, limiting their ability to integrate chemists' insights. In this work, we introduce Synthelite, a synthesis planning framework that uses large language models (LLMs) to directly propose retrosynthetic transformations. Synthelite can generate end-to-end synthesis routes by harnessing the intrinsic chemical knowledge and reasoning capabilities of LLMs, while allowing expert intervention through natural language prompts. Our experiments demonstrate that Synthelite can flexibly adapt its planning trajectory to diverse user-specified constraints, achieving up to 95% success rates in both strategy-constrained and starting-material-constrained synthesis tasks. Additionally, Synthelite exhibits the ability to account for chemical feasibility during route design. We envision Synthelite to be both a useful tool and a step toward a paradigm where LLMs are the central orchestrators of synthesis planning.
Related papers
- When Single Answer Is Not Enough: Rethinking Single-Step Retrosynthesis Benchmarks for LLMs [3.973137925060284]
We propose a new benchmarking framework for single-step retrosynthesis. By emphasizing plausibility over exact match, this approach better aligns with human synthesis planning practices. We also introduce CREED, a novel dataset comprising millions of ChemCensor-validated reaction records for LLM training.
arXiv Detail & Related papers (2026-02-03T14:03:32Z) - LeMat-Synth: a multi-modal toolbox to curate broad synthesis procedure databases from scientific literature [60.879220305044726]
We propose a multi-modal toolbox that employs large language models (LLMs) and vision language models (VLMs) to automatically extract and organize synthesis procedures and performance data. We curated 81k open-access papers, yielding LeMat-Synth (v1.0): a dataset containing synthesis procedures spanning 35 synthesis methods and 16 material classes. We release a modular, open-source library designed to support community-driven extension to new corpora and synthesis domains.
arXiv Detail & Related papers (2025-10-28T17:58:18Z) - AOT*: Efficient Synthesis Planning via LLM-Empowered AND-OR Tree Search [22.026497456502806]
AOT* is a framework that transforms retrosynthetic planning by integrating LLM-generated chemical synthesis pathways with systematic AND-OR tree search. AOT* exhibits competitive solve rates using 3-5$\times$ fewer iterations than existing LLM-based approaches.
arXiv Detail & Related papers (2025-09-25T10:30:37Z) - Rethinking Molecule Synthesizability with Chain-of-Reaction [47.744071119775676]
We introduce ReaSyn, a generative framework for synthesizable projection. We propose a novel perspective that views synthetic pathways akin to reasoning paths in large language models (LLMs). With the CoR notation, ReaSyn can get dense supervision in every reaction step to explicitly learn chemical reaction rules.
arXiv Detail & Related papers (2025-09-19T15:29:57Z) - ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences. This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input. Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z) - LLM-Augmented Chemical Synthesis and Design Decision Programs [18.41721617026997]
We introduce an efficient scheme for encoding reaction pathways and present a new route-level search strategy. We show that our LLM-augmented approach excels at retrosynthesis planning and extends naturally to the broader challenge of synthesizable molecular design.
arXiv Detail & Related papers (2025-05-11T15:43:00Z) - SynLlama: Generating Synthesizable Molecules and Their Analogs with Large Language Models [3.750173223006525]
We present a novel approach by fine-tuning Meta's Llama3 Large Language Models to create SynLlama. SynLlama generates full synthetic pathways made of commonly accessible building blocks and robust organic reaction templates. We find that SynLlama, even without training on external building blocks, can effectively generalize to unseen yet purchasable building blocks.
arXiv Detail & Related papers (2025-03-16T18:30:56Z) - BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z) - SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation [55.2480439325792]
We study the synthesis of six datasets, covering topic classification, sentiment analysis, tone detection, and humor.
We find that SynthesizRR greatly improves lexical and semantic diversity, similarity to human-written text, and distillation performance.
arXiv Detail & Related papers (2024-05-16T12:22:41Z) - FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning [58.47265392465442]
Retrosynthetic planning aims to devise a complete multi-step synthetic route from starting materials to a target molecule.
Current strategies use a decoupled approach of single-step retrosynthesis models and search algorithms.
We propose a novel framework that utilizes context information for improved retrosynthetic planning.
arXiv Detail & Related papers (2022-09-30T08:44:58Z) - ULSA: Unified Language of Synthesis Actions for Representation of Synthesis Protocols [2.436060325115753]
We propose the first Unified Language of Synthesis Actions (ULSA) for describing synthesis procedures.
We created a dataset of 3,040 synthesis procedures annotated by domain experts according to the proposed ULSA scheme.
arXiv Detail & Related papers (2022-01-23T17:44:48Z) - Predictive Synthesis of Quantum Materials by Probabilistic Reinforcement Learning [1.4680035572775534]
We use reinforcement learning to predict optimal synthesis schedules for a prototypical quantum material, semiconducting monolayer MoS$_2$.
The model can be extended to predict profiles for synthesis of complex structures including multi-phase heterostructures.
arXiv Detail & Related papers (2020-09-14T20:50:45Z) - Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning [75.95376096628135]
We propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design.
In this setup, the agent learns to navigate through the immense synthetically accessible chemical space.
We describe how the end-to-end training in this study represents an important paradigm in radically expanding the synthesizable chemical space.
arXiv Detail & Related papers (2020-04-26T21:40:03Z)