A Scientific Reasoning Model for Organic Synthesis Procedure Generation
- URL: http://arxiv.org/abs/2512.13668v1
- Date: Mon, 15 Dec 2025 18:55:39 GMT
- Title: A Scientific Reasoning Model for Organic Synthesis Procedure Generation
- Authors: Guoqing Liu, Junren Li, Zihan Zhao, Eray Inanc, Krzysztof Maziarz, Jose Garrido Torres, Victor Garcia Satorras, Shoko Ueda, Christopher M. Bishop, Marwin Segler,
- Abstract summary: We present QFANG, a scientific reasoning language model capable of generating precise, structured experimental procedures.<n>We introduce a Chemistry-Guided Reasoning (CGR) framework that produces chain-of-thought data grounded in chemical knowledge at scale.<n>We apply Reinforcement Learning from Verifiable Rewards (RLVR) to further enhance procedural accuracy.
- Score: 12.609346156252393
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Solving computer-aided synthesis planning is essential for enabling fully automated, robot-assisted synthesis workflows and improving the efficiency of drug discovery. A key challenge, however, is bridging the gap between computational route design and practical laboratory execution, particularly the accurate prediction of viable experimental procedures for each synthesis step. In this work, we present QFANG, a scientific reasoning language model capable of generating precise, structured experimental procedures directly from reaction equations, with explicit chain-of-thought reasoning. To develop QFANG, we curated a high-quality dataset comprising 905,990 chemical reactions paired with structured action sequences, extracted and processed from patent literature using large language models. We introduce a Chemistry-Guided Reasoning (CGR) framework that produces chain-of-thought data grounded in chemical knowledge at scale. The model subsequently undergoes supervised fine-tuning to elicit complex chemistry reasoning. Finally, we apply Reinforcement Learning from Verifiable Rewards (RLVR) to further enhance procedural accuracy. Experimental results demonstrate that QFANG outperforms advanced general-purpose reasoning models and nearest-neighbor retrieval baselines, measured by traditional NLP similarity metrics and a chemically aware evaluator using an LLM-as-a-judge. Moreover, QFANG generalizes to certain out-of-domain reaction classes and adapts to variations in laboratory conditions and user-specific constraints. We believe that QFANG's ability to generate high-quality synthesis procedures represents an important step toward bridging the gap between computational synthesis planning and fully automated laboratory synthesis.
Related papers
- Agentic reinforcement learning empowers next-generation chemical language models for molecular design and synthesis [51.83339196548892]
ChemCraft is a novel framework that decouples chemical reasoning from knowledge storage.<n>ChemCraft achieves superior performance with minimal inference costs.<n>This work establishes a cost-effective and privacy-preserving paradigm for AI-aided chemistry.
arXiv Detail & Related papers (2026-01-25T04:23:34Z) - oMeBench: Towards Robust Benchmarking of LLMs in Organic Mechanism Elucidation and Reasoning [44.36582860924775]
We introduce oMeBench, the first large-scale, expert-curated benchmark for organic mechanism reasoning in organic chemistry.<n>We also propose oMeS, a dynamic evaluation framework that combines step-level logic and chemical similarity.
arXiv Detail & Related papers (2025-10-09T03:13:31Z) - ChemOrch: Empowering LLMs with Chemical Intelligence via Synthetic Instructions [52.79349601462865]
ChemOrch is a framework that synthesizes chemically grounded instruction-response pairs.<n>ChemOrch enables controllable diversity and levels of difficulty for the generated tasks.
arXiv Detail & Related papers (2025-09-20T05:43:58Z) - ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences.<n>This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input.<n>Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z) - AutoChemSchematic AI: Agentic Physics-Aware Automation for Chemical Manufacturing Scale-Up [2.5875933818780363]
Current AI systems cannot yet reliably generate critical engineering schematics.<n>We present a closed-loop, physics-aware framework for automated generation of industrially viable PFDs and PIDs.<n>We show that our framework generates simulator-validated process descriptions with high fidelity.
arXiv Detail & Related papers (2025-05-30T13:32:00Z) - A User-Tunable Machine Learning Framework for Step-Wise Synthesis Planning [9.502407569651321]
MHNpath is a machine learning-driven retrosynthetic tool for computer-aided synthesis planning.<n>We demonstrate its effectiveness through case studies involving complex molecules from ChemByDesign.<n>Our case studies reveal that the tool can generate shorter, cheaper, moderate-temperature routes employing green solvents.
arXiv Detail & Related papers (2025-04-03T00:23:21Z) - Validation of the Scientific Literature via Chemputation Augmented by Large Language Models [0.0]
Chemputation is the process of programming chemical robots to do experiments using a universal symbolic language, but the literature can be error prone and hard to read due to ambiguities.
Large Language Models (LLMs) have demonstrated remarkable capabilities in various domains, including natural language processing, robotic control, and more recently, chemistry.
We introduce an LLM-based chemical research agent workflow designed for the automatic validation of synthetic literature procedures.
arXiv Detail & Related papers (2024-10-08T21:31:42Z) - BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z) - ChemMiner: A Large Language Model Agent System for Chemical Literature Data Mining [56.15126714863963]
ChemMiner is an end-to-end framework for extracting chemical data from literature.<n>ChemMiner incorporates three specialized agents: a text analysis agent for coreference mapping, a multimodal agent for non-textual information extraction, and a synthesis analysis agent for data generation.<n> Experimental results demonstrate reaction identification rates comparable to human chemists while significantly reducing processing time, with high accuracy, recall, and F1 scores.
arXiv Detail & Related papers (2024-02-20T13:21:46Z) - Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [55.30328162764292]
Chemist-X is a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis.<n>The agent uses retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment executions.<n>Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the lope, prove Chemist-X's ability in self-driving laboratories.
arXiv Detail & Related papers (2023-11-16T01:21:33Z) - Learning To Navigate The Synthetically Accessible Chemical Space Using
Reinforcement Learning [75.95376096628135]
We propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design.
In this setup, the agent learns to navigate through the immense synthetically accessible chemical space.
We describe how the end-to-end training in this study represents an important paradigm in radically expanding the synthesizable chemical space.
arXiv Detail & Related papers (2020-04-26T21:40:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.