Related papers: Automated Retrosynthesis Planning of Macromolecules Using Large Language Models and Knowledge Graphs

URL: http://arxiv.org/abs/2501.08897v2
Date: Tue, 15 Apr 2025 14:40:07 GMT
Title: Automated Retrosynthesis Planning of Macromolecules Using Large Language Models and Knowledge Graphs
Authors: Qinyu Ma, Yuhao Zhou, Jianfeng Li
Abstract summary: We propose an agent system that integrates large language models (LLMs) and knowledge graphs. Our system fully automates the retrieval of relevant literature, extraction of reaction data, database querying, and construction of retrosynthetic pathway trees. This work represents the first attempt to develop a fully automated retrosynthesis planning agent tailored specially for macromolecules powered by LLMs.
Score: 11.191853171170516
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Identifying reliable synthesis pathways in materials chemistry is a complex task, particularly in polymer science, due to the intricate and often non-unique nomenclature of macromolecules. To address this challenge, we propose an agent system that integrates large language models (LLMs) and knowledge graphs. By leveraging LLMs' powerful capabilities for extracting and recognizing chemical substance names, and storing the extracted data in a structured knowledge graph, our system fully automates the retrieval of relevant literature, extraction of reaction data, database querying, construction of retrosynthetic pathway trees, further expansion through the retrieval of additional literature, and recommendation of optimal reaction pathways. By considering the complex interdependencies among chemical reactants, a novel Multi-branched Reaction Pathway Search Algorithm (MBRPS) is proposed to help identify all valid multi-branched reaction pathways, which arise when a single product decomposes into multiple reaction intermediates. In contrast, previous studies were limited to cases where a product decomposes into at most one reaction intermediate. This work represents the first attempt to develop a fully automated retrosynthesis planning agent tailored specially for macromolecules powered by LLMs. Applied to polyimide synthesis, our new approach constructs a retrosynthetic pathway tree with hundreds of pathways and recommends optimized routes, including both known and novel pathways. This demonstrates that utilizing LLMs for literature consultation to accomplish specific tasks is possible and crucial for future materials research, given the vast amount of materials-related literature.
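
The idea of a multi-branched pathway, where one product decomposes into several intermediates that each need their own route, can be illustrated with a small sketch. This is not the paper's MBRPS algorithm, whose details are not given in the abstract; it is a plain recursive enumeration over a toy reaction table. The function name `enumerate_pathways`, the `REACTIONS` and `PURCHASABLE` data, and the substance labels (`P`, `I1`, `S1`, ...) are all hypothetical placeholders.

```python
from itertools import product

# Hypothetical toy data: each product maps to a list of known reactions,
# and each reaction is a tuple of the reactants it requires.
REACTIONS = {
    "P": [("I1", "I2"), ("I3",)],   # first route branches into two intermediates
    "I1": [("S1",), ("S2", "S3")],
    "I2": [("S4",)],
    "I3": [("P",)],                 # circular route, pruned during the search
}
# Hypothetical set of substances treated as available starting materials.
PURCHASABLE = {"S1", "S2", "S3", "S4"}


def enumerate_pathways(target, reactions, purchasable, _ancestors=frozenset()):
    """List every pathway tree that makes `target` from purchasable substances.

    A leaf is a substance name; an inner node is (product, (subtree, ...)).
    """
    if target in purchasable:
        return [target]
    if target in _ancestors:
        return []  # the target would be needed to make itself
    ancestors = _ancestors | {target}
    pathways = []
    for reactants in reactions.get(target, []):
        # Every reactant of a reaction must be resolved, so the pathways of
        # the branches are combined with a Cartesian product.
        per_reactant = [
            enumerate_pathways(r, reactions, purchasable, ancestors)
            for r in reactants
        ]
        for combo in product(*per_reactant):
            pathways.append((target, combo))
    return pathways


routes = enumerate_pathways("P", REACTIONS, PURCHASABLE)
for route in routes:
    print(route)
```

With this toy table, the route through `I3` is discarded because it loops back to `P`, and the branching route yields one pathway per way of making `I1`. A search restricted to one intermediate per product could not represent the `("I1", "I2")` step at all, which is the limitation the abstract attributes to earlier work.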

Related papers

A Multi-Agent System Enables Versatile Information Extraction from the Chemical Literature [8.306442315850878]
We develop a multimodal large language model (MLLM)-based multi-agent system for robust and automated chemical information extraction. Our system achieved an F1 score of 80.8% on a benchmark dataset of sophisticated multimodal chemical reaction graphics from the literature.
arXiv Detail & Related papers (2025-07-27T11:16:57Z)
DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning [0.0]
DeepRetro is an open-source, iterative, hybrid LLM-based retrosynthetic framework. Our approach integrates the strengths of conventional template-based/Monte Carlo tree search tools with the generative power of LLMs in a step-wise, feedback-driven loop. This approach successfully generates novel pathways for complex natural product compounds.
arXiv Detail & Related papers (2025-07-07T19:41:39Z)
ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences. This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input. Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z)
LLM-Augmented Chemical Synthesis and Design Decision Programs [18.41721617026997]
We introduce an efficient scheme for encoding reaction pathways and present a new route-level search strategy. We show that our LLM-augmented approach excels at retrosynthesis planning and extends naturally to the broader challenge of synthesizable molecular design.
arXiv Detail & Related papers (2025-05-11T15:43:00Z)
Enhancing Chemical Reaction and Retrosynthesis Prediction with Large Language Model and Dual-task Learning [8.402406301818905]
Large language models (LLMs) have shown potential in many domains. ChemDual is a novel framework for accurate chemical synthesis. ChemDual achieves state-of-the-art performance in both predictions of reaction and retrosynthesis.
arXiv Detail & Related papers (2025-05-05T13:31:36Z)
Interpretable Deep Learning for Polar Mechanistic Reaction Prediction [43.95903801494905]
We introduce PMechRP (Polar Mechanistic Reaction Predictor), a system that trains machine learning models on the PMechDB dataset. We train and compare a range of machine learning models, including transformer-based, graph-based and two-step Siamese architectures. Our best-performing approach was a hybrid model, which combines a 5-ensemble of Chemformer models with a two-step Siamese framework.
arXiv Detail & Related papers (2025-04-22T02:31:23Z)
Automated, LLM enabled extraction of synthesis details for reticular materials from scientific literature [29.097783516208892]
We introduce a Knowledge Extraction Pipeline (KEP) that automatizes LLM-assisted paragraph classification and information extraction. We demonstrate that LLMs can retrieve chemical information from PDF documents, without the need for fine-tuning or training. The results show the potential of the KEP approach for reducing human annotations and data curation efforts.
arXiv Detail & Related papers (2024-11-05T20:08:23Z)
SynthFormer: Equivariant Pharmacophore-based Generation of Molecules for Ligand-Based Drug Design [1.3927943269211591]
This paper addresses the gap between in silico generative approaches and practical in vitro methodologies. We introduce SynthFormer, a novel ML model that utilizes a 3D equivariant encoder for pharmacophores to generate fully synthesizable molecules. Our contributions include a new methodology for efficient chemical space exploration using 3D information, a novel architecture called SynthFormer for translating 3D pharmacophore representations into molecules, and a meaningful embedding space that organizes reagents for drug discovery optimization.
arXiv Detail & Related papers (2024-10-03T17:38:46Z)
Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design [0.0]
We show that chemistry foundation models can serve as a basis for enabling structure-focused, semantic chemistry information retrieval. We also show the use of chemistry foundation models in conjunction with multi-modal models such as OpenCLIP.
arXiv Detail & Related papers (2024-08-21T17:25:45Z)
BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction. Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions. This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z)
SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation [55.2480439325792]
We study the synthesis of six datasets, covering topic classification, sentiment analysis, tone detection, and humor. We find that SynthesizRR greatly improves lexical and semantic diversity, similarity to human-written text, and distillation performance.
arXiv Detail & Related papers (2024-05-16T12:22:41Z)
An Autonomous Large Language Model Agent for Chemical Literature Data Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature. Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
arXiv Detail & Related papers (2024-02-20T13:21:46Z)
Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [55.30328162764292]
Chemist-X is a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis. The agent uses retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment executions. Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the loop, prove Chemist-X's ability in self-driving laboratories.
arXiv Detail & Related papers (2023-11-16T01:21:33Z)
MechRetro is a chemical-mechanism-driven graph learning framework for interpretable retrosynthesis prediction and pathway planning [10.364476820771607]
MechRetro is a graph learning framework for interpretable retrosynthetic prediction and pathway planning. By integrating chemical knowledge as prior information, we design a novel Graph Transformer architecture. We demonstrate that MechRetro outperforms the state-of-the-art approaches for retrosynthetic prediction with a large margin on large-scale benchmark datasets.
arXiv Detail & Related papers (2022-10-06T01:27:53Z)
FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning [58.47265392465442]
Retrosynthetic planning aims to devise a complete multi-step synthetic route from starting materials to a target molecule. Current strategies use a decoupled approach of single-step retrosynthesis models and search algorithms. We propose a novel framework that utilizes context information for improved retrosynthetic planning.
arXiv Detail & Related papers (2022-09-30T08:44:58Z)
Retroformer: Pushing the Limits of Interpretable End-to-end Retrosynthesis Transformer [15.722719721123054]
Retrosynthesis prediction is one of the fundamental challenges in organic synthesis. We propose Retroformer, a novel Transformer-based architecture for retrosynthesis prediction. Retroformer reaches the new state-of-the-art accuracy for the end-to-end template-free retrosynthesis.
arXiv Detail & Related papers (2022-01-29T02:03:55Z)
Retrosynthetic Planning with Experience-Guided Monte Carlo Tree Search [10.67810457039541]
In retrosynthetic planning, the huge number of possible routes to synthesize a complex molecule leads to an explosion of possibilities. Current approaches rely on human-defined or machine-trained score functions which have limited chemical knowledge. We build an experience guidance network to learn knowledge from synthetic experiences during the search.
arXiv Detail & Related papers (2021-12-11T17:14:15Z)
RetroXpert: Decompose Retrosynthesis Prediction like a Chemist [60.463900712314754]
We devise a novel template-free algorithm for automatic retrosynthetic expansion. Our method disassembles retrosynthesis into two steps. While outperforming the state-of-the-art baselines, our model also provides chemically reasonable interpretation.
arXiv Detail & Related papers (2020-11-04T04:35:34Z)
Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search [83.22850633478302]
Retrosynthetic planning identifies a series of reactions that can lead to the synthesis of a target product. Existing methods either require expensive return estimation by rollout with high variance, or optimize for search speed rather than the quality. We propose Retro*, a neural-based A*-like algorithm that finds high-quality synthetic routes efficiently.
arXiv Detail & Related papers (2020-06-29T05:53:33Z)
Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning [75.95376096628135]
We propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design. In this setup, the agent learns to navigate through the immense synthetically accessible chemical space. We describe how the end-to-end training in this study represents an important paradigm in radically expanding the synthesizable chemical space.
arXiv Detail & Related papers (2020-04-26T21:40:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.