Related papers: SynKB: Semantic Search for Synthetic Procedures

URL: http://arxiv.org/abs/2208.07400v1
Date: Mon, 15 Aug 2022 18:33:16 GMT
Title: SynKB: Semantic Search for Synthetic Procedures
Authors: Fan Bai, Alan Ritter, Peter Madrid, Dayne Freitag, John Niekrasz
Abstract summary: We present SynKB, an open-source, automatically extracted knowledge base of chemical synthesis protocols. Similar to proprietary chemistry databases such as Reaxys, SynKB allows chemists to retrieve structured knowledge about synthetic procedures.
Score: 9.360528362635215
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper we present SynKB, an open-source, automatically extracted knowledge base of chemical synthesis protocols. Similar to proprietary chemistry databases such as Reaxys, SynKB allows chemists to retrieve structured knowledge about synthetic procedures. By taking advantage of recent advances in natural language processing for procedural texts, SynKB supports more flexible queries about reaction conditions, and thus has the potential to help chemists search the literature for conditions used in relevant reactions as they design new synthetic routes. Using customized Transformer models to automatically extract information from 6 million synthesis procedures described in U.S. and EU patents, we show that for many queries, SynKB has higher recall than Reaxys, while maintaining high precision. We plan to make SynKB available as an open-source tool; in contrast, proprietary chemistry databases require costly subscriptions.

Related papers

ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences. This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input. Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z)
Automated Retrosynthesis Planning of Macromolecules Using Large Language Models and Knowledge Graphs [11.191853171170516]
We propose an agent system that integrates large language models (LLMs) and knowledge graphs. Our system fully automates the retrieval of relevant literature, extraction of reaction data, database querying, and construction of retrosynthetic pathway trees. This work represents the first attempt to develop a fully automated, LLM-powered retrosynthesis planning agent tailored specifically for macromolecules.
arXiv Detail & Related papers (2025-01-15T16:06:10Z)
ASKCOS: an open source software suite for synthesis planning [7.245299433003954]
We detail the newest version of ASKCOS, an open source software suite for synthesis planning. Four one-step retrosynthesis models form the basis of both interactive planning and automatic planning modes. It is our belief that CASP tools like ASKCOS are an important part of modern chemistry research.
arXiv Detail & Related papers (2025-01-03T14:38:03Z)
Validation of the Scientific Literature via Chemputation Augmented by Large Language Models [0.0]
Chemputation is the process of programming chemical robots to do experiments using a universal symbolic language, but the literature can be error prone and hard to read due to ambiguities. Large Language Models (LLMs) have demonstrated remarkable capabilities in various domains, including natural language processing, robotic control, and more recently, chemistry. We introduce an LLM-based chemical research agent workflow designed for the automatic validation of synthetic literature procedures.
arXiv Detail & Related papers (2024-10-08T21:31:42Z)
BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction. Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions. This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z)
SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation [55.2480439325792]
We study the synthesis of six datasets, covering topic classification, sentiment analysis, tone detection, and humor. We find that SynthesizRR greatly improves lexical and semantic diversity, similarity to human-written text, and distillation performance.
arXiv Detail & Related papers (2024-05-16T12:22:41Z)
An Autonomous Large Language Model Agent for Chemical Literature Data Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature. Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
arXiv Detail & Related papers (2024-02-20T13:21:46Z)
Precursor recommendation for inorganic synthesis by machine learning materials similarity from scientific literature [0.0]
We use a knowledge base of 29,900 solid-state synthesis recipes to automatically learn which precursors to recommend for the synthesis of a novel target material. The data-driven approach learns chemical similarity of materials and refers the synthesis of a new target to precedent synthesis procedures of similar materials. Our approach captures decades of synthesis data in a mathematical form, making it accessible for use in recommendation engines and autonomous laboratories.
arXiv Detail & Related papers (2023-02-05T04:57:59Z)
Recent advances in artificial intelligence for retrosynthesis [29.32667622776065]
Retrosynthesis is the cornerstone of organic chemistry, providing chemists in material and drug manufacturing access to poorly available and brand-new molecules. Recent breakthroughs driven by artificial intelligence have revolutionized retrosynthesis.
arXiv Detail & Related papers (2023-01-14T09:29:39Z)
Importance of Synthesizing High-quality Data for Text-to-SQL Parsing [71.02856634369174]
State-of-the-art text-to-SQL parsers did not further improve on popular benchmarks when trained with augmented synthetic data. We propose a novel framework that incorporates key relationships from the schema, imposes strong typing, and conducts schema-weighted column sampling.
arXiv Detail & Related papers (2022-12-17T02:53:21Z)
FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning [58.47265392465442]
Retrosynthetic planning aims to devise a complete multi-step synthetic route from starting materials to a target molecule. Current strategies use a decoupled approach of single-step retrosynthesis models and search algorithms. We propose a novel framework that utilizes context information for improved retrosynthetic planning.
arXiv Detail & Related papers (2022-09-30T08:44:58Z)
ULSA: Unified Language of Synthesis Actions for Representation of Synthesis Protocols [2.436060325115753]
We propose the first Unified Language of Synthesis Actions (ULSA) for describing synthesis procedures. We created a dataset of 3,040 synthesis procedures annotated by domain experts according to the proposed ULSA scheme.
arXiv Detail & Related papers (2022-01-23T17:44:48Z)
RetroXpert: Decompose Retrosynthesis Prediction like a Chemist [60.463900712314754]
We devise a novel template-free algorithm for automatic retrosynthetic expansion. Our method disassembles retrosynthesis into two steps. While outperforming the state-of-the-art baselines, our model also provides chemically reasonable interpretation.
arXiv Detail & Related papers (2020-11-04T04:35:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.